jeremiedb opened this issue 1 year ago.
Here's an MWE:
```julia
using Images
using StatsBase: sample, shuffle
using DataAugmentation
using Flux
using TestImages
import Base: length, getindex
import Flux.MLUtils: getobs, getobs!

const im_size = (224, 224)
imgs = ["chelsea", "coffee"]

struct ImageContainer{T<:Vector}
    img::T
end

length(data::ImageContainer) = length(data.img)

tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    # img = Images.load(path)
    img = testimage(path)
    img = apply(tfm_train, Image(img))
    img = itemdata(img)  # with this line included, creating deval1 below fails
    # img = permutedims(channelview(RGB.(itemdata(img))), (3, 2, 1))
    return img
end

data = ImageContainer(imgs)
deval1 = Flux.DataLoader(data, batchsize=2, collate = true, partial = false)
```
Including the line `img = itemdata(img)` causes the initialization of `deval1` to crash. If that line is commented out, `deval1` is created without error.
Although at first glance this may look like an Images-related issue, I think it is more tied to `DataLoader`, since calling `batch = getobs(data, 1);` works fine and returns the image. So the `getobs` function itself evaluates successfully.
Also, if `collate` is set to `false`, it works fine as well:

```julia
deval2 = Flux.DataLoader(data, batchsize=2, collate = false, partial = false)
```
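Conceptually, the two settings differ in whether the observations get combined. The sketch below is only an illustration with plain Base arrays and random data, not MLUtils' actual implementation. With `collate = false`, a batch is just the vector of observations. With `collate = true`, the observations are joined into one array with a trailing batch dimension, and that joining step is where their axes have to agree.

```julia
# Two stand-in observations of size 224×224×3, playing the role of
# getobs(data, 1) and getobs(data, 2) in the MWE above (random data, not images).
obs = [rand(Float32, 224, 224, 3), rand(Float32, 224, 224, 3)]

# collate = false (conceptually): the batch is simply the vector of observations,
# so nothing requires the observations' axes to match.
uncollated = obs

# collate = true (conceptually): the observations are copied into a single array
# with a new trailing batch dimension; this is the step that needs matching axes.
collated = cat(obs...; dims = 4)

println(size(collated))  # (224, 224, 3, 2)
```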
@lorenzoh, would you have a take on this one?
```
(CatDogPanda) pkg> st
Project CatDogPanda v0.1.0
Status `C:\github\CatDogPanda\Project.toml`
  [336ed68f] CSV v0.10.9
  [052768ef] CUDA v3.12.1
  [88a5189c] DataAugmentation v0.2.11
  [587475ba] Flux v0.13.11
  [916415d5] Images v0.25.2
  [2913bbd2] StatsBase v0.33.21
  [5e47fb64] TestImages v1.7.1
```
These axes represent images of the same size, but with offset indices:

```julia
julia> length.((62:285, 18:241, 1:3))
(224, 224, 3)

julia> length.((38:261, 18:241, 1:3))
(224, 224, 3)
```
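Here is a minimal sketch that builds two arrays with exactly the axes quoted above. It assumes the OffsetArrays.jl package is installed. The two arrays have the same `size`, yet their `axes` differ, so any code that compares axes rather than sizes will treat them as incompatible.

```julia
using OffsetArrays  # assumption: OffsetArrays.jl is installed

# Offsets chosen so the axes match the ones quoted above:
# a has axes (62:285, 18:241, 1:3), b has axes (38:261, 18:241, 1:3).
a = OffsetArray(zeros(Float32, 224, 224, 3), 61, 17, 0)
b = OffsetArray(zeros(Float32, 224, 224, 3), 37, 17, 0)

println(size(a) == size(b))  # true: both are 224×224×3
println(axes(a) == axes(b))  # false: the first axis starts at 62 in a, at 38 in b
println(first.(axes(a)))     # (62, 18, 1)
```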
I presume the previous version ignored offsets and made an `Array`, as the `cat` functions currently do.
Julia 1.9's `stack` instead takes offsets seriously and propagates them to the output, so it demands that the axes be equal. Something like `stack(OffsetArrays.no_offset_view, images)` would avoid this.
Flux won't work at all on arrays with offset indices, so perhaps MLUtils should always remove them?
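The workaround suggested above can be sketched as follows. This assumes Julia 1.9 or later (for `Base.stack`) and an installed OffsetArrays.jl. The small 4×4×3 arrays are stand-ins for the cropped images. Stacking them directly fails because their axes differ. Mapping `OffsetArrays.no_offset_view` over them first yields an ordinary 1-based batch, which is also what stripping the offsets before batching would produce.

```julia
using OffsetArrays  # assumption: OffsetArrays.jl is installed; Julia ≥ 1.9 for stack

# Two small stand-ins for cropped images: same size, different index offsets.
raw_a = rand(Float32, 4, 4, 3)
raw_b = rand(Float32, 4, 4, 3)
a = OffsetArray(raw_a, 61, 17, 0)  # axes (62:65, 18:21, 1:3)
b = OffsetArray(raw_b, 37, 17, 0)  # axes (38:41, 18:21, 1:3)

# stack requires every slice to have the same axes,
# so stacking the offset arrays directly throws a DimensionMismatch.
direct_failed = try
    stack([a, b])
    false
catch err
    err isa DimensionMismatch
end
println(direct_failed)  # true

# Dropping the offsets first gives a plain 1-based 4×4×3×2 batch.
batch = stack(OffsetArrays.no_offset_view, [a, b])
println(size(batch))                 # (4, 4, 3, 2)
println(batch[:, :, :, 2] == raw_b)  # true
```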
I think your diagnosis is right. I just tried:
```julia
path = imgs[1]
_img = testimage(path)
_img = apply(tfm_train, Image(_img))
size(_img.data)
img = channelview(float32.(_img.data))
```

```julia
julia> typeof(img)
Base.ReinterpretArray{Float32, 3, RGB{Float32}, OffsetArrays.OffsetMatrix{RGB{Float32}, Matrix{RGB{Float32}}}, true}
```
Then, on this reinterpreted `OffsetMatrix`:

```julia
julia> Array(img)
ERROR: DimensionMismatch: axes must agree, got (Base.OneTo(3), Base.OneTo(224), Base.OneTo(224)) and (Base.OneTo(3), OffsetArrays.IdOffsetRange(values=2:225, indices=2:225), OffsetArrays.IdOffsetRange(values=59:282, indices=59:282))
```
However, `collect(img)` works fine.
The above is with Julia 1.8.4. The same behavior is also observed with the previous version of Setfield (pre-v1), which I had suspected could be the cause.
In short, using `collect` seems to be the proper way to get an `Array` out of such an image data loader. For example:
```julia
tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size), CenterCrop(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    _img = testimage(path)
    _img = apply(tfm_train, Image(_img))
    # collect copies the offset-indexed channel view into a plain 1-based Array
    img = collect(channelview(float32.(itemdata(_img))))
    return img
end
```
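This rough illustration shows why `collect` helps in the `getobs` above. It assumes OffsetArrays.jl is installed, and the 4×4 array is a stand-in for a cropped image whose offsets mirror the ones in the error message. `collect` copies an array, whatever its axes, into a fresh 1-based `Array`, so the result can be batched like any other observation. The price is that extra copy.

```julia
using OffsetArrays  # assumption: OffsetArrays.jl is installed

raw = rand(Float32, 4, 4)
cropped = OffsetArray(raw, 1, 58)  # axes (2:5, 59:62), like a center-cropped image

plain = collect(cropped)           # allocates a new 1-based Matrix{Float32}
println(typeof(plain))             # Matrix{Float32}
println(axes(plain) == (1:4, 1:4)) # true
println(plain == raw)              # true: same values
println(plain === raw)             # false: new memory
```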
A caveat of this `itemdata` + `collect` approach, however, is that it results in more allocation: 2.1Mb instead of 1.615 MiB on MLUtils v0.3.1, where `collect` could be omitted. Do you see a way to avoid this?
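One possible direction is to strip the offsets with `OffsetArrays.no_offset_view` *before* the element-wise conversion, instead of calling `collect` after it. This is offered only as an untested suggestion. Broadcasting over an offset array keeps its axes, so `float32.(itemdata(_img))` still carries the offsets. Broadcasting over the 1-based view instead yields a plain `Array` directly. For example, the call could become `channelview(float32.(OffsetArrays.no_offset_view(itemdata(_img))))`, which is a hypothetical variant not tried in this thread. The sketch below assumes OffsetArrays.jl and uses a plain `Float32` matrix and `.* 2f0` as stand-ins for the image and the `float32.` conversion. It makes no claim about the exact allocation figures above.

```julia
using OffsetArrays  # assumption: OffsetArrays.jl is installed

raw = rand(Float32, 4, 4)
cropped = OffsetArray(raw, 1, 58)  # stand-in for itemdata(_img): offset axes (2:5, 59:62)

# Broadcasting over the offset array keeps the offsets, so a later collect is needed.
converted_offset = cropped .* 2f0
println(axes(converted_offset) == axes(cropped))  # true

# no_offset_view returns a 1-based array sharing memory with `cropped`
# (no copy), so broadcasting over it allocates a single plain result.
converted_plain = OffsetArrays.no_offset_view(cropped) .* 2f0
println(typeof(converted_plain))                  # Matrix{Float32}
println(converted_plain == raw .* 2f0)            # true
```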
Updating MLUtils from v0.3.1 to v0.4.0 resulted in a failure in a data loader:
Strangely, the message about the axes sizes seems legitimate. The error message appears to come from https://github.com/JuliaLang/Compat.jl/blob/295c146528063385a0d89bc2be12a7f534052d82/src/Compat.jl#L610, which is itself called by `_dim_stack`: https://github.com/JuliaLang/julia/blob/de73c26fbff61d07a38c9653525b530a56630831/base/abstractarray.jl#L2847

The error can be reproduced with the following script: https://github.com/jeremiedb/ImageNetTrain.jl/blob/main/experiments/loaders/test-loader-min.jl, although it assumes the ImageNet data is available. I'll provide a more minimal reproducible example in case the above details don't already hint at the issue that came with v0.4.
It may be tied to https://github.com/JuliaML/MLUtils.jl/issues/119, though it wasn't clear to me whether or how the loader for images would no longer be properly defined for MLUtils.