v0.4.0 introduced error for DataLoader which collates images

jeremiedb commented 1 year ago

Updating from v0.3.1 to v0.4.0 caused a data loader to fail:

ERROR: LoadError: DimensionMismatch: stack expects uniform slices, got axes(x) == (62:285, 18:241, 1:3) while first had (38:261, 18:241, 1:3)
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:871
  [2] wait()
    @ Base ./task.jl:931
  [3] wait(c::Base.GenericCondition{ReentrantLock})
    @ Base ./condition.jl:124
  [4] take_buffered(c::Channel{Any})
    @ Base ./channels.jl:416
  [5] take!(c::Channel{Any})
    @ Base ./channels.jl:410
  [6] iterate(#unused#::MLUtils.Loader, state::MLUtils.LoaderState)
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/parallel.jl:140
  [7] iterate(loader::MLUtils.Loader)
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/parallel.jl:132
  [8] iterate(e::DataLoader{ValContainer{Vector{String}, Vector{String}}, Random._GLOBAL_RNG, Val{true}})
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/eachobs.jl:173
  [9] eval_f(m::ResNet, data::DataLoader{ValContainer{Vector{String}, Vector{String}}, Random._GLOBAL_RNG, Val{true}})
    @ Main ~/github/ImageNetTrain.jl/resnet-optim.jl:161

Strangely, the message about the axes sizes seems legitimate. The error message appears to come from https://github.com/JuliaLang/Compat.jl/blob/295c146528063385a0d89bc2be12a7f534052d82/src/Compat.jl#L610, which itself is called by _dim_stack: https://github.com/JuliaLang/julia/blob/de73c26fbff61d07a38c9653525b530a56630831/base/abstractarray.jl#L2847

The error can be reproduced with the following script, although it assumes ImageNet data is available: https://github.com/jeremiedb/ImageNetTrain.jl/blob/main/experiments/loaders/test-loader-min.jl I'll provide a more minimal reproducible example in case the above details don't already hint at the issue that came with v0.4.

It may be tied to https://github.com/JuliaML/MLUtils.jl/issues/119, though it wasn't clear to me whether or how the Loader for images would no longer be properly defined for MLUtils.

jeremiedb commented 1 year ago

Here's a MWE:

using Images
using StatsBase: sample, shuffle
using DataAugmentation
using Flux
using TestImages

import Base: length, getindex
import Flux.MLUtils: getobs, getobs!

const im_size = (224, 224)

imgs = ["chelsea", "coffee"]

struct ImageContainer{T<:Vector}
    img::T
end

length(data::ImageContainer) = length(data.img)
tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    # img = Images.load(path)
    img = testimage(path)
    img = apply(tfm_train, Image(img))
    img = itemdata(img)
    # img = permutedims(channelview(RGB.(itemdata(img))), (3, 2, 1))
    return img
end

data = ImageContainer(imgs)
deval1 = Flux.DataLoader(data, batchsize=2, collate = true, partial = false)

Including the line img = itemdata(img) causes the initialization of deval1 to crash. If that line is commented out, deval1 is created without error.

Although this may look at first glance like an Images-related issue, I think it is more closely tied to DataLoader, since calling batch = getobs(data, 1); works fine and returns the image, so the getobs function can be evaluated successfully. Also, setting collate to false works fine: deval2 = Flux.DataLoader(data, batchsize=2, collate = false, partial = false)

@lorenzoh, would you have a take on this one?

jeremiedb commented 1 year ago

(CatDogPanda) pkg> st
Project CatDogPanda v0.1.0
Status `C:\github\CatDogPanda\Project.toml`
  [336ed68f] CSV v0.10.9
  [052768ef] CUDA v3.12.1
  [88a5189c] DataAugmentation v0.2.11
  [587475ba] Flux v0.13.11
  [916415d5] Images v0.25.2
  [2913bbd2] StatsBase v0.33.21
  [5e47fb64] TestImages v1.7.1

mcabbott commented 1 year ago

These axes represent images of the same size, with offset indices:

julia> length.((62:285, 18:241, 1:3))
(224, 224, 3)

julia> length.((38:261, 18:241, 1:3))
(224, 224, 3)

I presume the previous version ignored offsets & made an Array, like the cat functions do at present.

Julia 1.9's stack instead takes offsets seriously, and propagates them to the output, hence demands equality. Something like stack(OffsetArrays.no_offset_view, images) would avoid this.

Flux won't work at all on arrays with offset indices. So there's some chance MLUtils should always remove them?
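
The behavior described above can be sketched without any image data. This is a minimal illustration, not MLUtils' actual collate code: it assumes Julia 1.9 or later (for Base's stack) and that OffsetArrays.jl is installed, and the arrays a and b are hypothetical stand-ins for two cropped images.

```julia
using OffsetArrays

# Two arrays of identical size whose axes carry different offsets,
# standing in for the cropped images from the report (hypothetical data).
a = OffsetArray(rand(Float32, 4, 4), 2:5, 1:4)
b = OffsetArray(rand(Float32, 4, 4), 0:3, 1:4)

same_size = size(a) == size(b)   # sizes agree
same_axes = axes(a) == axes(b)   # axes do not

# stack compares axes, not sizes, so mismatched offsets are rejected.
err = try
    stack([a, b])
    nothing
catch e
    e
end

# Dropping the offsets of each slice first lets stack succeed.
batch = stack(OffsetArrays.no_offset_view, [a, b])
```

Here err holds the DimensionMismatch raised by the plain stack call, while batch is an ordinary 4×4×2 array with 1-based indices.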

jeremiedb commented 1 year ago

I think your diagnosis is right. I just tried:

path = imgs[1]
_img = testimage(path)
_img = apply(tfm_train, Image(_img))
size(_img.data)
img = channelview(float32.(_img.data))
julia> typeof(img)
Base.ReinterpretArray{Float32, 3, RGB{Float32}, OffsetArrays.OffsetMatrix{RGB{Float32}, Matrix{RGB{Float32}}}, true}

Then, on this reinterpreted OffsetMatrix:

julia> Array(img)
ERROR: DimensionMismatch: axes must agree, got (Base.OneTo(3), Base.OneTo(224), Base.OneTo(224)) and (Base.OneTo(3), OffsetArrays.IdOffsetRange(values=2:225, indices=2:225), OffsetArrays.IdOffsetRange(values=59:282, indices=59:282))

However, collect(img) works fine.

The above is with Julia 1.8.4. The same behavior is also observed with an earlier version of Setfield (pre v1), which I thought could have been the cause.

In short, using collect seems to be the proper way to get a plain array out of such an image data loader. For example:

tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size), CenterCrop(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    _img = testimage(path)
    _img = apply(tfm_train, Image(_img))
    img = collect(channelview(float32.(itemdata(_img))))
    return img
end
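
Why collect helps can be checked on plain offset arrays, without Images or DataAugmentation. This is a sketch under the assumption that OffsetArrays.jl is installed and Julia is 1.9 or later; x1 and x2 are hypothetical stand-ins for two transformed images.

```julia
using OffsetArrays

# Stand-ins for two transformed images with different offsets (hypothetical data).
x1 = OffsetArray(ones(Float32, 3, 4), 1:3, 5:8)
x2 = OffsetArray(zeros(Float32, 3, 4), 1:3, 2:5)

# collect copies the data into a plain Array with 1-based axes.
c1 = collect(x1)
c2 = collect(x2)

# With the offsets gone, the observations have uniform axes and can be stacked.
batch = stack([c1, c2])
```

The copy made by collect is where the extra allocation comes from: each observation is materialized as a new Array before batching.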

One caveat of this approach, however, is that it allocates more: 2.1Mb instead of 1.615 MiB on MLUtils v0.3.1, where collect could be omitted. Do you see a way to avoid this?

JuliaML / MLUtils.jl

v0.4.0 introduced error for DataLoader which collates images #139