In the following MWE I successfully create an out-of-memory data source of 20 MNIST images using `FileDataset`. I can then wrap the source as an `MLUtils.DataLoader` with the default `parallel=false` option and `collect` the result. However, if I specify `parallel=true`, then the `collect` hangs.
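```julia
using Pkg
Pkg.activate("data", shared=true)

import MLDatasets: MNIST
using MLDatasets
using ScientificTypes
using MLUtils
using FileIO

# accept the MNIST download prompt non-interactively (must be set before the data is first accessed)
ENV["DATADEPS_ALWAYS_ACCEPT"] = true

# load the MNIST training set (features as a 28×28×60000 array, plus labels):
images, labels = MNIST(split=:train)[:];

N = 20
# coerce the 3D array to a vector of grayscale images and keep the first N:
images = coerce(images, GrayImage)[1:N];

# save some MNIST images as tiff files:
const dir = tempname()
mkpath(dir)  # tempname() only returns a path; make sure the directory exists
for i in eachindex(images)
    filename = joinpath(dir, "$i.tiff")
    FileIO.save(filename, images[i])
end

# create out-of-memory image source:
X = MLDatasets.FileDataset(dir)

sequential = DataLoader(X, batchsize=2, collate=true)
collect(sequential)  # executes as expected

parallel = DataLoader(X, batchsize=2, collate=true, parallel=true);
collect(parallel);  # hangs
```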
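To narrow down whether `parallel=true` hangs on its own or only in combination with the file-backed `FileDataset`, one could run a minimal comparison like the following. It is only a sketch under assumptions: the random `Float32` array is a hypothetical in-memory stand-in for the 20 TIFF files (so FileIO is not involved at all), and the variable names are illustrative.

```julia
using MLUtils

# Hypothetical in-memory stand-in for the 20 MNIST images (random 28×28 pixels);
# no files and no FileIO are involved here.
data = rand(Float32, 28, 28, 20)

# parallel=true loads batches on worker threads, so it only runs
# concurrently when Julia is started with several threads (e.g. `julia -t 4`).
println("threads: ", Threads.nthreads())

sequential_mem = DataLoader(data; batchsize=2)
parallel_mem = DataLoader(data; batchsize=2, parallel=true)

seq_batches = collect(sequential_mem)
# If this call also hangs, the problem is not specific to FileDataset/FileIO.
batches = collect(parallel_mem)

println("sequential: ", length(seq_batches), " batches; parallel: ", length(batches), " batches")
```

Because batches produced with `parallel=true` are loaded on worker threads, their order is not guaranteed, so only the number and size of the batches are worth comparing here.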
Here's my setup: