JuliaML / MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia
https://juliaml.github.io/MLDatasets.jl/stable
MIT License
228 stars, 47 forks

Add MultiMNIST to MLDatasets? #208

Closed: manuelbb-upb closed this issue 1 year ago

manuelbb-upb commented 1 year ago

To run some experiments along the lines of [1], I reimplemented their data generation routine in Julia. In MultiMNIST, each 28x28 image contains two digits, so each sample has two labels. The targets therefore become a matrix instead of a vector, and the MNIST type needs to be changed to:

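```julia
using MLDatasets: SupervisedDataset  # abstract supertype of MLDatasets' labelled datasets

struct MultiMNIST <: SupervisedDataset
    metadata::Dict{String, Any}
    split::Symbol
    features::Array{<:Any, 3}
    targets::Matrix{Int}  # two labels per sample; `MNIST` has Vector{Int}
end
```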
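The issue only says the targets become a `Matrix{Int}`; it does not fix the orientation. One natural choice (an assumption here) is a 2×N layout, so that column `i` of `targets` lines up with image `i` of the 28×28×N `features` array, matching the convention of storing observations along the last dimension. A minimal illustration with made-up labels:

```julia
# Made-up labels of the first and second digit for N = 4 samples.
labels_a = [7, 2, 1, 0]
labels_b = [3, 9, 4, 4]

# Stack the two label vectors as rows: a 2×N matrix (assumed layout),
# where column i holds both labels of sample i.
targets = permutedims(hcat(labels_a, labels_b))

# Dummy 28×28×N features; image i and label pair i share the last index.
features = rand(Float32, 28, 28, 4)
x, y = features[:, :, 2], targets[:, 2]   # y == [2, 9]
```

Keeping samples in the last dimension means slicing `features` and `targets` with the same index always yields a matching image/label pair.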

Here is a constructor that creates a MultiMNIST dataset from an MNIST object; it uses ImageTransformations and Interpolations.
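The constructor referred to above is not included in the thread, so the following is only a rough, dependency-free sketch of the idea, not the author's code. It assumes one digit is placed in the top-left and the other in the bottom-right corner of a larger 36×36 canvas, which is then shrunk back to 28×28. The canvas size, merging overlapping pixels with `max`, the random choice of partner, and the helper names `nearest_resize`, `combine_digits`, and `multimnist_arrays` are all hypothetical. A plain nearest-neighbour resize stands in for the ImageTransformations/Interpolations-based resizing so that the sketch needs only the standard library.

```julia
using Random

# Hypothetical helper: nearest-neighbour resize of a matrix to size `sz`.
# Dependency-free stand-in for an interpolating resize.
function nearest_resize(img::AbstractMatrix, sz::Tuple{Int,Int})
    h, w = size(img)
    H, W = sz
    out = similar(img, H, W)
    for j in 1:W, i in 1:H
        # Map the centre of output pixel (i, j) to the input pixel containing it.
        si = min(floor(Int, (i - 0.5) * h / H) + 1, h)
        sj = min(floor(Int, (j - 0.5) * w / W) + 1, w)
        out[i, j] = img[si, sj]
    end
    return out
end

# Hypothetical helper: digit `a` goes in the top-left and digit `b` in the
# bottom-right of a `canvas`×`canvas` image (overlapping pixels keep the
# brighter value); the canvas is then shrunk back to the size of one digit.
function combine_digits(a::AbstractMatrix{T}, b::AbstractMatrix{T};
                        canvas::Int = 36) where {T<:Real}
    h, w = size(a)
    @assert size(b) == (h, w) "both digits must have the same size"
    @assert canvas >= max(h, w) "canvas must be at least as large as a digit"
    c = zeros(T, canvas, canvas)
    c[1:h, 1:w] .= a
    rows = (canvas - h + 1):canvas
    cols = (canvas - w + 1):canvas
    c[rows, cols] .= max.(c[rows, cols], b)
    return nearest_resize(c, (h, w))
end

# Hypothetical builder: pair every sample with a randomly chosen partner and
# return combined features (h×w×N) plus a 2×N target matrix.
function multimnist_arrays(features::AbstractArray{T,3},
                           targets::AbstractVector{<:Integer};
                           rng::AbstractRNG = Random.default_rng()) where {T<:Real}
    h, w, n = size(features)
    @assert length(targets) == n "one target per image expected"
    X = similar(features, h, w, n)
    Y = Matrix{Int}(undef, 2, n)
    for k in 1:n
        p = rand(rng, 1:n)  # partner index; digits are not forced to differ
        X[:, :, k] = combine_digits(features[:, :, k], features[:, :, p])
        Y[1, k] = targets[k]  # label of the top-left digit
        Y[2, k] = targets[p]  # label of the bottom-right digit
    end
    return X, Y
end
```

With an `MNIST` object `d` from MLDatasets, one could pass `d.features` and `d.targets` to `multimnist_arrays` and store the results in the `MultiMNIST` struct above. Swapping `nearest_resize` for a proper interpolating resize is where ImageTransformations and Interpolations would come in.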

Would it be a good idea to integrate something like this into MLDatasets? It would add those two packages as dependencies.

[1] O. Sener and V. Koltun, “Multi-Task Learning as Multi-Objective Optimization,” arXiv:1810.04650 [cs, stat], Jan. 2019, Accessed: Jan. 24, 2022. [Online]. Available: http://arxiv.org/abs/1810.04650

manuelbb-upb commented 1 year ago

This is probably better suited to its own package.