JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

define mapobs behavior for vector of indexes #147

Closed CarloLucibello closed 1 year ago

CarloLucibello commented 1 year ago

The previous mapobs behavior when indexing with a vector of indices was either not meaningful or raised an error, e.g.

julia> mdata = mapobs(x -> sum(x.a) + sum(x.b), (a = 1:10, b = 11:20))
mapobs(#112, NamedTuple{(:a, :b), Tuple{UnitRange{Int64}, UnitRange{Int64}}})

julia> mdata[1] # OK with integer index
12

julia> mdata[1:2] # ERROR with vector index
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
Stacktrace:
...

This PR settles on a sensible behavior, but other choices are possible, for instance

getindex(md::MappedDataset, idx::Vector) = [md.f(getobs(md.data, i)) for i in idx]

but that seems strictly less flexible than what this PR does.
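To make the two choices concrete, here is a minimal sketch in plain Julia that does not use MLUtils. The helper take_obs is a hypothetical stand-in for getobs on a NamedTuple of columns, and the "whole batch" variant is only my reading of the alternative to the comprehension above, not a copy of the PR's code.

```julia
# Hypothetical stand-in for getobs on a NamedTuple of columns
# (an assumption for illustration, not the MLUtils implementation).
take_obs(data::NamedTuple, idx) = map(col -> col[idx], data)

data = (a = 1:10, b = 11:20)
f = x -> sum(x.a) + sum(x.b)

# Choice 1: apply f to each observation separately (the comprehension above).
per_obs = [f(take_obs(data, i)) for i in 1:2]

# Choice 2: fetch the whole batch once, then apply f to it.
on_batch = f(take_obs(data, 1:2))
```

With this f, the first choice yields one value per observation, while the second yields a single value for the whole batch, so the two are not interchangeable for reducing functions.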

Edit: Playing a bit with this to create transformed datasets, I realized I need more customizability, hence the batched argument.

batched = :never behaves similarly to PyTorch transforms, which are applied to one observation at a time, while batched = :always matches how Hugging Face datasets' transforms are applied, i.e. to a whole batch.
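A rough sketch of how the two modes could differ for vector indexing, assuming a toy container rather than the real MappedDataset. ToyMapped, take_obs and double are hypothetical names introduced only for this illustration; only the :never and :always modes named above are modeled.

```julia
# Toy lazy mapped container; a hypothetical illustration, not MLUtils internals.
struct ToyMapped{F,D}
    f::F
    data::D
    batched::Symbol   # only :never and :always are modeled here
end

# Hypothetical stand-in for getobs on a NamedTuple of columns.
take_obs(data::NamedTuple, idx) = map(col -> col[idx], data)

function Base.getindex(md::ToyMapped, idx::AbstractVector)
    if md.batched === :never
        # f sees one observation at a time (per-sample transform)
        return [md.f(take_obs(md.data, i)) for i in idx]
    elseif md.batched === :always
        # f sees the whole batch at once (batched transform)
        return md.f(take_obs(md.data, idx))
    else
        throw(ArgumentError("unsupported batched mode: $(md.batched)"))
    end
end

# A transform that works on both a single observation and a batch.
double = x -> (a = 2 .* x.a, b = 2 .* x.b)
data = (a = 1:10, b = 11:20)

never  = ToyMapped(double, data, :never)[1:2]   # vector of per-observation NamedTuples
always = ToyMapped(double, data, :always)[1:2]  # one NamedTuple holding batched columns
```

In this sketch the :never result is a vector with one NamedTuple per observation, whereas the :always result is a single NamedTuple whose fields are the transformed columns of the batch.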

codecov-commenter commented 1 year ago

Codecov Report

Merging #147 (17f1075) into main (ff2fcc1) will increase coverage by 0.11%. The diff coverage is 57.14%.

@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
+ Coverage   88.28%   88.40%   +0.11%     
==========================================
  Files          15       13       -2     
  Lines         589      595       +6     
==========================================
+ Hits          520      526       +6     
  Misses         69       69              
Impacted Files             Coverage Δ
src/obstransform.jl        82.69% <57.14%> (-1.40%) ↓
src/Datasets/Datasets.jl
src/MLUtils.jl
