JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

Lazy operations should be opt-in #74

Open CarloLucibello opened 2 years ago

CarloLucibello commented 2 years ago

Operations like splitobs, shuffleobs, and many more return ObsViews, which one has to call getobs on in order to materialize. I think this is unexpected for users coming from scikit-learn and mildly annoying in most scenarios. By default, operations on materialized objects (e.g. arrays and dataframes) should return materialized objects. Users will be able to opt in to the lazy behavior by wrapping their data in an ObsView. Operations on an ObsView will produce other ObsViews, which need to be materialized only at the end of the pipeline.
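To make the proposed semantics concrete, here is a minimal, language-neutral sketch in Python. It is not the MLUtils.jl API: the `LazyView` class and these `splitobs` and `shuffleobs` functions are hypothetical stand-ins that only illustrate the rule "eager in, eager out; lazy in, lazy out".

```python
import random


class LazyView:
    """Hypothetical lazy subset: keeps the parent data plus a list of indices."""

    def __init__(self, parent, indices=None):
        self.parent = parent
        self.indices = list(range(len(parent))) if indices is None else list(indices)

    def __len__(self):
        return len(self.indices)

    def getobs(self):
        """Materialize the selected observations into a new list."""
        return [self.parent[i] for i in self.indices]


def shuffleobs(data, seed=0):
    """Shuffle observations; the result is lazy only if the input is lazy."""
    rng = random.Random(seed)
    if isinstance(data, LazyView):
        # Permute the indices only; no observation is copied.
        return LazyView(data.parent, rng.sample(data.indices, len(data.indices)))
    return rng.sample(list(data), len(data))  # eager: returns a new list


def splitobs(data, at=0.7):
    """Split at fraction `at`; the result is lazy only if the input is lazy."""
    k = round(at * len(data))
    if isinstance(data, LazyView):
        return (LazyView(data.parent, data.indices[:k]),
                LazyView(data.parent, data.indices[k:]))
    return list(data[:k]), list(data[k:])  # eager: returns materialized lists


data = list(range(10))

# Default path: materialized input gives materialized output.
train, test = splitobs(data, at=0.7)

# Opt-in path: wrap once, stay lazy through the pipeline, materialize at the end.
lazy_train, lazy_test = splitobs(shuffleobs(LazyView(data)), at=0.7)
train_shuffled = lazy_train.getobs()
```

In this sketch the caller decides once, at the start of the pipeline, whether laziness is wanted. Every later operation simply preserves that choice, and the only explicit `getobs` call is at the very end of the lazy pipeline.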

darsnack commented 2 years ago

Makes a lot of sense to me. Maybe we should rename ObsView to LazyView to indicate that it is both a view (subset) of the observations and lazy.