JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

kfold time series #58

Open · luboshanus opened this issue 2 years ago

luboshanus commented 2 years ago

Hi,

Have you thought about porting some time-series utility functions, such as k-fold cross-validation for time series?

https://alan-turing-institute.github.io/MLJ.jl/stable/evaluating_model_performance/#MLJBase.TimeSeriesCV

julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:
 (1:4, 5:6)
 (1:6, 7:8)
 (1:8, 9:10)
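For reference, the splits above follow an expanding-window pattern: the series is cut into `nfolds + 1` blocks, each fold tests on the next block, and training uses everything before it. Here is a minimal Julia sketch that reproduces that output. It assumes the first training window absorbs any remainder when the length doesn't divide evenly, and `expanding_window_folds` is a hypothetical name, not an MLUtils.jl or MLJBase API.

```julia
"""
    expanding_window_folds(rows, nfolds)

Hypothetical helper: split `rows` (assumed to be in time order) into
`nfolds` (train, test) pairs. Each test block has `div(n, nfolds + 1)`
observations; training is every observation before the test block.
"""
function expanding_window_folds(rows::AbstractVector, nfolds::Integer)
    n = length(rows)
    nfolds >= 1 || throw(ArgumentError("nfolds must be at least 1"))
    n >= nfolds + 1 || throw(ArgumentError("need at least nfolds + 1 observations"))
    m, r = divrem(n, nfolds + 1)   # m = test-block size, r = leftover observations
    # Fold k trains on the first r + k*m rows and tests on the next m rows,
    # so the first training window absorbs the remainder r.
    return [(rows[1:(r + k * m)], rows[(r + k * m + 1):(r + (k + 1) * m)])
            for k in 1:nfolds]
end

expanding_window_folds(1:10, 3)   # → [(1:4, 5:6), (1:6, 7:8), (1:8, 9:10)]
```

Because the last test block always ends at `r + (nfolds + 1) * m == n`, every observation is used, and no fold ever trains on data that comes after its test block.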

Thanks.

juliohm commented 2 years ago

I am not the maintainer of MLUtils.jl, but I believe that anything specific to a domain, such as time series or spatial data, should be developed in a separate project.

For example, in geospatial ML there are a couple of methods available in GeoStats.jl: https://juliaearth.github.io/GeoStats.jl/stable/validation.html

You could even use these with time series data. Alternatively, you can propose specific validation methods in TimeSeries.jl or any other package that is devoted to the analysis of time series objects.
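As a rough 1-D illustration of that idea (this is not GeoStats.jl code; `blocked_folds` and `gap` are hypothetical names), time can be treated as the only coordinate. The series is split into contiguous blocks, and observations within `gap` steps of the test block are dropped from training as a buffer. Note that, unlike the TimeSeriesCV splits shown earlier, this still trains on observations that come after the test block.

```julia
"""
    blocked_folds(n, k; gap = 0)

Hypothetical helper: blocked k-fold over time indices 1:n. Each fold tests
on one contiguous block; training uses every other index except those
within `gap` steps of the test block.
"""
function blocked_folds(n::Integer, k::Integer; gap::Integer = 0)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= n"))
    gap >= 0 || throw(ArgumentError("gap must be non-negative"))
    bounds = [div(i * n, k) for i in 0:k]   # block edges, spaced by floor division
    folds = Tuple{Vector{Int}, UnitRange{Int}}[]
    for i in 1:k
        test = (bounds[i] + 1):bounds[i + 1]
        lo, hi = first(test) - gap, last(test) + gap   # buffered exclusion zone
        train = [t for t in 1:n if t < lo || t > hi]
        push!(folds, (train, test))
    end
    return folds
end

blocked_folds(10, 3; gap = 1)
```

With `n = 10`, `k = 3`, and `gap = 1`, the middle fold tests on `4:6` and trains on `[1, 2, 8, 9, 10]`. Indices 3 and 7 are held out as the buffer.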

darsnack commented 2 years ago

MLDataPattern.jl had time series functions, but they were tricky to work with and didn't compose well with the rest of the package. In general, the time dimension is hard to get right, so a quick port of the MLDataPattern.jl functions is probably not what we want here. Indeed, whatever we finally land on might be more appropriate in a separate package, as Julio suggested.
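One reason the time dimension is easy to get wrong is that a generic splitter can silently train on the future. A small sanity check can be run against any collection of (train, test) index pairs. It is sketched below as a hypothetical helper, `respects_time_order`, and is not part of MLDataPattern.jl or MLUtils.jl.

```julia
# Hypothetical helper: true when every training index comes strictly
# before every test index, i.e. the split never trains on the future.
respects_time_order(train, test) =
    isempty(train) || isempty(test) || maximum(train) < minimum(test)

# Ordinary contiguous k-fold on 1:6 with k = 3: each fold tests on one
# block and trains on the other two, including blocks that come later.
plain_kfold = [(setdiff(1:6, test), test) for test in (1:2, 3:4, 5:6)]

# Only the last fold passes; the first two train on later observations.
[respects_time_order(tr, te) for (tr, te) in plain_kfold]   # → [false, false, true]
```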

So the answer is yes, but whatever is proposed will need careful thought about how we want to work with temporal data in general.

luboshanus commented 1 year ago

How about a simple time-series k-fold along the lines of this answer: https://stats.stackexchange.com/a/14109

Thanks :)
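Assuming the linked answer describes the usual rolling-forecast-origin scheme (fit on observations `1:t`, evaluate on the next few, then move `t` forward), a minimal sketch fits in a few lines. The names `rolling_origin_folds`, `min_train`, `horizon`, and `step` are hypothetical, not an existing MLUtils.jl API.

```julia
"""
    rolling_origin_folds(n; min_train, horizon = 1, step = 1)

Hypothetical helper: rolling-forecast-origin splits over observations 1:n.
Each fold trains on 1:t and tests on (t+1):(t+horizon), starting at
t = min_train and advancing t by `step` while the test window fits in 1:n.
"""
function rolling_origin_folds(n::Integer; min_train::Integer,
                              horizon::Integer = 1, step::Integer = 1)
    min_train >= 1 || throw(ArgumentError("min_train must be at least 1"))
    horizon >= 1 && step >= 1 || throw(ArgumentError("horizon and step must be positive"))
    return [(1:t, (t + 1):(t + horizon)) for t in min_train:step:(n - horizon)]
end

rolling_origin_folds(6; min_train = 3)   # → [(1:3, 4:4), (1:4, 5:5), (1:5, 6:6)]
```

With `min_train = 4`, `horizon = 2`, and `step = 2` on 10 observations, this yields the same three folds as the `TimeSeriesCV(nfolds=3)` example at the top of the issue, so the expanding-window and rolling-origin views differ mainly in how the parameters are exposed.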