Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0
2.69k stars, 312 forks

Issue-409 Add support for datasets that can't fit in memory #1049

Open jasminerienecker opened 4 days ago

jasminerienecker commented 4 days ago

As described in this issue: https://github.com/Nixtla/neuralforecast/issues/409

We assume the dataset is split across multiple parquet files, where each parquet file holds a single time series stored as a pandas DataFrame. This PR creates a new Dataset class whose `__getitem__` method reads the parquet file corresponding to the requested index, and whose from_data_directory() method mirrors the existing from_df() method.
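The lazy-loading idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: the class name `LazySeriesDataset`, the `pattern` argument, and the `reader` parameter are hypothetical, and only `from_data_directory` borrows its name from the description. The default reader assumes pandas and a parquet engine are installed.

```python
from pathlib import Path
from typing import Any, Callable, List


def _read_parquet(path: Path) -> Any:
    # Imported lazily so the class itself has no hard dependency on pandas.
    import pandas as pd

    return pd.read_parquet(path)


class LazySeriesDataset:
    """Hypothetical sketch: one file per series, loaded only when indexed."""

    def __init__(
        self,
        files: List[Path],
        reader: Callable[[Path], Any] = _read_parquet,
    ) -> None:
        self.files = files
        self.reader = reader

    @classmethod
    def from_data_directory(
        cls,
        directory: str,
        pattern: str = "*.parquet",
        reader: Callable[[Path], Any] = _read_parquet,
    ) -> "LazySeriesDataset":
        # Only the file paths are kept in memory; sorting makes the
        # index-to-series mapping deterministic.
        return cls(sorted(Path(directory).glob(pattern)), reader)

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> Any:
        # The file for this index is read on demand, so at most one
        # series needs to be resident per access.
        return self.reader(self.files[idx])
```

With pandas and a parquet engine available, indexing a dataset built by `LazySeriesDataset.from_data_directory(...)` would return the DataFrame for that series. The injectable `reader` is a design choice made for this sketch so that the loading logic can be exercised without parquet files.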

I have added a test to the end of core.ipynb that checks that the forecasts produced with this distributed dataset match those produced when the same data is passed in directly as a pandas DataFrame.

CLAassistant commented 4 days ago

CLA assistant check
All committers have signed the CLA.

jmoralez commented 1 day ago

Thanks a lot for your contribution @jasminerienecker, I left some comments.

jmoralez commented 1 day ago

Thanks a lot for working through the changes! I left some more enhancement ideas.

jmoralez commented 15 hours ago

Thanks a lot @jasminerienecker! I think it'll be ready after these changes.