Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

Statsforecast tries to grab 360gb memory; crashes notebook #562

Open scottee opened 1 year ago

scottee commented 1 year ago

I'm using the latest statsforecast and trying to run a forecast with 2 models and a dataset of 90K rows. The lib tries to grab 360 GB of memory and crashes my notebook. It happened several times in a row, even as I tried various options with fewer models and fewer rows. Why is it trying to grab so much memory, and how can I avoid that?

scottee commented 1 year ago

One other config point... I had n_jobs=-1. I reduced it to n_jobs=1 and so far it hasn't grabbed such crazy amounts of memory.
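
For context, n_jobs=-1 asks for one worker per available core, so on a large server it can mean many more workers than 1 or 4. The snippet below is a minimal sketch of how to check what -1 would resolve to and how to cap it. `resolve_n_jobs` and `max_workers` are hypothetical names, not part of statsforecast, and the idea that memory use grows with the worker count is an assumption this thread has not confirmed.

```python
import os


def resolve_n_jobs(n_jobs, max_workers=None):
    """Hypothetical helper: turn an n_jobs setting into a concrete worker count."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None on some platforms
    if n_jobs == -1:
        workers = cpus  # -1 is the usual shorthand for "all cores"
    elif n_jobs >= 1:
        workers = min(n_jobs, cpus)
    else:
        raise ValueError("n_jobs must be -1 or a positive integer")
    if max_workers is not None:
        workers = min(workers, max_workers)  # optional safety cap
    return workers


print("cores visible to Python:", os.cpu_count())
print("n_jobs=-1 resolves to:", resolve_n_jobs(-1))
print("capped at 4 workers:", resolve_n_jobs(-1, max_workers=4))
```

The capped value can then be passed as n_jobs instead of -1, so the same notebook behaves the same way on machines with very different core counts.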

akmalsoliev commented 1 year ago

Could you provide MRE?

scottee commented 1 year ago

I can't give you the example that caused the problem because of data. I haven't been able to reproduce it on toy examples. It always happens on the example with 90K rows and n_jobs=-1. Know of a large public dataset we both can get access to?
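
Even when the data cannot be shared, peak memory figures for different n_jobs values can be. The following is a rough sketch using the standard-library `resource` module (Unix only). `peak_rss_mb` is a hypothetical helper name, and the list allocation is only a stand-in for the real forecast call.

```python
import resource
import sys


def peak_rss_mb(who=resource.RUSAGE_SELF):
    """Peak resident set size in MB for this process (or its finished children)."""
    peak = resource.getrusage(who).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
data = [0.0] * 5_000_000  # stand-in workload; replace with the forecast call
after = peak_rss_mb()

print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
# Worker processes that have already exited and been waited for are
# counted under RUSAGE_CHILDREN rather than RUSAGE_SELF.
print(f"peak RSS of finished children: {peak_rss_mb(resource.RUSAGE_CHILDREN):.1f} MB")
```

Reporting these numbers for n_jobs=1, n_jobs=4 and n_jobs=-1 would show whether memory scales with the number of workers.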

akmalsoliev commented 1 year ago

Do you experience the same issue when generating a dummy dataset with from statsforecast.utils import generate_series?

Please provide the code that creates the StatsForecast object, the parameters you set, and the list of models with their parameters.
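
A synthetic dataset of about the same size can stand in for the private data. The following is a dependency-free sketch that does not use generate_series. It builds long-format rows with the unique_id, ds, y columns that statsforecast expects, using month-end dates to match freq='M'. The split into 500 series of 180 months is an assumption chosen only to reach roughly 90K rows, and `make_monthly_rows` and `write_csv` are hypothetical helper names.

```python
import calendar
import csv
import datetime as dt
import math
import random


def make_monthly_rows(n_series=500, n_months=180, start_year=2005, seed=0):
    """Return long-format rows (unique_id, ds, y) for n_series monthly series."""
    rng = random.Random(seed)  # seeded so both sides get the same data
    rows = []
    for s in range(n_series):
        level = rng.uniform(50, 500)    # per-series baseline
        slope = rng.uniform(-0.5, 0.5)  # per-series linear trend
        for m in range(n_months):
            year, month = start_year + m // 12, m % 12 + 1
            last_day = calendar.monthrange(year, month)[1]  # month-end date
            seasonal = 10 * math.sin(2 * math.pi * m / 12)  # yearly cycle
            y = level + slope * m + seasonal + rng.gauss(0, 5)
            ds = dt.date(year, month, last_day).isoformat()
            rows.append((f"series_{s}", ds, round(y, 3)))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file that can be shared and loaded with pandas."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unique_id", "ds", "y"])
        writer.writerows(rows)


rows = make_monthly_rows()
print(len(rows))  # 500 series x 180 months = 90000 rows
```

Calling `write_csv(rows, "synthetic_90k.csv")` produces a file that both sides can load (parsing ds as dates) and pass to forecast() in place of train_df.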

scottee commented 1 year ago

OS is MacOS Ventura 13.3.1.
Python is 3.9.12.
Statsforecast is the latest version, but I don't know the number as my jupyter env is set up differently right now.

As for generate_series(), I've not used that before, but I can take a look. (Background: I inherited a notebook that encountered this mem problem, so I don't know much about statsforecast.)

    # Imports inferred from the names used below (the original snippet omitted them).
    import statsforecast.models as sfm
    from statsforecast import StatsForecast

    models = [
        sfm.AutoARIMA(season_length=12, alias='ARIMA'),
        sfm.AutoARIMA(season_length=12, allowdrift=True, alias='ARIMA2'),
        # Orig prob happened with all models, but still happened with just the two above.
        #sfm.AutoCES(season_length=12, alias='CES'),
        #sfm.SeasonalNaive(season_length=12, alias='SN'),
        #sfm.SeasonalWindowAverage(season_length=12, window_size=3, alias='SWA'),
        #sfm.RandomWalkWithDrift(alias='RWD'),
        #sfm.HistoricAverage(alias='HA'),
    ]  # No trailing comma here: it would turn `models` into a tuple.

    fcster = StatsForecast(
        models=models,
        freq='M',
        n_jobs=-1,  # With 1 or 4, mem problem doesn't happen.
        fallback_model=sfm.HistoricAverage(alias='HA'),
    )

    # train_df is the private training data (not shown).
    fcst_df = fcster.forecast(
        df=train_df,
        h=17,
        fitted=True,
    )

akmalsoliev commented 1 year ago

Ah, I knew it was macOS. Sadly there is no solution to this: numba currently does not support multithreading on Apple silicon. The only workaround is to spin up a Docker container and run your Jupyter Notebook from there.

scottee commented 1 year ago

Doh!! MacOS is where I'm running the browser to jupyter. Jupyter server is running on Linux:

Linux xxx 5.4.219-126.411.amzn2.x86_64 #1 SMP Wed Nov 2 17:44:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

All the other values were from jupyter server machine.

akmalsoliev commented 1 year ago

If you're unfamiliar with Docker, try running your notebook on Amazon SageMaker or Google Colab: https://colab.research.google.com/

scottee commented 1 year ago

Why do I need docker, since it was running on Linux?

akmalsoliev commented 1 year ago

Is your Jupyter instance running on an Amazon EC2 instance? Can you provide a screenshot of which kernel you're using?

scottee commented 1 year ago

Yes, jupyter is running on AWS. Not sure which kernel you are referring to. The OS kernel is as shown in my "Doh!!" comment. The Jupyter kernel is "Python 3 (ipykernel)". Otherwise, lmk more details of which kernel you're after.

umitkaanusta commented 1 year ago

I had the same problem in a Colab notebook, even with num_cores=1, on the Rossmann competition dataset (1M rows, 1k time series; it fits in RAM without trouble with other models).

akmalsoliev commented 1 year ago

What parameters did you set? You have to use n_jobs=-1.

umitkaanusta commented 1 year ago

Can you elaborate on "set parameters"? I tried -1 as well, which did not work.

I initialized AutoARIMA as such: model = AutoARIMA(num_cores=-1, season_length=7)

akmalsoliev commented 1 year ago

@umitkaanusta try wrapping the model in StatsForecast and proceeding from there. n_jobs=-1 works on my end.