scottee opened 1 year ago

I'm using the latest statsforecast and trying to run a forecast with 2 models and a dataset of 90K rows. The lib tries to grab 360 GB of memory and crashes my notebook. This happened several times in a row while I tried various options with fewer models and fewer rows. Why is it trying to grab so much memory, and how can I avoid that?
One other config point... I had n_jobs=-1. I reduced it to n_jobs=1 and so far it hasn't grabbed such crazy amounts of memory.
Could you provide an MRE (minimal reproducible example)?
I can't give you the example that caused the problem because of the data. I haven't been able to reproduce it on toy examples. It always happens on the example with 90K rows and n_jobs=-1. Know of a large public dataset we can both access?
Do you experience the same issue when generating a dummy dataset with from statsforecast.utils import generate_series?
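For reference, a minimal sketch of that kind of repro, assuming the utility lives at statsforecast.utils (as in recent releases) and mirroring the forecast call described in this thread:

import statsforecast.models as sfm
from statsforecast import StatsForecast
from statsforecast.utils import generate_series

# Hypothetical repro: synthesize a dummy panel of monthly series and run the
# same kind of forecast on it, to see whether memory blows up without the
# private data.
dummy_df = generate_series(n_series=1000, freq='M')  # columns: unique_id, ds, y
sf = StatsForecast(
    models=[sfm.AutoARIMA(season_length=12)],
    freq='M',
    n_jobs=-1,  # the setting suspected of triggering the memory grab
)
fcst = sf.forecast(df=dummy_df, h=17)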
Please provide your StatsForecast code and the parameters you set, along with the list of models and their parameters.
OS is macOS Ventura 13.3.1, Python is 3.9.12. Statsforecast is the latest version, but I don't know the number, as my Jupyter env is set up differently right now.
As for generate_series(), I've not used that before, but I can take a look. (Background: I inherited a notebook that encountered this mem problem, so I don't know much about statsforecast.)
import statsforecast.models as sfm
from statsforecast import StatsForecast

models = [
    sfm.AutoARIMA(season_length=12, alias='ARIMA'),
    sfm.AutoARIMA(season_length=12, allowdrift=True, alias='ARIMA2'),
    # Orig prob happened with all models, but still happened with just the two above.
    # sfm.AutoCES(season_length=12, alias='CES'),
    # sfm.SeasonalNaive(season_length=12, alias='SN'),
    # sfm.SeasonalWindowAverage(season_length=12, window_size=3, alias='SWA'),
    # sfm.RandomWalkWithDrift(alias='RWD'),
    # sfm.HistoricAverage(alias='HA'),
]

fcster = StatsForecast(
    models=models,
    freq='M',
    n_jobs=-1,  # With 1 or 4, mem problem doesn't happen.
    fallback_model=sfm.HistoricAverage(alias='HA'),
)
fcst_df = fcster.forecast(
    df=train_df,  # train_df: the 90K-row training dataframe (not shareable)
    h=17,
    fitted=True,
)
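(Aside: a one-liner sketch for pinning down "the latest version", assuming the package exposes __version__ as most releases do:)

import statsforecast
print(statsforecast.__version__)  # prints the installed release number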
Ah, I knew it was macOS. Yeah, sadly there is no solution to this; numba currently does not support multithreading on Apple silicon. The only workaround is spinning up a Docker container and running your Jupyter notebook from there.
Doh!! macOS is where I'm running the browser to Jupyter. The Jupyter server is running on Linux:
Linux xxx 5.4.219-126.411.amzn2.x86_64 #1 SMP Wed Nov 2 17:44:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
All the other values were from the Jupyter server machine.
If you're unfamiliar with Docker, then try running your notebook on Amazon SageMaker or Google Colab: https://colab.research.google.com/
Why do I need Docker, since it was running on Linux?
Is your Jupyter instance running on an Amazon EC2 instance? Can you provide a screenshot of which kernel you're using?
Yes, Jupyter is running on AWS. I'm not sure which kernel you're referring to. The OS kernel is as shown in my "Doh!!" comment. The Jupyter kernel is "Python 3 (ipykernel)". Otherwise, let me know more details of which kernel you're after.
Had the same problem in a Colab notebook even when I set num_cores=1, with the Rossmann competition dataset (1M rows, 1k time series, but it fits perfectly in RAM with other models).
What parameters did you set? You have to use n_jobs=-1.
Can you elaborate on "set parameters"? I tried -1 as well, which did not work.
I initialized AutoARIMA as such:
model = AutoARIMA(num_cores=-1, season_length=7)
@umitkaanusta try wrapping the model in StatsForecast and then proceeding from there. n_jobs=-1 works on my end.
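A minimal sketch of that suggestion, with assumptions flagged: freq='D' is a guess from season_length=7, and train_df stands in for the user's dataframe.

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Parallelism is controlled by the StatsForecast wrapper's n_jobs,
# not by a parameter on the model itself.
sf = StatsForecast(
    models=[AutoARIMA(season_length=7)],
    freq='D',   # assumed daily data, given season_length=7
    n_jobs=-1,  # use all cores; reduce to 1 if memory is tight
)
fcst = sf.forecast(df=train_df, h=7)  # train_df: the user's dataframe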