Closed: HuangChiEn closed this issue 2 years ago.
So I will assume your dataset is not so large as to cause problems. We can reduce the size of numpy datasets, but pandas datasets are still not supported for this. If it exceeds your total memory_limit=2048, then yes, the dummy will fail.
The short-term solution is to raise the memory_limit until it works. Not ideal, but if you need results sooner rather than later, this is what I can suggest.
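For reference, a minimal sketch of raising the limit, assuming a recent auto-sklearn where memory_limit (in MB) is a constructor argument; the 8192 value and the time budget are illustrative assumptions, not values from this thread, and the same argument should also be accepted by the askl2 estimator used in the traceback below:

```python
from sklearn.datasets import load_digits
from autosklearn.classification import AutoSklearnClassifier

# Toy data just so the sketch runs end to end; substitute your own arrays.
X_train, y_train = load_digits(return_X_y=True)

# Illustrative values only: raise the per-run memory limit (in MB) until the
# dummy prediction no longer hits MEMOUT.
automl = AutoSklearnClassifier(
    memory_limit=8192,
    time_left_for_this_task=600,
)
automl.fit(X_train, y_train)
```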
However, I've seen two issues related to this now, so I'm starting to think this may be something internal I did, which is unfortunate, as I have no clue what that may be.
Some bullet points:
- Could you run import psutil; print(psutil.Process().memory_info()) in the process, just before you call fit? The dependency is part of the autosklearn stack, so you shouldn't have to install anything.
- ulimits are only relevant in the main process, but when we train models we use that memory_limit=2048 to set new limits for a spawned subprocess, i.e. the 1.73T of available memory you report won't do much (see the sketch after this list).
- The DUMMY configuration can have a large memory footprint, before even considering the data or the model itself.
- I need to look into this and see if there is anything in auto-sklearn that specifically causes the process size to explode, as I've seen it in tests too. I'll get back to you if I see anything.
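To make the ulimit point concrete, here is a minimal sketch of the general mechanism (assuming Linux and Python's standard resource module; this illustrates how a per-subprocess cap works, it is not auto-sklearn's actual implementation): the limit is set inside the spawned worker, so the host's total or available memory is irrelevant.

```python
import resource
from multiprocessing import Process

def limited_fit(memory_limit_mb: int) -> None:
    # Illustration only: cap the address space of *this* subprocess.
    # Allocations beyond the cap will fail, regardless of how much
    # memory the machine has overall.
    limit_bytes = memory_limit_mb * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    # ... the actual model training would happen here ...

if __name__ == "__main__":
    p = Process(target=limited_fit, args=(2048,))
    p.start()
    p.join()
```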
This issue also appeared in #1453 for @belzheng, I'll report back what I find here
Hi @HuangChiEn and @belzheng,
I did some testing locally and a clean install of auto-sklearn only consumes about 900MB of memory for me by the time _do_dummy_prediction is called and you receive that error. Therefore I do not think this issue is on our side, and we recommend reading our FAQ section to figure out what's going on.
The recommended solution is still to increase the memory limit if you have a lot of packages in your setup or a lot of data.
For the future, we have some limited dataset reduction in place, but this only applies to the training set in fit and only to numpy data. For pandas, we will look to AutoPytorch, as they recently had some solution there.
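If your training data lives in a pandas DataFrame, one possible workaround in the meantime (assuming all columns are numeric; the file name and "label" column below are hypothetical placeholders) is to hand fit plain numpy arrays so the existing numpy-only reduction can apply. A minimal sketch:

```python
import pandas as pd
from autosklearn.classification import AutoSklearnClassifier

# "train.csv" and the "label" column are placeholders for your own data.
df = pd.read_csv("train.csv")
y = df.pop("label").to_numpy()
X = df.to_numpy(dtype="float32")   # a smaller dtype also shrinks the footprint

automl = AutoSklearnClassifier(memory_limit=4096)  # illustrative value
automl.fit(X, y)
```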
A handy debugging tool is to run import psutil; print(psutil.Process().memory_info().vms) to see your memory consumption at any point you like. This will give you memory consumption in bytes, but you can convert it to megabytes quickly by doing x / (2**20).
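For example, a minimal sketch of that check around fit (the mem_mib helper name and the commented-out fit call are mine, not from the thread):

```python
import psutil

def mem_mib() -> float:
    # Virtual memory size (vms) of the current process, converted bytes -> MiB.
    return psutil.Process().memory_info().vms / (2 ** 20)

print(f"before fit: {mem_mib():.0f} MiB")
# automl.fit(X, y)  # your actual fit call would go here
print(f"after fit:  {mem_mib():.0f} MiB")
```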
I will close this issue as there's not much we can do but point to documentation. If you've tried these different approaches and have code that shows it still does not work, please feel free to re-open.
Best, Eddie
🔥 When running the above code snippet, I encountered the following error:
[ERROR] [2022-05-05 02:47:20,752:Client-AutoML(1):ada19815-cc1d-11ec-8fbd-0242ac110004] Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 2048 MB).', 'configuration_origin': 'DUMMY'}.
Traceback (most recent call last):
File "auto_ml.py", line 185, in <module>
Clf_trainer(x_train, y_train, cfg['training']['save_path'], cfg['training']['random_state'])
File "auto_ml.py", line 115, in Clf_trainer
clf.fit(tra_X, tray)
File "/opt/conda/lib/python3.8/site-packages/autosklearn/experimental/askl2.py", line 460, in fit
return super().fit(
File "/opt/conda/lib/python3.8/site-packages/autosklearn/estimators.py", line 1045, in fit
super().fit(
File "/opt/conda/lib/python3.8/site-packages/autosklearn/estimators.py", line 375, in fit
self.automl.fit(load_models=self.load_models, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/autosklearn/automl.py", line 2056, in fit
return super().fit(
File "/opt/conda/lib/python3.8/site-packages/autosklearn/automl.py", line 808, in fit
self.num_run += self._do_dummy_prediction(datamanager, num_run=1)
File "/opt/conda/lib/python3.8/site-packages/autosklearn/automl.py", line 476, in _do_dummy_prediction
raise ValueError(
ValueError: Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 2048 MB).', 'configuration_origin': 'DUMMY'}.
🏴 OS-related information:
In the code snippet, I also set the environment variable
os.environ['OPENBLAS_NUM_THREADS'] = '4'
ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7256749
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
memory info:
Although the default settings don't seem to match the suggested setup, any suggestions on how to configure the fit function would be appreciated!!