autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

[BUG] Inconsistent memory usage after calling successive predicts(...) #3123

Closed MrWaggel closed 3 months ago

MrWaggel commented 1 year ago

Describe the bug

After running `TabularPredictor.predict(...)` successively (serially, not in parallel), memory isn't freed in a consistent manner. On a production server with limited memory capacity it hits OOM quite fast.

In my case this happens even when prediction is run on a single row of 248 bytes of data.

Note: I'm not that familiar with Python, so I might be doing something completely wrong. In any case, it would help if someone could point me in the right direction on how to solve this.

```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    23    166.6 MiB    166.6 MiB           1   @profile
    24                                         def predictGluon(info):
    25    166.6 MiB      0.0 MiB           1       hash = info["hash"]
    26    166.6 MiB      0.0 MiB           1       predictor = TabularPredictor.load("AutogluonModels/"+hash+"/")
    27    166.6 MiB      0.0 MiB           1       predictor.unpersist_models() # Doesn't do much
    28
    29    166.6 MiB      0.0 MiB           1       data = info["data"]
    30    166.6 MiB      0.0 MiB           1       data2 = pd.DataFrame(data[1:], columns=data[0])
    31   1282.7 MiB   1116.1 MiB           1       result = predictor.predict(data2)
    32   1282.9 MiB      0.2 MiB           1       js = result.to_json() # < 1MB
    33
    34   1282.9 MiB      0.0 MiB           1       del result, predictor
    35   1281.4 MiB     -1.5 MiB           1       gc.collect()
    36   1281.4 MiB      0.0 MiB           1       return js
```

After an arbitrary number of successive calls, the memory usage keeps incrementing after `.predict(...)`:

```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    23   1331.8 MiB   1331.8 MiB           1   @profile
    24                                         def predictGluon(info):
    25   1331.8 MiB      0.0 MiB           1       hash = info["hash"]
    26   1331.8 MiB      0.0 MiB           1       predictor = TabularPredictor.load("AutogluonModels/"+hash+"/")
    27   1331.8 MiB      0.0 MiB           1       predictor.unpersist_models() # Doesn't do much
    28
    29   1331.8 MiB      0.0 MiB           1       data = info["data"]
    30   1331.8 MiB      0.0 MiB           1       data2 = pd.DataFrame(data[1:], columns=data[0])
    31   2219.2 MiB    887.3 MiB           1       result = predictor.predict(data2)
    32   2219.2 MiB      0.0 MiB           1       js = result.to_json() # < 1MB
    33
    34   2219.2 MiB      0.0 MiB           1       del result, predictor
    35   2219.2 MiB      0.0 MiB           1       gc.collect()
    36   2219.2 MiB      0.0 MiB           1       return js

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    23   3415.7 MiB   3415.7 MiB           1   @profile
    24                                         def predictGluon(info):
    25   3415.7 MiB      0.0 MiB           1       hash = info["hash"]
    26   3415.7 MiB      0.0 MiB           1       predictor = TabularPredictor.load("AutogluonModels/"+hash+"/")
    27   3415.7 MiB      0.0 MiB           1       predictor.unpersist_models() # Doesn't do much
    28
    29   3415.7 MiB      0.0 MiB           1       data = info["data"]
    30   3415.7 MiB      0.0 MiB           1       data2 = pd.DataFrame(data[1:], columns=data[0])
    31   4297.8 MiB    882.2 MiB           1       result = predictor.predict(data2)
    32   4297.8 MiB      0.0 MiB           1       js = result.to_json() # < 1MB
    33
    34   4297.8 MiB      0.0 MiB           1       del result, predictor
    35   3288.9 MiB  -1008.9 MiB           1       gc.collect()
    36   3288.9 MiB      0.0 MiB           1       return js

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    23   2845.2 MiB   2845.2 MiB           1   @profile
    24                                         def predictGluon(info):
    25   2845.2 MiB      0.0 MiB           1       hash = info["hash"]
    26   2845.2 MiB      0.0 MiB           1       predictor = TabularPredictor.load("AutogluonModels/"+hash+"/")
    27   2845.2 MiB      0.0 MiB           1       predictor.unpersist_models() # Doesn't do much
    28
    29   2845.2 MiB      0.0 MiB           1       data = info["data"]
    30   2845.2 MiB      0.0 MiB           1       data2 = pd.DataFrame(data[1:], columns=data[0])
    31   3732.0 MiB    886.8 MiB           1       result = predictor.predict(data2)
    32   3732.0 MiB      0.0 MiB           1       js = result.to_json() # < 1MB
    33
    34   3732.0 MiB      0.0 MiB           1       del result, predictor
    35   3732.0 MiB      0.0 MiB           1       gc.collect()
    36   3732.0 MiB      0.0 MiB           1       return js
```

This repeats until Python goes OOM.
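For reference, here is the profiled function as plain code (reconstructed from the traces above; it assumes `memory_profiler` is installed to provide the `@profile` decorator):

```python
import gc

import pandas as pd
from autogluon.tabular import TabularPredictor
from memory_profiler import profile  # provides the @profile decorator


@profile
def predictGluon(info):
    hash = info["hash"]
    predictor = TabularPredictor.load("AutogluonModels/" + hash + "/")
    predictor.unpersist_models()  # Doesn't do much

    data = info["data"]
    data2 = pd.DataFrame(data[1:], columns=data[0])  # first row holds the column names
    result = predictor.predict(data2)
    js = result.to_json()  # < 1MB

    del result, predictor
    gc.collect()
    return js
```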

Expected behavior

Allocated memory should be released for garbage collection once the predictor and results are deleted. Are there dangling references in other threads that prevent garbage collection?
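If the memory genuinely cannot be reclaimed in-process, one generic mitigation (not an AutoGluon API, just a sketch of standard process isolation) would be to run each prediction in a short-lived worker process, so that everything it allocated is returned to the OS when the worker exits:

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd
from autogluon.tabular import TabularPredictor


def _predict_once(info):
    # Runs entirely inside the worker process.
    predictor = TabularPredictor.load("AutogluonModels/" + info["hash"] + "/")
    data = info["data"]
    frame = pd.DataFrame(data[1:], columns=data[0])
    return predictor.predict(frame).to_json()


def predict_isolated(info):
    # One short-lived worker per call: slower, because the model is reloaded
    # every time, but the parent process's memory stays flat.
    with ProcessPoolExecutor(max_workers=1) as pool:
        return pool.submit(_predict_once, info).result()
```

The trade-off is that the model is reloaded on every call, so this exchanges latency for a bounded memory footprint.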

Installed Versions

```python
INSTALLED VERSIONS
------------------
date                   : 2023-04-10
time                   : 13:59:38.343162
python                 : 3.9.5.final.0
OS                     : Linux
OS-release             : 5.4.0-146-generic
Version                : #163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023
machine                : x86_64
processor              : x86_64
num_cores              : 4
cpu_ram_mb             : 15906
cuda version           : 11.460.27.04
num_gpus               : 1
gpu_ram_mb             : [580]
avail_disk_size_mb     : 25775
accelerate             : 0.16.0
autogluon.common       : 0.7.0
autogluon.core         : 0.7.0
autogluon.features     : 0.7.0
autogluon.multimodal   : 0.7.0
autogluon.tabular      : 0.7.0
autogluon.timeseries   : 0.7.0
boto3                  : 1.26.107
catboost               : 1.1.1
defusedxml             : 0.7.1
evaluate               : 0.3.0
fairscale              : 0.4.13
fastai                 : 2.7.12
gluonts                : 0.12.6
hyperopt               : 0.2.7
jinja2                 : 3.1.2
joblib                 : 1.2.0
jsonschema             : 4.17.3
lightgbm               : 3.3.5
matplotlib             : 3.7.1
networkx               : 2.8.8
nlpaug                 : 1.1.11
nltk                   : 3.8.1
nptyping               : 2.4.1
numpy                  : 1.23.5
omegaconf              : 2.2.3
openmim                : None
pandas                 : 1.5.3
PIL                    : 9.5.0
psutil                 : 5.9.4
pytesseract            : 0.3.10
pytorch-metric-learning: None
pytorch_lightning      : 1.9.4
ray                    : 2.2.0
requests               : 2.22.0
scipy                  : 1.10.1
sentencepiece          : 0.1.97
seqeval                : None
setuptools             : 67.6.1
skimage                : 0.19.3
sklearn                : 1.2.2
statsforecast          : 1.4.0
statsmodels            : 0.13.5
tensorboard            : 2.12.1
text-unidecode         : None
timm                   : 0.6.13
torch                  : 1.13.1+cpu
torchmetrics           : 0.8.2
torchvision            : 0.14.1+cpu
tqdm                   : 4.65.0
transformers           : 4.26.1
ujson                  : 5.7.0
xgboost                : 1.7.5
```

Additional context

The selected model was trained with the `high_quality` preset (bagged/stacked).

Innixma commented 3 months ago

I'm unsure why memory increases, but you could instead call `predictor.persist()` and keep the predictor in memory; that would also speed up the inference calls significantly.
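For example, something along these lines (a sketch; in v0.7 the equivalent call is `predictor.persist_models()`):

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

_PREDICTORS = {}  # cache of loaded predictors, keyed by model hash


def get_predictor(model_hash):
    # Load and persist each predictor once, then reuse it for every request.
    if model_hash not in _PREDICTORS:
        predictor = TabularPredictor.load("AutogluonModels/" + model_hash + "/")
        predictor.persist()  # keep the trained models in memory for fast inference
        _PREDICTORS[model_hash] = predictor
    return _PREDICTORS[model_hash]


def predictGluon(info):
    data = info["data"]
    frame = pd.DataFrame(data[1:], columns=data[0])
    return get_predictor(info["hash"]).predict(frame).to_json()
```

This keeps one long-lived predictor per model instead of reloading the artifacts from disk on every request.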

Please open a new issue referencing this one if the problem persists in the latest version of AutoGluon (v1.1+).