aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Memory leaks during training? #323

Open HenryNebula opened 4 years ago

HenryNebula commented 4 years ago

Hi authors,

First, I'd like to thank you all for providing this useful package! Recently I have been using FMModel, but memory usage seems to grow every time I call the fit function. I am doing cross-validation on my own datasets, so I need to train a new model for every fold. However, the old models do not seem to be freed when a new one is created and trained, so the process eventually exhausts the free memory and throws an OOM error.
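
For context, my cross-validation loop looks roughly like the sketch below; the KFold split and the X / y arrays are just placeholders for my own data pipeline, and the model settings mirror the ones I actually use:

import xlearn as xl
from sklearn.model_selection import KFold

# X, y are placeholders for my full feature matrix and label vector (NumPy arrays)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_idx, val_idx in kf.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]

    # A brand-new model is created for every fold ...
    fm_model = xl.FMModel(task='binary', init=0.1,
                          epoch=100, k=4, lr=0.1,
                          reg_lambda=0.01, opt='sgd',
                          metric='acc')
    fm_model.fit(X_train, y_train,
                 eval_set=[X_val, y_val],
                 is_quiet=True)
    # ... but the memory held by the previous one never seems to be released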

A simple way to reproduce this is to add a for loop to example_FM_wine.py, like

for _ in range(100):
    # Create and train a fresh model on every iteration
    fm_model = xl.FMModel(task='binary', init=0.1,
                          epoch=100, k=4, lr=0.1,
                          reg_lambda=0.01, opt='sgd',
                          metric='acc')
    # Start to train
    fm_model.fit(X_train,
                 y_train,
                 eval_set=[X_val, y_val],
                 is_quiet=True)

I track the memory usage with psutil:

import os, psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss // 1024)  # resident set size (RSS) in KB
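
Putting the two snippets together, the loop I actually run looks roughly like this (a minimal sketch; X_train, y_train, X_val, y_val are loaded exactly as in example_FM_wine.py):

import os
import psutil
import xlearn as xl

process = psutil.Process(os.getpid())

for i in range(100):
    fm_model = xl.FMModel(task='binary', init=0.1,
                          epoch=100, k=4, lr=0.1,
                          reg_lambda=0.01, opt='sgd',
                          metric='acc')
    fm_model.fit(X_train, y_train,
                 eval_set=[X_val, y_val],
                 is_quiet=True)
    del fm_model  # explicitly drop the reference to the old model
    print(i, process.memory_info().rss // 1024)  # RSS in KB after each fit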

The memory used by the process rises from roughly 85 MB to 93 MB, even with an explicit del fm_model. So I am wondering: is there a memory leak happening here? Thanks!