NREL / alfabet

Machine learning predictions of bond dissociation energy
https://ml.nrel.gov/bde/
Other
57 stars 17 forks source link

There may be memory leaks. #14

Open LuckyLittleMonster opened 2 years ago

LuckyLittleMonster commented 2 years ago

I need to invoke model.predict() a lot of times. But it seems that there is a memory leak in predict(). This code gives increasing memory usage.

from alfabet import model

smiles = 'O=C(O)CCC/C=C/CC1C2CCC(O2)C1CCNC(=O)Nc1ccccc1'
while True:
    model.predict([smiles])

alfabet_memory_usage

pstjohn commented 2 years ago

Very odd, thanks for the report! I didn't think that was something I needed to worry about in python 😄. Would that trigger an OOM error eventually? I wonder if it's inherited from one of the underlying C libraries. Do you get the same result running

while True:
    rdkit.Chem.MolFromSmiles(smiles)
LuckyLittleMonster commented 2 years ago

Yes, my code crashed after 4-5 hours. I also think the memory leak is related to C libraries. I will test your code and check if I got the same result.

pstjohn commented 2 years ago

does the memory leak also occur if you don't call model.predict() in a loop? it should be faster anyways to enable some batching over the input molecules. model.predict(my_smiles_list, batch_size=128, verbose=True) for instance, might give better performance, especially on a GPU

LuckyLittleMonster commented 2 years ago
while True:
    rdkit.Chem.MolFromSmiles(smiles)

This code don't have memory leak. There is no memory leak if I don't call predict(). Batching should be faster, but we have to calculate the molecules one by one for now.

den-run-ai commented 2 years ago

@LuckyLittleMonster can you try to disable caching on this line?

https://github.com/NREL/alfabet/blob/master/alfabet/model.py#L68

LuckyLittleMonster commented 2 years ago

@denfromufa I disabled that line. But the used memory is still increasing. used_memory_no_cache

LuckyLittleMonster commented 2 years ago

Some memory leaks are related to tf.data.Dataset.from_generator(). https://github.com/tensorflow/tensorflow/issues/37653