Memory issue during RL optimization

gpantaz commented 3 years ago

Hello,

Many thanks for releasing the repo. I am trying to train a model on a custom variation of MSCOCO, keeping the train/validation/test sizes equal to those of the Karpathy split. I have no issues training a model without RL optimization. However, I have noticed that during RL optimization the required memory increases with each epoch. I am training the model on an RTX 2080. Each epoch lasts approximately 3-4 hours, and I occasionally run out of memory. I tried to check whether any allocations accumulate from epoch to epoch. Is this expected?

Thank you :)

amazingYX commented 2 years ago

I have run into the same problem. Did you ever solve this issue? Could you please tell me how to overcome it?

gpantaz commented 2 years ago

Hello, sadly no. I had been splitting my resources across different experiments to speed up the process. I ended up running 1 experiment at a time :/

luo3300612 commented 2 years ago

Try adding tokenizer_pool.close() at the end of the function train_scst.
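For readers who want to see what that change might look like, here is a minimal sketch of the pool lifecycle only. It assumes that train_scst creates a new multiprocessing pool named tokenizer_pool on every call, so that without close() the worker processes from earlier epochs could stay alive. The names tokenize_stub and train_scst_sketch and the start_method parameter are hypothetical stand-ins for illustration, not the repository's actual code.

```python
import multiprocessing


def tokenize_stub(captions):
    """Hypothetical stand-in for the real tokenizer: lowercase and split on whitespace."""
    return [caption.lower().split() for caption in captions]


def train_scst_sketch(batches, start_method=None):
    """One RL epoch reduced to the pool lifecycle (not the repo's train_scst).

    Assumption: a fresh pool is created on every call, i.e. once per epoch.
    start_method=None uses the platform's default start method.
    """
    ctx = multiprocessing.get_context(start_method)
    tokenizer_pool = ctx.Pool(processes=2)
    try:
        results = []
        for caps_gen, caps_gt in batches:
            # Tokenize generated and ground-truth captions in the worker processes.
            tok_gen, tok_gt = tokenizer_pool.map(tokenize_stub, [caps_gen, caps_gt])
            results.append((tok_gen, tok_gt))
        return results
    finally:
        # The suggested fix: close the pool at the end of the function so the
        # workers exit, then wait for them, even if the epoch raised an error.
        tokenizer_pool.close()
        tokenizer_pool.join()
```

Calling train_scst_sketch once per epoch then starts and shuts down its own workers each time. The try/finally and the join() call are additions in this sketch; the suggestion above only mentions close().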

aimagelab / meshed-memory-transformer

Memory issue during RL optimization #56