aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020
BSD 3-Clause "New" or "Revised" License

Memory issue during RL optimization #56

Closed: gpantaz closed this issue 1 year ago

gpantaz commented 3 years ago

Hello,

Many thanks for releasing the repo. I am trying to train a model on a custom variation of MSCOCO, though I keep the train/test/valid sizes equal to the Karpathy split. Training without RL optimization works fine. However, I have noticed that during RL optimization the required memory increases with each epoch. I am training the model on an RTX 2080. Each epoch takes approximately 3-4 hours, and the run occasionally runs out of memory. I tried to check whether allocations accumulate from epoch to epoch. Is this expected?

Thank you :)

amazingYX commented 2 years ago

I ran into the same problem. Did you ever solve this issue? Could you please tell me how you got around it?

gpantaz commented 2 years ago

Hello, sadly no. I had been splitting my resources across several experiments to speed things up. I ended up running one experiment at a time :/

luo3300612 commented 2 years ago

Try adding tokenizer_pool.close() at the end of the train_scst function.
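
A minimal sketch of what that suggestion might look like, assuming train_scst creates a multiprocessing.Pool named tokenizer_pool on every call (an assumption, not checked against the repo here). The function train_scst_epoch and the use of str.split as a tokenizer are hypothetical stand-ins, not the repo's code. The idea is that a pool that is never closed can leave its worker processes alive after each epoch, so memory use grows over time.

```python
import multiprocessing


def train_scst_epoch(captions, processes=2):
    """Hypothetical stand-in for one train_scst call; not the repo's code."""
    # A fresh pool of worker processes is created on every call.
    tokenizer_pool = multiprocessing.Pool(processes)
    try:
        # str.split stands in for the real tokenizer (assumption).
        tokens = tokenizer_pool.map(str.split, captions)
    finally:
        # Stop accepting work and wait for the workers to exit, so they
        # do not pile up from one epoch to the next.
        tokenizer_pool.close()
        tokenizer_pool.join()
    return tokens


if __name__ == "__main__":
    for epoch in range(3):
        print(epoch, train_scst_epoch(["a cat sits", "a dog runs"]))
```

The try/finally and the extra join() go slightly beyond the one-line suggestion: they make sure the workers are released even if the epoch raises partway through.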