question about CUDA memory for SCST

232525 / PureT

Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]


question about CUDA memory for SCST #18

Open Lujinfu1999 opened 1 year ago

Lujinfu1999 commented 1 year ago

Hello, author! I want to reproduce your experiments, but when I run the code for SCST, I get a 'CUDA out of memory' error. How much CUDA memory is needed for SCST? Could you tell me? Thank you!

232525 commented 1 year ago

If I remember correctly, it takes about 4000M+ of memory on a Tesla V100 32G GPU. You can store the image feats of the training set to reduce the CUDA memory usage.

Lujinfu1999 commented 1 year ago

Thank you very much! I use Ruotian Luo's code, [ImageCaptioning.pytorch](https://github.com/ruotianluo/ImageCaptioning.pytorch), with Swin Transformer features instead of bottom-up features for training, and SCST runs in about 9G of memory. But when I use PureT for SCST, it takes more than 10G of CUDA memory. Is there some difference in how SCST training is done? Please tell me if you know. Thank you for your patience!

232525 commented 1 year ago

Sorry, I am not sure why. If you train the model using only pre-extracted Swin feats, the CUDA memory usage should not be too high. Also, I remembered wrong: 4000M+ is for training under XE. I just tried it again, and SCST training takes about 10000M+. If you delete the Swin-Transformer backbone and train directly with the image feats as input, the CUDA memory usage should be much lower, only about 5000M.
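The pre-extraction idea above could be sketched roughly as follows. This is only a minimal illustration, not the code from the PureT repository: `cache_features`, `CachedFeatureDataset`, the one-`.npy`-file-per-image layout, and the `extract_fn` callable (standing in for a frozen Swin backbone that is run once, offline) are all assumptions made for the example.

```python
import os

import numpy as np


def cache_features(image_ids, extract_fn, out_dir):
    """Run the (hypothetical) extractor once per image and save the result to disk."""
    os.makedirs(out_dir, exist_ok=True)
    for image_id in image_ids:
        # extract_fn is a placeholder for "load image -> frozen backbone -> features".
        feats = np.asarray(extract_fn(image_id), dtype=np.float32)
        np.save(os.path.join(out_dir, f"{image_id}.npy"), feats)


class CachedFeatureDataset:
    """Map-style dataset that returns cached features instead of raw images.

    It only defines __len__ and __getitem__, so it should be usable with
    torch.utils.data.DataLoader; the backbone then does not need to be
    loaded during training at all.
    """

    def __init__(self, image_ids, feat_dir):
        self.image_ids = list(image_ids)
        self.feat_dir = feat_dir

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, index):
        image_id = self.image_ids[index]
        feats = np.load(os.path.join(self.feat_dir, f"{image_id}.npy"))
        return image_id, feats
```

With a layout like this, the captioning model takes the loaded array as its input in place of the backbone output. The real file format and the changes to `coco_dataset.py` and `data_loader.py` would depend on the repository.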

Lujinfu1999 commented 1 year ago

Thanks a lot! I will try your advice for training. Thank you very much again for your patience!

Lujinfu1999 commented 1 year ago

Dear author, sorry to bother you! I tried your suggestion and used Swin Transformer to extract image features, but in XE training it scores 2-3 CIDEr points lower than training directly on images. What's stranger is that I got the same score across several experiments. I modified [coco_dataset.py] and [data_loader.py] to read the features and deleted the backbone, and I am not sure where the error occurred. Could you share your code for using pre-extracted Swin feats (mainly coco_dataset.py and data_loader.py), if it's convenient for you? Thank you very much!
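One check that may help narrow down a gap like this is to confirm that the cached features for an image match what the backbone produces on the fly for the same image with the same preprocessing. The sketch below is a hypothetical helper, not part of PureT: `compare_features` and the `atol` tolerance are assumptions, and a match would only rule out the feature files themselves as the source of the difference.

```python
import numpy as np


def compare_features(online_feats, cached_feats, atol=1e-4):
    """Return (matches, max_abs_diff) for one image's online vs. cached features."""
    online = np.asarray(online_feats, dtype=np.float32)
    cached = np.asarray(cached_feats, dtype=np.float32)
    if online.shape != cached.shape:
        # A shape mismatch usually means a different pooling or reshape step.
        return False, float("inf")
    if online.size == 0:
        return True, 0.0
    max_abs_diff = float(np.max(np.abs(online - cached)))
    return max_abs_diff <= atol, max_abs_diff
```

Running it on a handful of training images, with the backbone in evaluation mode, would show whether the saved features differ from the online ones, for example because of different resizing or normalization.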