Closed: ZhouGuangP closed this issue 1 year ago
Hi @ZhouGuangP,
The layer normalisation is included in the MultiModalFusion module. A similar issue with nan loss was reported in #29 and was resolved by using multiple GPUs. Were you using a single GPU?
Fred.
Yes, due to resource limitations I ran it on a single 4090 GPU with the batch size set to 16. I could raise the batch size to at most 24. What other options do I have to solve this problem without affecting the experimental accuracy?
I noticed that your ICCV 2023 paper mentions that layer normalization is applied before concatenating the spatial and content features to avoid numerical overflow and ensure stable training. However, I couldn't find this part in the code. Could that be why I keep getting the 'Hoi loss is nan' error?