fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

The problem in your code??? #31

Closed: ZhouGuangP closed this issue 1 year ago

ZhouGuangP commented 1 year ago

I noticed that your ICCV 2023 paper says layer normalization is applied before concatenating the spatial and content features, to avoid numerical overflow and ensure stable training. However, I couldn't find this part in the code. Could that be why I always get the error 'Hoi loss is nan'?

fredzzhang commented 1 year ago

Hi @ZhouGuangP,

The layer normalisation is included in the MultiModalFusion module. A similar issue with nan loss was reported in #29 and was resolved by using multiple GPUs. Were you using a single GPU?

Fred.
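For readers following along, here is a minimal sketch of the idea under discussion: normalising each modality separately before concatenation, so that neither set of features dominates the fused vector by magnitude. This is not the repository's `MultiModalFusion` code. The function names `layer_norm` and `fuse`, the feature values, and the omission of the learned scale and shift parameters are all assumptions made for illustration.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalise one feature vector to zero mean and unit variance.

    Illustrative only: the learned scale and shift of a real layer norm are omitted.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def fuse(spatial, content):
    """Normalise each modality on its own, then concatenate (hypothetical helper)."""
    return layer_norm(spatial) + layer_norm(content)

# Hypothetical features with very different magnitudes.
spatial = [0.1, 0.4, 0.2, 0.3]
content = [1500.0, -2300.0, 800.0, 0.0]

fused = fuse(spatial, content)
print(len(fused))
print(max(abs(v) for v in fused))
```

After normalisation both halves of `fused` lie on a comparable scale, which is the property the paper relies on for stable training. In PyTorch the same role would typically be played by `torch.nn.LayerNorm` applied to each feature tensor before `torch.cat`.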

ZhouGuangP commented 1 year ago

Yes. Due to resource limitations, I ran it on a single 4090 GPU with a batch size of 16. I could probably raise the batch size to 24 at most. Are there other ways to solve this problem without affecting the experimental accuracy?