LoveSiameseCat / MoE-FFD


When I am training your model, the output feature map is often Inf. Why? #3

Open Elijah-Yi opened 2 weeks ago

Elijah-Yi commented 2 weeks ago

Thanks for your great work. When I am training your model, the output feature map is often Inf. Why?

LoveSiameseCat commented 2 weeks ago

Are you using the package versions listed in the README? I tested this repo before uploading it, so I believe it should work well. In my experience, a zero value in a denominator can produce an infinite value after backpropagation. You can check for this with `torch.isinf(tensor).any()`.
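One way to apply that check systematically is to register forward hooks that flag the first module whose output contains Inf or NaN. This is a minimal sketch, not code from the MoE-FFD repo; the toy model and hook names are illustrative.

```python
import torch
import torch.nn as nn

def attach_inf_hooks(model):
    """Attach a forward hook to every submodule that reports Inf/NaN outputs."""
    def hook(module, inputs, output):
        if torch.is_tensor(output):
            if torch.isinf(output).any() or torch.isnan(output).any():
                print(f"Inf/NaN in output of {module.__class__.__name__}")
    return [m.register_forward_hook(hook) for m in model.modules()]

# Toy model standing in for the real network.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
handles = attach_inf_hooks(model)
out = model(torch.randn(2, 4))

# Remove the hooks once the offending module has been located.
for h in handles:
    h.remove()
```

Running this during a training step narrows the problem down to the first layer that emits a non-finite value, rather than only seeing the Inf in the final feature map.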

Elijah-Yi commented 2 weeks ago

I'm using the latest version. I found that an Inf appears when executing `self.proj` in the Attention class, which then causes the NaN problem. Do you have any good way to solve this?

LoveSiameseCat commented 2 weeks ago

The best way is to downgrade your PyTorch version. I suspect the NaN problem in the Attention class may be caused by a specific input, so another solution you can try is changing the random seed and checking if the time when the NaN issue occurs changes. If it does, you'll need to identify the specific input.
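A minimal sketch of the seed-fixing suggestion above: pinning all random sources makes the data order and initialization reproducible, so if the NaN is triggered by a specific input batch, it should reappear at the same training step. The seed value and helper name are arbitrary, not from the repo.

```python
import random
import torch

def set_seed(seed: int) -> None:
    """Fix Python and PyTorch RNGs so a run is reproducible."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA

set_seed(42)
a = torch.randn(3)
set_seed(42)
b = torch.randn(3)  # identical to `a` because the seed was reset
```

Once the failure is reproducible, `torch.autograd.set_detect_anomaly(True)` can also help by raising an error at the backward operation that first produces NaN.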