Chiaraplizz / ST-TR

Spatial Temporal Transformer Network for Skeleton-Based Activity Recognition
MIT License

Nan or Inf found in input Tensor. #5

Closed imj2185 closed 3 years ago

imj2185 commented 3 years ago

Hello,

I am having an issue in the middle of training. For roughly the first 20 epochs there was no problem: the loss and accuracy looked normal until I got this warning:

WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.

I don't think it is a dataset issue, since training ran fine for the first 20 epochs...

If anyone has had this issue, could you please explain what might cause it?
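
One way to narrow down when the values first go bad is to check each scalar before it is logged. The following is only a minimal sketch, not part of ST-TR: it assumes the per-step loss is available as a plain Python float (in PyTorch, for example, via `loss.item()`), and both `find_first_non_finite` and the loss history are hypothetical names and data made up for illustration.

```python
import math


def find_first_non_finite(values):
    """Return (index, value) of the first NaN/Inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i, v
    return None


# Made-up loss history, for illustration only.
losses = [2.31, 1.87, 1.42, float("nan"), float("nan")]

hit = find_first_non_finite(losses)
if hit is not None:
    step, value = hit
    print(f"first non-finite loss at step {step}: {value}")
```

Knowing the first bad step makes it easier to inspect the batch, learning rate, and gradients at that point, rather than only seeing the warning after the fact.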

Chiaraplizz commented 3 years ago

Hello!

This is weird, I have never had this problem. Did anyone else face it?

Chiara

Chiaraplizz commented 3 years ago

Hi! How many GPUs are you using? I found this happened when using only 1 GPU. I just fixed the code to solve the issue. Let me know if it works.

Chiara

wangpitao commented 3 years ago

Which dataset are you using?

JamesWang666 commented 3 years ago

Hello,

I still have the above warning when training on 1 GPU. This is so weird. Trying to train on 2 GPUs. Thanks.

wangpitao commented 3 years ago

You can lower base_lr, then try again.
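
As a rough illustration of why a smaller learning rate can help, here is a toy example in plain Python. It is not ST-TR's training code, and the function, step sizes, and step count are all made up for illustration. With too large a step, gradient descent on a simple quadratic overshoots, grows for many steps while still looking finite, then overflows to Inf and finally turns into NaN.

```python
import math


def gradient_descent(lr, steps=2000, x0=1.0):
    """Minimise f(x) = x**2 (gradient 2*x) with a fixed step size.

    Returns the final x and the first step at which x stopped being finite
    (None if it stayed finite throughout).
    """
    x = x0
    first_bad_step = None
    for step in range(steps):
        x = x - lr * 2.0 * x  # plain gradient step
        if first_bad_step is None and not math.isfinite(x):
            first_bad_step = step
    return x, first_bad_step


small_x, small_bad = gradient_descent(lr=0.1)  # each step shrinks x
large_x, large_bad = gradient_descent(lr=1.5)  # each step doubles |x|

print("lr=0.1:", small_x, small_bad)
print("lr=1.5:", large_x, large_bad)
```

In the large-step run the values stay finite for a long time before blowing up, which is consistent with training looking normal for a number of epochs before the warning appears. Whether this is the actual cause in a given ST-TR run is an assumption that would need to be checked.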

XIDIANPQZ commented 2 years ago

I had the same issue. Did you solve it? I set the train batch size to 32 on 2 GPUs.

wangpitao commented 2 years ago

If you are using the latest code and using both SSA and TSA, I recommend lowering the learning rate.

XIDIANPQZ commented 2 years ago

This training run used only SSA. The odd part is that it did not turn into NaN until epoch 20.