loss_value is NaN while training

akaraspt / deepsleepnet

DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG

Apache License 2.0

loss_value is NaN while training #43

Closed cyberfag closed 3 years ago

cyberfag commented 3 years ago

Hi. During training, an error is raised in _run_epoch in trainer.py: "Model diverged with loss = NaN". What could be causing it?

akaraspt commented 3 years ago

@cyberfag This can be due to several issues. I would start with the data. Have you verified that the input signals and labels are loaded properly?
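A minimal sketch of such a data check, assuming the signals and labels are already loaded as NumPy arrays with one row per epoch. The function name find_data_problems and the default of 5 classes are assumptions made for illustration, not part of the repository's code.

```python
import numpy as np

def find_data_problems(x, y, n_classes=5):
    """Return a list of problems found in signals x and labels y.

    n_classes=5 is an assumed default; set it to match your label scheme.
    """
    problems = []
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y)
    # Every signal epoch needs exactly one label.
    if x.shape[0] != y.shape[0]:
        problems.append(f"{x.shape[0]} signal epochs but {y.shape[0]} labels")
    # NaN or Inf in the inputs propagates straight into the loss.
    if np.isnan(x).any():
        problems.append("signals contain NaN")
    if np.isinf(x).any():
        problems.append("signals contain Inf")
    # Out-of-range class indices can also make a cross-entropy loss blow up.
    if y.size and (y.min() < 0 or y.max() >= n_classes):
        problems.append(f"labels outside [0, {n_classes - 1}]")
    return problems

# Synthetic example: one NaN sample and one invalid label.
x = np.zeros((3, 3000))
x[1, 10] = np.nan
y = np.array([0, 2, 7])
print(find_data_problems(x, y))
```

An empty list means none of these particular problems were found; it does not rule out other causes such as the learning rate or the software environment.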

cyberfag commented 3 years ago

@akaraspt Sorry, I should have posted this several days ago. The issue was solved by switching the Python, CUDA, and cuDNN versions to 3.6.0, 10.0, and 7, respectively. I have since run into new problems, but they are unrelated, so this thread can be closed.
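For anyone comparing their setup against the versions above, here is a small sketch that reports the interpreter version and, when TensorFlow is importable, its version and whether it was built with CUDA support. The helper name environment_report is hypothetical. It does not report the installed CUDA toolkit or cuDNN versions, which have to be checked separately.

```python
import platform

def environment_report():
    """Collect the Python version and, if available, TensorFlow build info."""
    report = {"python": platform.python_version()}
    try:
        import tensorflow as tf  # optional: only reported when installed
    except ImportError:
        report["tensorflow"] = None
        return report
    report["tensorflow"] = tf.__version__
    # True if this TensorFlow build was compiled with CUDA support.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    return report

print(environment_report())
```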