HuiZhang0812 / DiffusionAD


When I run train.py, train_loss becomes nan. Is this problem normal? #32

Closed boxbox2 closed 8 months ago

boxbox2 commented 8 months ago
class carpet
args1.json defaultdict(<class 'str'>, {'img_size': [256, 256], 'Batch_Size': 4, 'EPOCHS': 3000, 'T': 1000, 'base_channels': 128, 'beta_schedule': 'linear', 'loss_type': 'l2', 'diffusion_lr': 0.0001, 'seg_lr': 1e-05, 'random_slice': True, 'weight_decay': 0.0, 'save_imgs': True, 'save_vids': False, 'dropout': 0, 'attention_resolutions': '32,16,8', 'num_heads': 4, 'num_head_channels': -1, 'noise_fn': 'gauss', 'channels': 3, 'mvtec_root_path': 'datasets/mvtec', 'visa_root_path': 'datasets/VisA/visa', 'dagm_root_path': 'datasets/dagm', 'mpdd_root_path': 'datasets/mpdd', 'anomaly_source_path': 'datasets/dtd', 'noisier_t_range': 600, 'less_t_range': 300, 'condition_w': 1, 'eval_normal_t': 200, 'eval_noisier_t': 400, 'output_path': 'outputs', 'arg_num': '1'})
Epoch:0, Train loss: nan:

The train loss jumps by roughly two points at a time (e.g. from 2.03 to 4.23); sometimes it becomes NaN after climbing to around 10, and sometimes not until it reaches about 20.
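For reference, a minimal way to fail fast on this (rather than only noticing `Train loss: nan` in the epoch log) is to check the loss tensor each step. This is a hedged sketch, not code from this repo; `check_finite` is a hypothetical helper you would call on the loss before `loss.backward()`:

```python
import torch

def check_finite(loss: torch.Tensor) -> torch.Tensor:
    """Raise immediately when the loss goes NaN/inf, so the exact
    failing step (and its batch) can be inspected."""
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss: {loss.item()}")
    return loss
```

Combined with `torch.autograd.set_detect_anomaly(True)` (slow, debug-only), this usually localizes which operation first produced the NaN.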

boxbox2 commented 8 months ago

(screenshot attached: 微信图片_20240130143251)

HuiZhang0812 commented 8 months ago

Please see #29 for more details.

HuiZhang0812 commented 8 months ago

When the batch size is small, an entire batch may consist solely of anomalous samples, which breaks the calculation of the loss in the paper (Formula 9). We have fixed this bug; please try again.
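The failure mode described above is a zero denominator: if a loss term is averaged only over normal pixels and a small batch happens to contain none, the division yields NaN. A minimal sketch of the guard (this is an illustrative `masked_mean_loss` helper, not the repo's actual fix; the `eps` term is the assumption that makes it safe):

```python
import torch

def masked_mean_loss(per_pixel_loss: torch.Tensor,
                     normal_mask: torch.Tensor,
                     eps: float = 1e-8) -> torch.Tensor:
    """Average a per-pixel loss over the normal (mask == 1) region only.

    With a small batch size the mask can be all zeros (the whole batch
    is anomalous), so a plain `sum / mask.sum()` would be 0/0 = NaN.
    Adding `eps` to the denominator keeps the result finite (0.0).
    """
    denom = normal_mask.sum()
    return (per_pixel_loss * normal_mask).sum() / (denom + eps)
```

With an all-zero mask this returns 0.0 instead of NaN, so a rare all-anomalous batch contributes nothing to that loss term rather than poisoning the whole run.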