HuiZhang0812 / DiffusionAD


python train.py Loss not decreasing #26

Closed cclamd closed 9 months ago

cclamd commented 9 months ago

Hi, I added the data according to the README, but when I run python train.py it shows:

class screw args1.json defaultdict(<class 'str'>, {'img_size': [256, 256], 'Batch_Size': 2, 'EPOCHS': 300, 'T': 1000, 'base_channels': 128, 'beta_schedule': 'linear', 'loss_type': 'l2', 'diffusion_lr': 0.0001, 'seg_lr': 1e-05, 'random_slice': True, 'weight_decay': 0.0, 'save_imgs': True, 'save_vids': False, 'dropout': 0, 'attention_resolutions': '32,16,8', 'num_heads': 4, 'num_head_channels': -1, 'noise_fn': 'gauss', 'channels': 3, 'mvtec_root_path': '/content/drive/MyDrive/DiffusionAD/datasets/mvtec', 'visa_root_path': 'datasets/VisA_1class/1cls', 'dagm_root_path': 'datasets/dagm', 'mpdd_root_path': 'datasets/mpdd', 'anomaly_source_path': '/content/drive/MyDrive/DiffusionAD/datasets/dtd', 'noisier_t_range': 600, 'less_t_range': 300, 'condition_w': 1, 'eval_normal_t': 200, 'eval_noisier_t': 400, 'output_path': 'outputs', 'arg_num': '1'})
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Epoch:0, Train loss: nan: 1% 1/160 [00:04<12:14, 4.62s/it]
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/309.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/309.png
Epoch:0, Train loss: nan: 1% 2/160 [00:06<08:03, 3.06s/it]
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/151.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/151.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/023.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/023.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/180.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/180.png
Epoch:0, Train loss: nan: 2% 3/160 [00:08<06:13, 2.38s/it]
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/015.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/015.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/292.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/292.png
Epoch:0, Train loss: nan: 2% 4/160 [00:09<05:21, 2.06s/it]
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/113.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/113.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/152.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/152.png

I printed image_path and thresh_path, and the paths are correct. So why doesn't the loss decrease?
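A side note on the UserWarning in the log above: it appears to be unrelated to the nan loss, but it does say the loaders request 8 and 4 workers on a machine where 2 is the suggested maximum. A minimal sketch of capping the worker count is below. The helper name safe_num_workers is hypothetical and not part of the repository, and os.cpu_count() is only an approximation of the limit PyTorch computes for its warning.

```python
import os


def safe_num_workers(requested: int) -> int:
    """Cap a requested DataLoader worker count at the CPU count.

    Hypothetical helper, not taken from DiffusionAD. os.cpu_count() can
    return None, in which case a single CPU is assumed.
    """
    available = os.cpu_count() or 1
    # Never return a negative value; 0 means "load in the main process".
    return max(0, min(requested, available))


if __name__ == "__main__":
    # The log shows loaders created with 8 and 4 workers.
    for requested in (8, 4):
        print(requested, "->", safe_num_workers(requested))
```

The returned value would then be passed as the num_workers argument wherever the script builds its DataLoader objects.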

HuiZhang0812 commented 9 months ago

The default batch size is 16. With a small batch size, such as the 2 you have set, an entire batch may consist solely of abnormal samples, which affects the calculation of the paper's loss formula (Formula 9).
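To get a feel for how strongly batch size matters here, the rough sketch below estimates the chance that a single batch contains only abnormal samples. It rests on two assumptions that the thread does not confirm: each training sample is turned into an abnormal one independently, and with probability 0.5. The function name prob_all_anomalous is hypothetical.

```python
def prob_all_anomalous(batch_size: int, p_anomalous: float = 0.5) -> float:
    """Probability that every sample in one batch is abnormal.

    Assumes samples are made abnormal independently with probability
    p_anomalous. The default of 0.5 is an assumption for illustration,
    not a value confirmed by the repository or the paper.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0.0 <= p_anomalous <= 1.0:
        raise ValueError("p_anomalous must be within [0, 1]")
    return p_anomalous ** batch_size


if __name__ == "__main__":
    # Batch sizes mentioned in this thread: 2 (failing), 6, and 16 (default).
    for batch_size in (2, 6, 16):
        print(batch_size, prob_all_anomalous(batch_size))
```

Under these assumptions the per-batch probability falls geometrically with batch size, from 1 in 4 at batch size 2 to about 1 in 65,000 at batch size 16. The real sampling logic may differ, so this is only an illustration of the trend.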

cclamd commented 9 months ago

OK, thanks. What is the minimum batch size I should set for the loss to decrease? Do I have to set it to 16?

https://github.com/HuiZhang0812/DiffusionAD/issues/16

HuiZhang0812 commented 9 months ago

If your GPU RAM is sufficiently large, setting the batch size to 16 is recommended.

cclamd commented 9 months ago

Thanks. I tried several values and found that "batch size = 6" is the minimum value.
