DonaldRR / SimpleNet

MIT License

Code Doesn't Work: It gets killed while inferring #60

Closed: mericgeren closed this issue 6 months ago

mericgeren commented 6 months ago

When I run the run.sh script, training completes and inference begins. Shortly after inference starts, however, the process is killed, and the terminal output looks like this:

Matplotlib created a temporary cache directory at /tmp/matplotlib-rp7dlh3f because the default path (/home/username/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
INFO:__main__:Command line arguments: main.py --gpu 0 --seed 0 --log_group simplenet_mvtec --log_project MVTecAD_Results --results_path /home/username/ml_ai_cv_workspace/SimpleNet/results --run_name run net -b wideresnet50 -le layer2 -le layer3 --pretrain_embed_dimension 1536 --target_embed_dimension 1536 --patchsize 3 --meta_epochs 40 --embedding_size 256 --gan_epochs 4 --noise_std 0.015 --dsc_hidden 1024 --dsc_layers 2 --dsc_margin .5 --pre_proj 1 dataset --batch_size 8 --resize 329 --imagesize 288 -d screw -d pill -d capsule -d carpet -d grid -d tile -d wood -d zipper -d cable -d toothbrush -d transistor -d metal_nut -d bottle -d hazelnut -d leather mvtec /home/username/mvtech_dataset
INFO:__main__:Dataset: train=320 test=160
INFO:__main__:Dataset: train=267 test=167
INFO:__main__:Dataset: train=219 test=132
INFO:__main__:Dataset: train=280 test=117
INFO:__main__:Dataset: train=264 test=78
INFO:__main__:Dataset: train=230 test=117
INFO:__main__:Dataset: train=247 test=79
INFO:__main__:Dataset: train=240 test=151
INFO:__main__:Dataset: train=224 test=150
INFO:__main__:Dataset: train=60 test=42
INFO:__main__:Dataset: train=213 test=100
INFO:__main__:Dataset: train=220 test=115
INFO:__main__:Dataset: train=209 test=83
INFO:__main__:Dataset: train=391 test=110
INFO:__main__:Dataset: train=245 test=124
INFO:__main__:Evaluating dataset [mvtec_screw] (1/15)...
/home/username/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/username/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Wide_ResNet50_2_Weights.IMAGENET1K_V1`. You can also use `weights=Wide_ResNet50_2_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
INFO:__main__:Training models (1/1)
INFO:simplenet:Training discriminator...
epoch:3 loss:0.45132 lr:0.0002 p_true:0.45 p_fake:0.475: 100%|████████████████████████████| 4/4 [00:34<00:00,  8.57s/it]
Inferring...:  20%|██████████████                                                        | 4/20 [00:31<02:54, 10.92s/it]run.sh: line 30:  1620 Killed                  python3 main.py --gpu 0 --seed 0 --log_group simplenet_mvtec --log_project MVTecAD_Results --results_path /home/username/ml_ai_cv_workspace/SimpleNet/results --run_name run net -b wideresnet50 -le layer2 -le layer3 --pretrain_embed_dimension 1536 --target_embed_dimension 1536 --patchsize 3 --meta_epochs 40 --embedding_size 256 --gan_epochs 4 --noise_std 0.015 --dsc_hidden 1024 --dsc_layers 2 --dsc_margin .5 --pre_proj 1 dataset --batch_size 8 --resize 329 --imagesize 288 "${dataset_flags[@]}" mvtec $datapath
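
The shell's `Killed` message means the Python process received SIGKILL. On Linux, one common cause is the kernel's out-of-memory killer, although the log above does not confirm that on its own. A minimal sketch for checking whether memory grows during inference is below. It uses only the standard-library `resource` module (Unix only). The helper names `peak_rss_mib` and `log_peak_memory` are hypothetical and are not part of SimpleNet.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_memory(batches, every=1):
    """Yield items from `batches`, printing peak RSS after every `every` items."""
    for index, batch in enumerate(batches, start=1):
        yield batch
        if index % every == 0:
            print(f"batch {index}: peak RSS {peak_rss_mib():.1f} MiB", flush=True)


if __name__ == "__main__":
    # Stand-in workload (an assumption, not SimpleNet code): keep every
    # "result" alive, as a loop would if it accumulated per-batch outputs.
    kept = []
    for _ in log_peak_memory(range(5)):
        kept.append(bytearray(8 * 1024 * 1024))
```

Wrapping the iterable that the inference loop consumes with `log_peak_memory(...)` prints one line per batch. If the reported peak climbs steadily until the process dies, running out of host memory is a plausible explanation, and a smaller `--batch_size` or fewer `-d` categories per run would be worth trying.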

Environment