IIT-PAVIS / Acoustic-Image-Generation

Code for the paper "Audio-Visual Localization by Acoustic Image Generation", AAAI 2021
MIT License

issue with the code #3

Open lircsszz opened 2 years ago

lircsszz commented 2 years ago

Hi, I'm running the training process with unet.bash, but it crashes during training. The unet.bash script is:

```
CUDA_VISIBLE_DEVICES=0 python3 main.py --mode train --train_file lists/training.txt \
  --valid_file lists/validation.txt \
  --test_file lists/testing.txt --batch_size 16 --sample_length 1 \
  --total_length 1 --number_of_crops 1 --buffer_size 100 --exp_name uvae-test1-0810 --learning_rate 0.0008 \
  --checkpoint_dir ./data/checkpoints/ --model UNet --datatype outdoor --num_epochs 300 \
  --num_class 10 --block_size 1 --probability 0
```

The error is: `AttributeError: 'NoneType' object has no attribute 'log_scalar'`

I traced the error to this call: `self.logger.log_scalar('l2 loss', self.losslmse`. Could you help me with it? Thanks a lot.
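For context, this `AttributeError` means the trainer's `logger` attribute is still `None` when `log_scalar` is called, i.e. no logger object was ever created for this run. Below is a minimal sketch of the failure pattern and a defensive guard; the class and method names are illustrative and this is not the repository's actual trainer code:

```python
# Illustrative only: not the repository's actual trainer code.
class Trainer:
    def __init__(self, logger=None):
        # If the caller never builds and passes a logger (for example,
        # because an outdated bash script does not provide the arguments
        # the trainer now expects), self.logger stays None.
        self.logger = logger

    def log_losses(self, loss_mse):
        # Calling a method on None raises:
        #   AttributeError: 'NoneType' object has no attribute 'log_scalar'
        # Guarding avoids the crash, but the real fix is to make sure the
        # logger (and the loss itself) are actually constructed.
        if self.logger is not None:
            self.logger.log_scalar('l2 loss', loss_mse)
```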

lircsszz commented 2 years ago

It seems losslmse is never calculated.

ValentinaSanguineti commented 2 years ago

Hi, I am not sure about the issue. Have you found in which script in the "trainer" folder lossmse is not calculated, so that I can check? That bash script is probably an old one; the trainers were modified afterwards, so it no longer has the proper arguments. Try unetacresnet.bash and let me know if that one runs. Kind regards, Valentina

lircsszz commented 2 years ago

The unetacresnet.bash script runs, but there are still some problems: the loss does not decrease, and I still cannot get the UNet model to train successfully. That's okay, though. One of the things I worked on before was strengthening the link between the VAE's input data and its latent-space features, so that the latent features encode more information. Your UVAE work is great and has inspired me a lot. Thanks a lot.

Here is part of the UNet training log (the loss stays around 15.67):

```
2022-08-11 10:00:35.915105: 0811unet - Iteration: [284] Training_Loss: 15.671988 Training_Accuracy: 0.187500
2022-08-11 10:00:37.238345: 0811unet - Iteration: [285] Training_Loss: 15.672263 Training_Accuracy: 0.125000
2022-08-11 10:00:38.678872: 0811unet - Iteration: [286] Training_Loss: 15.671172 Training_Accuracy: 0.187500
2022-08-11 10:00:39.952024: 0811unet - Iteration: [287] Training_Loss: 15.671836 Training_Accuracy: 0.125000
2022-08-11 10:00:41.294306: 0811unet - Iteration: [288] Training_Loss: 15.672267 Training_Accuracy: 0.062500
2022-08-11 10:00:42.730300: 0811unet - Iteration: [289] Training_Loss: 15.671895 Training_Accuracy: 0.000000
2022-08-11 10:00:44.025725: 0811unet - Iteration: [290] Training_Loss: 15.671336 Training_Accuracy: 0.062500
2022-08-11 10:00:45.326138: 0811unet - Iteration: [291] Training_Loss: 15.671785 Training_Accuracy: 0.062500
2022-08-11 10:00:46.718289: 0811unet - Iteration: [292] Training_Loss: 15.671815 Training_Accuracy: 0.125000
2022-08-11 10:00:48.119745: 0811unet - Iteration: [293] Training_Loss: 15.673264 Training_Accuracy: 0.000000
2022-08-11 10:00:49.482745: 0811unet - Iteration: [294] Training_Loss: 15.672828 Training_Accuracy: 0.000000
2022-08-11 10:00:50.744331: 0811unet - Iteration: [295] Training_Loss: 15.672508 Training_Accuracy: 0.000000
2022-08-11 10:00:51.860722: 0811unet - Iteration: [296] Training_Loss: 15.671854 Training_Accuracy: 0.187500
2022-08-11 10:00:53.275444: 0811unet - Iteration: [297] Training_Loss: 15.671474 Training_Accuracy: 0.187500
2022-08-11 10:00:54.596415: 0811unet - Iteration: [298] Training_Loss: 15.671822 Training_Accuracy: 0.187500
2022-08-11 10:00:55.850015: 0811unet - Iteration: [299] Training_Loss: 15.672259 Training_Accuracy: 0.125000
2022-08-11 10:00:57.190248: 0811unet - Iteration: [300] Training_Loss: 15.672073 Training_Accuracy: 0.062500
2022-08-11 10:00:58.397464: 0811unet - Iteration: [301] Training_Loss: 15.670971 Training_Accuracy: 0.125000
2022-08-11 10:00:59.843439: 0811unet - Iteration: [302] Training_Loss: 15.671003 Training_Accuracy: 0.125000
2022-08-11 10:01:01.183775: 0811unet - Iteration: [303] Training_Loss: 15.672009 Training_Accuracy: 0.062500
2022-08-11 10:01:02.568894: 0811unet - Iteration: [304] Training_Loss: 15.672955 Training_Accuracy: 0.000000
2022-08-11 10:01:04.025399: 0811unet - Iteration: [305] Training_Loss: 15.672600 Training_Accuracy: 0.000000
2022-08-11 10:01:05.355327: 0811unet - Iteration: [306] Training_Loss: 15.672375 Training_Accuracy: 0.062500
2022-08-11 10:01:06.746954: 0811unet - Iteration: [307] Training_Loss: 15.671938 Training_Accuracy: 0.062500
2022-08-11 10:01:07.968467: 0811unet - Iteration: [308] Training_Loss: 15.671972 Training_Accuracy: 0.125000
2022-08-11 10:01:09.340287: 0811unet - Iteration: [309] Training_Loss: 15.671703 Training_Accuracy: 0.187500
2022-08-11 10:01:10.656609: 0811unet - Iteration: [310] Training_Loss: 15.671160 Training_Accuracy: 0.187500
2022-08-11 10:01:11.924044: 0811unet - Iteration: [311] Training_Loss: 15.672188 Training_Accuracy: 0.062500
2022-08-11 10:01:13.283802: 0811unet - Iteration: [312] Training_Loss: 15.671762 Training_Accuracy: 0.125000
2022-08-11 10:01:14.583009: 0811unet - Iteration: [313] Training_Loss: 15.671877 Training_Accuracy: 0.187500
2022-08-11 10:01:15.784425: 0811unet - Iteration: [314] Training_Loss: 15.671447 Training_Accuracy: 0.125000
2022-08-11 10:01:17.181165: 0811unet - Iteration: [315] Training_Loss: 15.672251 Training_Accuracy: 0.062500
```

ValentinaSanguineti commented 2 years ago

The parameters have been set to work well on my dataset. Try reducing the learning rate to 10^-5 and changing the batch size. You can also add 1 or 2 skip connections. If you need the latent-space features to encode more information, increase the latent_loss parameter. You may also need to modify the loss or change the number of layers.
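As a generic illustration of the last suggestion: in a VAE objective, a latent-loss weight scales the latent (KL) term relative to the reconstruction term. The sketch below shows how such a weight usually enters the loss; it uses plain NumPy with illustrative names and is not the repository's actual loss code, and the flag that controls this weight in the repository may be named differently:

```python
import numpy as np

def weighted_vae_loss(reconstruction, target, mu, logvar, latent_weight=1.0):
    """Generic weighted VAE objective: MSE reconstruction + weighted KL term.

    Illustrative only; not the repository's actual loss implementation.
    """
    # Reconstruction term: mean squared error between the generated and
    # ground-truth acoustic images.
    mse = np.mean((reconstruction - target) ** 2)
    # Latent term: KL divergence between N(mu, exp(logvar)) and N(0, 1).
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    # latent_weight trades off the two terms: raising it emphasizes the
    # latent term, lowering it emphasizes reconstruction quality.
    return mse + latent_weight * kl
```

The other suggestions map directly onto the flags already used in the bash script above, e.g. `--learning_rate 0.00001` and a different `--batch_size`.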