apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

ERROR - total_loss not present in the dictionary. Available keys are: []. Exiting!!! #104

Open 111hyq111 opened 5 months ago

111hyq111 commented 5 months ago

Training with my own dataset produces this error:

```
2024-04-16 18:56:22 - DEBUG - Training epoch 0 with 0 samples
  File "/home/hyq/anaconda3/envs/cvnets/bin/cvnets-train", line 8, in <module>
    sys.exit(main_worker())
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 235, in main_worker
    main(opts=opts, **kwargs)
  File "/home/hyq/anaconda3/envs/cvnets/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 174, in main
    training_engine.run(train_sampler=train_sampler)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 606, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 357, in train_epoch
    avg_loss = train_stats.avg_statistics(
  File "/home/hyq/文档/ml-cvnets/metrics/stats.py", line 148, in avg_statistics
    logger.error(
  File "/home/hyq/文档/ml-cvnets/utils/logger.py", line 46, in error
    traceback.print_stack()
2024-04-16 18:56:22 - LOGS - Training took 00:00:02.11
2024-04-16 18:56:22 - ERROR - total_loss not present in the dictionary. Available keys are: []. Exiting!!!
```
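The DEBUG line `Training epoch 0 with 0 samples` is the telling part: the train dataloader produced no batches, so the metric tracker in `metrics/stats.py` never accumulated anything and `avg_statistics` finds an empty dictionary when it asks for `total_loss`. A quick way to check the data side is to count what actually sits under the configured roots. Below is a minimal stdlib sketch, assuming an ADEChallengeData2016-style layout under `root_train`; the subfolder names are an assumption, so adjust them to whatever the custom `ade20k1` dataset class actually reads:

```python
import os
from glob import glob

# Path taken from the dataset.root_train entry in the config below.
root = "/media/hyq/西部数据2TB/ml-cvnets_data/"

# Assumed ADEChallengeData2016-style layout; change these to the
# subfolders your "ade20k1" dataset class actually scans.
subdirs = [
    "images/training",
    "images/validation",
    "annotations/training",
    "annotations/validation",
]

for sub in subdirs:
    path = os.path.join(root, sub)
    n_files = len(glob(os.path.join(path, "*")))
    print(f"{path}: exists={os.path.isdir(path)}, files={n_files}")
```

If every directory reports zero files, the error above is expected: with no samples, no loss is ever computed and the stats dictionary stays empty.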

Training command:

```
cvnets-train --common.config-file /home/hyq/下载/pspnet-mobilevitv2-1.0.yaml --common.results-loc segmentation_results
```

pspnet-mobilevitv2-1.0.yaml:

```yaml
common:
  run_label: "run_1"
  accum_freq: 1
  accum_after_epoch: -1
  log_freq: 200
  auto_resume: false
  mixed_precision: true
  grad_clip: 10.0
dataset:
  root_train: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  root_val: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  name: "ade20k1"
  category: "segmentation"
  train_batch_size0: 4  # effective batch size is 16 (4 * 4 GPUs)
  val_batch_size0: 4
  eval_batch_size0: 1
  workers: 4
  persistent_workers: false
  pin_memory: false
image_augmentation:
  random_crop:
    enable: true
    seg_class_max_ratio: 0.75
    pad_if_needed: true
    mask_fill: 0  # background idx is 0
  random_horizontal_flip:
    enable: true
  resize:
    enable: true
    size: [512, 512]
    interpolation: "bicubic"
  random_short_size_resize:
    enable: true
    interpolation: "bicubic"
    short_side_min: 256
    short_side_max: 768
    max_img_dim: 1024
  photo_metric_distort:
    enable: true
  random_rotate:
    enable: true
    angle: 10
    mask_fill: 0  # background idx is 0
  random_gaussian_noise:
    enable: true
sampler:
  name: "batch_sampler"
  bs:
    crop_size_width: 512
    crop_size_height: 512
loss:
  category: "segmentation"
  ignore_idx: -1
  segmentation:
    name: "cross_entropy"
    cross_entropy:
      aux_weight: 0.4
optim:
  name: "sgd"
  weight_decay: 1.e-4
  no_decay_bn_filter_bias: true
  sgd:
    momentum: 0.9
scheduler:
  name: "cosine"
  is_iteration_based: false
  max_epochs: 120
  cosine:
    max_lr: 0.02
    min_lr: 0.0002
model:
  segmentation:
    name: "encoder_decoder"
    lr_multiplier: 1
    seg_head: "pspnet"
    output_stride: 8
    use_aux_head: true
    activation:
      name: "relu"
    pspnet:
      psp_dropout: 0.1
      psp_out_channels: 512
      psp_pool_sizes: [ 1, 2, 3, 6 ]
  classification:
    name: "mobilevit_v2"
    mitv2:
      width_multiplier: 1.0
      attn_norm_layer: "layer_norm_2d"
    activation:
      name: "swish"
  normalization:
    name: "sync_batch_norm"
    momentum: 0.1
  activation:
    name: "swish"
    inplace: false
  layer:
    global_pool: "mean"
    conv_init: "kaiming_uniform"
    linear_init: "normal"
ema:
  enable: true
  momentum: 0.0005
stats:
  val: [ "loss", "iou" ]
  train: [ "loss", "grad_norm" ]
  checkpoint_metric: "iou"
  checkpoint_metric_max: true
```
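A side note on `dataset.name: "ade20k1"`: the segmentation configs bundled with CVNets use `name: "ade20k"`, so a custom name like this only works if a dataset class is registered under it and that class finds files under `root_train`. Below is a rough registration sketch; the `DATASET_REGISTRY` decorator and the `ADE20KDataset` class are assumptions based on a recent ml-cvnets checkout (`data/datasets/segmentation/ade20k.py`) and should be verified against the local copy:

```python
# Rough sketch: expose the bundled ADE20K loader under the custom name
# used in the YAML config (dataset.name: "ade20k1"). The module and class
# names below are assumptions from a recent ml-cvnets checkout; check them
# against your local data/datasets package before use.
from data.datasets import DATASET_REGISTRY
from data.datasets.segmentation.ade20k import ADE20KDataset


@DATASET_REGISTRY.register(name="ade20k1", type="segmentation")
class ADE20K1Dataset(ADE20KDataset):
    """Same loading logic as the stock ADE20K dataset, registered as "ade20k1"."""
```

If the custom class instead reads a different folder structure, its image/mask search paths need to match what is actually present under `/media/hyq/西部数据2TB/ml-cvnets_data/`; otherwise the sampler sees zero samples and training exits with the error above.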