baiksung / MeTAL

Official PyTorch implementation of "Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning" (ICCV2021 Oral)

the training process got stuck #5

Closed Ren-Zixin closed 2 years ago

Ren-Zixin commented 2 years ago

When I try to train the model, I prepared everything as the README says, but I don't know why the whole process can't keep going after the print of `layer_dict`:

```
batch_size 2 <class 'int'>
image_height 84 <class 'int'>
image_width 84 <class 'int'>
image_channels 3 <class 'int'>
reset_stored_filepaths False <class 'bool'>
reverse_channels False <class 'bool'>
num_of_gpus 1 <class 'int'>
indexes_of_folders_indicating_class [-3, -2] <class 'list'>
train_val_test_split [0.64, 0.16, 0.2] <class 'list'>
samples_per_iter 1 <class 'int'>
labels_as_int False <class 'bool'>
seed 104 <class 'int'>
gpu_to_use 4 <class 'int'>
num_dataprovider_workers 4 <class 'int'>
max_models_to_save 5 <class 'int'>
dataset_name mini_imagenet_full_size <class 'str'>
dataset_path datasets/datasets/mini_imagenet_full_size
dataset_path datasets/mini_imagenet_full_size <class 'str'>
reset_stored_paths False <class 'bool'>
experiment_name MeTAL <class 'str'>
architecture_name None <class 'NoneType'>
continue_from_epoch latest <class 'str'>
dropout_rate_value 0.0 <class 'float'>
num_target_samples 15 <class 'int'>
second_order True <class 'bool'>
total_epochs 100 <class 'int'>
total_iter_per_epoch 500 <class 'int'>
min_learning_rate 0.001 <class 'float'>
meta_learning_rate 0.001 <class 'float'>
meta_opt_bn False <class 'bool'>
task_learning_rate 0.1 <class 'float'>
norm_layer batch_norm <class 'str'>
max_pooling True <class 'bool'>
per_step_bn_statistics False <class 'bool'>
num_classes_per_set 5 <class 'int'>
cnn_num_blocks 4 <class 'int'>
number_of_training_steps_per_iter 5 <class 'int'>
number_of_evaluation_steps_per_iter 5 <class 'int'>
cnn_num_filters 48 <class 'int'>
cnn_blocks_per_stage 1 <class 'int'>
num_samples_per_class 5 <class 'int'>
name_of_args_json_file experiment_config/MeTAL.json <class 'str'>
backbone 4-CONV <class 'str'>
attenuate False <class 'bool'>
alfa False <class 'bool'>
random_init False <class 'bool'>
meta_loss True <class 'bool'>
train_seed 0 <class 'int'>
val_seed 0 <class 'int'>
sets_are_pre_split True <class 'bool'>
evaluate_on_test_set_only False <class 'bool'>
num_evaluation_tasks 600 <class 'int'>
multi_step_loss_num_epochs 15 <class 'int'>
minimum_per_task_contribution 0.01 <class 'float'>
learnable_per_layer_per_step_inner_loop_learning_rate False <class 'bool'>
enable_inner_loop_optimizable_bn_params False <class 'bool'>
evalute_on_test_set_only False <class 'bool'>
learnable_batch_norm_momentum False <class 'bool'>
load_into_memory False <class 'bool'>
init_inner_loop_learning_rate 0.01 <class 'float'>
init_inner_loop_weight_decay 0.0005 <class 'float'>
learnable_bn_gamma True <class 'bool'>
learnable_bn_beta True <class 'bool'>
total_epochs_before_pause 101 <class 'int'>
first_order_to_second_order_epoch -1 <class 'int'>
weight_decay 0.0 <class 'float'>
num_stages 4 <class 'int'>
conv_padding True <class 'bool'>
use_multi_step_loss_optimization False <class 'bool'>
use GPU 0
GPU ID 0
Using max pooling
torch.Size([2, 48, 84, 84])
torch.Size([2, 48, 42, 42])
torch.Size([2, 48, 21, 21])
torch.Size([2, 48, 10, 10])
VGGNetwork build torch.Size([2, 5])
meta network params
layer_dict.conv0.conv.weight torch.Size([48, 3, 3, 3])
layer_dict.conv0.conv.bias torch.Size([48])
layer_dict.conv0.norm_layer.running_mean torch.Size([48])
layer_dict.conv0.norm_layer.running_var torch.Size([48])
layer_dict.conv0.norm_layer.bias torch.Size([48])
layer_dict.conv0.norm_layer.weight torch.Size([48])
layer_dict.conv1.conv.weight torch.Size([48, 48, 3, 3])
layer_dict.conv1.conv.bias torch.Size([48])
layer_dict.conv1.norm_layer.running_mean torch.Size([48])
layer_dict.conv1.norm_layer.running_var torch.Size([48])
layer_dict.conv1.norm_layer.bias torch.Size([48])
layer_dict.conv1.norm_layer.weight torch.Size([48])
layer_dict.conv2.conv.weight torch.Size([48, 48, 3, 3])
layer_dict.conv2.conv.bias torch.Size([48])
layer_dict.conv2.norm_layer.running_mean torch.Size([48])
layer_dict.conv2.norm_layer.running_var torch.Size([48])
layer_dict.conv2.norm_layer.bias torch.Size([48])
layer_dict.conv2.norm_layer.weight torch.Size([48])
layer_dict.conv3.conv.weight torch.Size([48, 48, 3, 3])
layer_dict.conv3.conv.bias torch.Size([48])
layer_dict.conv3.norm_layer.running_mean torch.Size([48])
layer_dict.conv3.norm_layer.running_var torch.Size([48])
layer_dict.conv3.norm_layer.bias torch.Size([48])
layer_dict.conv3.norm_layer.weight torch.Size([48])
layer_dict.linear.weights torch.Size([5, 1200])
layer_dict.linear.bias torch.Size([5])
```

After printing this, the training process doesn't continue; it just stays in this state without crashing. @baiksung would you mind telling me what happened and how to solve it? Thanks a lot.
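For what it's worth, a common cause of this kind of silent hang in PyTorch training scripts is a deadlocked DataLoader worker process; running once with `num_dataprovider_workers` set to 0 in the config is a quick way to test that theory. To see exactly where the process is stuck, a small diagnostic sketch (not part of MeTAL, just Python's standard `faulthandler` module) can dump every thread's traceback, either on a signal or automatically after a period of inactivity:

```python
# Diagnostic sketch: find out where a silently hung Python process is stuck.
# faulthandler is in the standard library; nothing here is MeTAL-specific.
import faulthandler
import signal
import sys

# On Unix, dump all thread tracebacks to stderr when the process receives
# SIGUSR1, so a running job can be inspected with `kill -USR1 <pid>`.
if hasattr(signal, "SIGUSR1"):
    faulthandler.register(signal.SIGUSR1)

# Also dump tracebacks automatically every 60 seconds; if the process is
# healthy, cancel the timer with faulthandler.cancel_dump_traceback_later().
faulthandler.dump_traceback_later(60, repeat=True, file=sys.stderr)
```

Placing these lines at the top of the training entry point and re-running should show whether the hang is inside the data provider, the model build, or a CUDA call (note the log says `gpu_to_use 4` while `num_of_gpus` is 1, which may also be worth checking).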