VinAIResearch / 3D-UCaps

3D-UCaps: 3D Capsules Unet for Volumetric Image Segmentation (MICCAI 2021)
Apache License 2.0

Getting stuck at validation step #4

Open sindhura234 opened 2 years ago

sindhura234 commented 2 years ago

Epoch 0: 0%| | 0/10 [00:00<?, ?it/s]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.

Epoch 0: 70%|█████████ | 7/10 [00:14<00:06, 2.12s/it, loss=0.724, v_num=11] Validating: 0it [00:00, ?it/s] Validating: 0%| | 0/3 [00:00<?, ?it/s]

I have been training on a larger dataset, unlike the one in the log above, but I always get stuck at the validation stage. I tried with the hippocampus dataset and the results are fine, but with my custom data I'm facing this problem. What could be the reason?

hoangtan96dl commented 2 years ago

Can you set the num_sanity_val_steps arg of the Trainer class to -1 and run it again? It will run the whole validation loop before training starts.
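For reference, the suggestion above is just a Trainer flag; a minimal configuration sketch (the other arguments here are placeholders, not values from this repo):

```python
import pytorch_lightning as pl

# num_sanity_val_steps=-1 runs the ENTIRE validation loop once before
# training starts, so a hang in validation shows up immediately instead
# of mid-epoch. Set it to 0 to skip the sanity check entirely.
trainer = pl.Trainer(
    num_sanity_val_steps=-1,
    max_epochs=1,          # placeholder
    accelerator="auto",    # placeholder
)
```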

I have never seen this error before: Trying to infer the batch_size from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use self.log(..., batch_size=batch_size). So maybe there is a problem with the logging step and your data. The logging is done in validation_step and validation_epoch_end of the LightningModule; you can try commenting it out first.
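The warning itself is harmless and easy to silence by passing the batch size explicitly to self.log. A hedged sketch, assuming the batch is an (image, label) tuple; compute_loss is a hypothetical helper standing in for whatever loss the module actually uses:

```python
import pytorch_lightning as pl


class SegModule(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        images, labels = batch  # assumed (image, label) tuple; adapt to your collate format
        loss = self.compute_loss(images, labels)  # hypothetical helper
        # Passing batch_size explicitly stops Lightning from trying to
        # infer it from an "ambiguous collection" (the warning above).
        self.log("val_loss", loss, batch_size=images.shape[0])
        return loss
```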

noushinha commented 2 years ago

@cndu234 I had this error in the beginning and solved it. I'm trying to recall what I set in the Trainer to get things working. I will check and write back to you.

kingjames1155 commented 2 years ago

Do you remember how you solved this problem? I have the same problem as you. Thank you very much.

noushinha commented 2 years ago

@kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.

kingjames1155 commented 2 years ago

Thank you very sincerely for your help with num_sanity_val_steps; it made a big difference. Following your method, I no longer get stuck at the Sanity Checking DataLoader, but I always get stuck at the Validation DataLoader, as shown below. How did you solve this? Thank you again for your help.

Epoch 0: 0%| | 0/31 [00:00<?, ?it/s]
The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 8. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 26%|██████████████████████████████████▎ | 8/31 [01:43<04:57, 12.93s/it, loss=0.645, v_num=6]
Trying to infer the batch_size from an ambiguous collection. The batch size we found is 3. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
Epoch 0: 29%|██████████████████████████████████████▌ | 9/31 [01:49<04:28, 12.20s/it, loss=0.647, v_num=6]
Validation DataLoader 0: 0%| | 0/22 [00:00<?, ?it/s]

kingjames1155 commented 2 years ago

> @kingjames1155 Yes, as @hoangtan96dl also mentioned, you should either set num_sanity_val_steps or turn it off. You can find more about it on this page.

I found that the problem was the Sliding Window Inference, which caused me to get stuck in validation_step and predict_step. I haven't changed any of the Sliding Window Inference parameters. Have you ever encountered such a situation?

noushinha commented 2 years ago

No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?
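The label/output mismatch described above (4 labels but only 2 output channels) is easy to sanity-check before training. A small sketch, assuming labels are stored as integer NumPy volumes and n_class stands in for the model's configured output-channel count:

```python
import numpy as np


def check_label_range(label_volume, n_class):
    """Verify that every segmentation label fits the model's output channels."""
    labels = np.unique(label_volume)
    assert labels.min() >= 0 and labels.max() < n_class, (
        f"found labels {labels.tolist()} but the model only has "
        f"{n_class} output channels"
    )
    return labels


# Example: a toy volume with labels 0..3 needs n_class >= 4.
rng = np.random.default_rng(0)
toy = rng.integers(0, 4, size=(8, 8, 8))
print(check_label_range(toy, n_class=4).tolist())
```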

kingjames1155 commented 2 years ago

> No, I didn't have such a problem. I had a problem with the number of outputs. I had 4 labels while the number of outputs was set to 2. However, it wasn't throwing a validation step error. What patch and batch size do you use?

I used LUNA16 on a V100; the batch size is 8 and the training patch size is [64, 64, 64]. I tried lowering these parameters, but it was still stuck.
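One thing worth checking here: sliding-window inference can simply be very slow rather than truly stuck, because the number of patches grows with the cube of the volume size. A rough stdlib-only estimate of the patch count, assuming a uniform per-axis stride of roi * (1 - overlap), which is how MONAI's sliding_window_inference spaces its windows at its default overlap of 0.25:

```python
import math


def sliding_window_patch_count(image_size, roi_size, overlap=0.25):
    """Estimate how many ROI patches a sliding-window inferer visits.

    Assumes a stride of int(roi * (1 - overlap)) per axis, matching the
    default window spacing of MONAI's sliding_window_inference.
    """
    total = 1
    for dim, roi in zip(image_size, roi_size):
        stride = max(1, int(roi * (1 - overlap)))
        # Number of window positions needed to cover this axis.
        steps = max(1, math.ceil((dim - roi) / stride) + 1)
        total *= steps
    return total


# A 512x512x256 CT volume with 64^3 patches at 25% overlap:
print(sliding_window_patch_count((512, 512, 256), (64, 64, 64)))  # → 605
```

At 605 patches per volume and 22 validation volumes, even a fraction of a second per patch adds up to many minutes with no progress-bar movement, which can look exactly like a hang.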

ZHANGJUN-OK commented 1 year ago

I had the same problem. How can it be avoided?