After CUDA runs out of memory, the loss calculation and summary fail with a division-by-zero error. It seems that the batch-skipping functionality is not entirely working.
(Screenshot is taken on batch size 16.)
Conda and pip envs are as follows.
pip_env.txt
conda_env.txt
While using a smaller batch size is a valid workaround, any help would be appreciated here.
Edit: At batch size 4 the skip seems to work properly, but it interferes with validation, possibly because batches of size 1 are being skipped.
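To illustrate what I suspect is happening, here is a minimal sketch. It is not the project's actual code. It assumes the mean loss is computed by dividing a running sum by the number of batches that completed, so if every batch is skipped after an out-of-memory error, the divisor is zero. The names `FakeOutOfMemory`, `run_epoch`, `always_oom`, and `small_only` are hypothetical and exist only for this example.

```python
import math


class FakeOutOfMemory(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def run_epoch(batches, step_fn):
    """Sum the loss over batches, skipping any batch whose step raises OOM."""
    loss_sum = 0.0
    n_ok = 0
    n_skipped = 0
    for batch in batches:
        try:
            loss_sum += step_fn(batch)
            n_ok += 1
        except FakeOutOfMemory:
            n_skipped += 1  # skip this batch instead of aborting the epoch
    # Guard: if every batch was skipped there is nothing to average,
    # so report NaN rather than dividing by zero.
    mean_loss = loss_sum / n_ok if n_ok > 0 else math.nan
    return mean_loss, n_ok, n_skipped


def always_oom(batch):
    """Hypothetical step that never fits in memory (assumed batch size 16 case)."""
    raise FakeOutOfMemory("out of memory")


def small_only(batch):
    """Hypothetical step that only fits batches of at most 4 items."""
    if len(batch) > 4:
        raise FakeOutOfMemory("out of memory")
    return float(len(batch))
```

Without the `n_ok > 0` guard, calling `run_epoch` with `always_oom` would raise `ZeroDivisionError`, which matches the symptom described above. Whether the real code path works this way is an assumption on my part.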