Closed: YunhuaZhang closed this issue 4 years ago
It appears to be running appropriately for many iterations. I would check that the dataloader is not shuffling the test dataset. If it is not, I would also check that the particular video it fails on was not corrupted on download. Rewrite your script to print which video it is working on at inference time and verify that video is OK.
A working inference time model is also provided at script/InitializationNotebook.ipynb and can be compared as well.
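A rough sketch of that check is below: iterate the test split without shuffling and print each filename before running the model. This is only illustrative; it assumes echonet's Echo dataset exposes its file list as `fnames` and accepts a `split` argument (check your installed version), and that a non-shuffled loader preserves dataset order.

```python
import torch
import echonet

# Assumed API: echonet.datasets.Echo(split="test") with a `fnames` attribute;
# adjust the names to match your version of the package.
dataset = echonet.datasets.Echo(split="test")
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)

for i, (X, y) in enumerate(loader):
    print("Working on video:", dataset.fnames[i])  # identify the clip before inference
    # ... run the model on X here and note which video is being processed when it fails ...
```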
Thank you. In the end I set num_workers=0 and the error disappeared.
However, in the code you set blocks=100 for testing, and that leads to a CUDA out-of-memory error on my GPU. How should I set blocks?
The blocks variable is a parameter you can set to match your hardware setup. You can make it smaller so that you do not overflow your GPU memory.
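For context, here is a rough sketch of how a `blocks`-style loop bounds GPU memory, mirroring the `tmp = model(X[j:(j + blocks), ...])` call visible in the traceback below. Lowering `blocks` simply puts fewer clips through the model per forward pass; the helper name here is hypothetical, not the repository's own function.

```python
import torch

def predict_in_blocks(model, X, blocks=16):
    """Run the model over the clips in X a few at a time to bound GPU memory use."""
    outputs = []
    with torch.no_grad():
        for j in range(0, X.shape[0], blocks):
            outputs.append(model(X[j:(j + blocks), ...]))  # at most `blocks` clips on the GPU at once
    return torch.cat(outputs)
```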
Hi,
When I run the trained model on test data, I always get this error:
41%|███████▍ | 525/1276 [37:39<43:27, 3.47s/it, 29.33 (4.19) / 27.63] tensor([[62.2167]], device='cuda:0')
41%|███████▍ | 526/1276 [37:41<40:23, 3.23s/it, 29.30 (2.20) / 27.65] tensor([[64.5487]], device='cuda:0')
41%|███████ | 527/1276 [37:45<40:46, 3.27s/it, 29.27 (13.39) / 27.67] tensor([[58.4618]], device='cuda:0')
41%|██████▌ | 528/1276 [37:47<38:45, 3.11s/it, 29.43 (157.65) / 27.72] tensor([[61.3680]], device='cuda:0')
41%|███████▍ | 529/1276 [37:51<39:09, 3.15s/it, 29.40 (8.24) / 27.73] tensor([[61.9610]], device='cuda:0')
42%|███████▍ | 530/1276 [37:57<51:21, 4.13s/it, 29.33 (4.85) / 27.71] tensor([[64.5215]], device='cuda:0')
42%|███████▍ | 531/1276 [38:03<57:19, 4.62s/it, 29.26 (2.91) / 27.70] tensor([[49.3714]], device='cuda:0')
42%|███████ | 532/1276 [38:07<56:39, 4.57s/it, 29.26 (28.22) / 27.68] tensor([[48.1342]], device='cuda:0')
42%|███████ | 533/1276 [38:10<50:35, 4.09s/it, 29.28 (47.93) / 27.67] tensor([[29.6015]], device='cuda:0')
42%|██████▎ | 534/1276 [38:20<1:12:10, 5.84s/it, 29.37 (72.82) / 27.62] tensor([[66.5562]], device='cuda:0')
42%|██████▎ | 535/1276 [38:26<1:11:19, 5.78s/it, 29.34 (10.90) / 27.66] tensor([[20.8735]], device='cuda:0')
42%|███████▌ | 536/1276 [38:27<53:51, 4.37s/it, 29.33 (9.51) / 27.66] tensor([[40.1236]], device='cuda:0')
42%|███████▏ | 537/1276 [38:30<50:27, 4.10s/it, 29.33 (27.54) / 27.65]
Traceback (most recent call last):
  File "run_ef.py", line 3, in <module>
    echonet.utils.video.run(modelname="r2plus1d_18", frames=32, period=2, pretrained=True, batch_size=8)
  File "/home/yzhang8/dynamic/echonet/utils/video.py", line 184, in run
    blocks=2)
  File "/home/yzhang8/dynamic/echonet/utils/video.py", line 266, in run_epoch
    tmp = model(X[j:(j + blocks), ...])
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torchvision/models/video/resnet.py", line 233, in forward
    x = self.layer4(x)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torchvision/models/video/resnet.py", line 107, in forward
    out = self.conv2(out)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1670, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
  File "/home/yzhang8/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 37531) is killed by signal: Killed.
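For reference, this error usually means the operating system killed a DataLoader worker subprocess (often the out-of-memory killer when system RAM runs out), which is consistent with the num_workers=0 workaround mentioned above. A minimal sketch of that workaround is below; `test_dataset` is a placeholder for however the test split is constructed in your script (in echonet itself the loaders are built inside run(), so the equivalent change would go there).

```python
import torch

# Loading batches in the main process avoids worker subprocesses entirely,
# so none can be killed by the OS (at the cost of slower data loading).
loader = torch.utils.data.DataLoader(
    test_dataset,      # placeholder: your echonet test dataset
    batch_size=8,
    shuffle=False,     # keep the test order deterministic
    num_workers=0,     # no worker subprocesses -> no "killed by signal" failures
)
```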