Looks like your PyTorch environment can't recognize CUDA devices. Which container version is this? This line (https://github.com/NVIDIA/retinanet-examples/blob/master/retinanet/infer.py#L56) should handle that.
If this is an ongoing problem, please provide the exact commands you used to create and run your docker container, and then the commands you used to train and run inference with odtk.
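For reference, this is a minimal sketch of the kind of device guard the linked line in infer.py is meant to perform (the exact code there may differ; torch.cuda.is_available() and .cuda() are standard PyTorch calls, and nn.Linear is only a stand-in for the RetinaNet model):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the actual RetinaNet model object
if torch.cuda.is_available():
    # Move all parameters and buffers to the default CUDA device
    model = model.cuda()
else:
    print('No CUDA device visible to PyTorch; staying on CPU')

If the else branch fires inside the container, PyTorch cannot see the GPU, which matches the symptom described above.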
Closing, assuming the container solved the problem. Please reopen if it hasn't.
I used: docker run --gpus all --rm --ipc=host -it nvcr.io/nvidia/pytorch:19.10-py3
Are you still getting this problem?
What's the output of nvidia-smi?
@james-nvidia thank you for the reply. I have gone back to the pytorch:19.10 version. It seems the new code cannot run in the docker pytorch:19.10-py3 container.
I ran the program as the guide describes:

root@bdc65ade86b1:/workspace# odtk infer retinanet_rn50fpn.pth --images /coco/val2017/ --annotations /coco/annotations/instances_val2017.json
Loading model from retinanet_rn50fpn.pth...
model: RetinaNet
backbone: ResNet50FPN
classes: 80, anchors: 9
Preparing dataset...
Traceback (most recent call last):
  File "/opt/conda/bin/odtk", line 11, in <module>
    load_entry_point('odtk', 'console_scripts', 'odtk')()
  File "/workspace/retinanet/retinanet/main.py", line 237, in main
    worker(0, args, 1, model, state)
  File "/workspace/retinanet/retinanet/main.py", line 186, in worker
    rotated_bbox=args.rotated_bbox)
  File "/workspace/retinanet/retinanet/infer.py", line 48, in infer
    world, annotations, training=False)
  File "/workspace/retinanet/retinanet/data.py", line 197, in __init__
    augment_saturation=augment_saturation)
  File "/workspace/retinanet/retinanet/data.py", line 34, in __init__
    self.coco = COCO(annotations)
  File "/opt/conda/lib/python3.6/site-packages/pycocotools/coco.py", line 85, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/coco/annotations/instances_val2017.json'
root@bdc65ade86b1:/workspace# ll coco/annotations/
total 814884
drwxrwxr-x 2 1000 1000 4096 Jan 30 03:17 ./
drwxrwxr-x 4 1000 1000 4096 Jan 30 03:17 ../
-rw-rw-r-- 1 1000 1000 91865115 Sep 1 2017 captions_train2017.json
-rw-rw-r-- 1 1000 1000 3872473 Sep 1 2017 captions_val2017.json
-rw-rw-r-- 1 1000 1000 469785474 Sep 1 2017 instances_train2017.json
-rw-rw-r-- 1 1000 1000 19987840 Sep 1 2017 instances_val2017.json
-rw-rw-r-- 1 1000 1000 238884731 Sep 1 2017 person_keypoints_train2017.json
-rw-rw-r-- 1 1000 1000 10020657 Sep 1 2017 person_keypoints_val2017.json
root@bdc65ade86b1:/workspace# odtk infer retinanet_rn50fpn.pth --images coco/val2017/ --annotations coco/annotations/instances_val2017.json
Loading model from retinanet_rn50fpn.pth...
model: RetinaNet
backbone: ResNet50FPN
classes: 80, anchors: 9
Preparing dataset...
loader: pytorch
resize: 800, max: 1333
Traceback (most recent call last):
  File "/opt/conda/bin/odtk", line 11, in <module>
    load_entry_point('odtk', 'console_scripts', 'odtk')()
  File "/workspace/retinanet/retinanet/main.py", line 237, in main
    worker(0, args, 1, model, state)
  File "/workspace/retinanet/retinanet/main.py", line 186, in worker
    rotated_bbox=args.rotated_bbox)
  File "/workspace/retinanet/retinanet/infer.py", line 60, in infer
    verbosity=0)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param backbones.ResNet50FPN.features.conv1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
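For context, the error message above is describing the required call order for apex AMP: the model must already live on a CUDA device when amp.initialize is called. A minimal sketch (nn.Linear is a stand-in for the actual RetinaNet model; amp.initialize, opt_level, and verbosity are real apex arguments):

import torch.nn as nn
from apex import amp

model = nn.Linear(4, 2)    # stand-in for the RetinaNet model
model = model.to('cuda')   # must happen BEFORE amp.initialize
model = amp.initialize(model, opt_level='O2', verbosity=0)

Here the parameters stay on the CPU because, as noted at the top of the thread, PyTorch in the container is not recognizing any CUDA device, so the move-to-GPU path is never taken; running nvidia-smi inside the container (as asked above) confirms whether --gpus all is actually exposing a GPU.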