Looks like your PyTorch environment can't recognize CUDA devices. Which container version is this? This line (https://github.com/NVIDIA/retinanet-examples/blob/master/retinanet/infer.py#L56) should handle that.
If this is an ongoing problem, please provide the exact commands you used to create and run your docker container, and then the commands you used to train and run inference with odtk.
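For reference, this is a minimal sketch of the kind of device guard the linked line in infer.py is meant to perform (the exact code there may differ; torch.cuda.is_available() and .cuda() are standard PyTorch calls, and nn.Linear is only a stand-in for the RetinaNet model):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the actual RetinaNet model object
if torch.cuda.is_available():
    # Move all parameters and buffers to the default CUDA device
    model = model.cuda()
else:
    print('No CUDA device visible to PyTorch; staying on CPU')

If the else branch fires inside the container, PyTorch cannot see the GPU, which matches the symptom described above.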
Closing, assuming the container solved the problem. Please reopen if it hasn't.
I used: docker run --gpus all --rm --ipc=host -it nvcr.io/nvidia/pytorch:19.10-py3
Are you still getting this problem?
What's the output of nvidia-smi?
@james-nvidia thank you for the reply. I have gone back to the pytorch:19.10 version. It seems the new code cannot run in the docker pytorch:19.10-py3 container.
I ran the program as the guide describes:

root@bdc65ade86b1:/workspace# odtk infer retinanet_rn50fpn.pth --images /coco/val2017/ --annotations /coco/annotations/instances_val2017.json
Loading model from retinanet_rn50fpn.pth...
model: RetinaNet
backbone: ResNet50FPN
classes: 80, anchors: 9
Preparing dataset...
Traceback (most recent call last):
  File "/opt/conda/bin/odtk", line 11, in <module>
    load_entry_point('odtk', 'console_scripts', 'odtk')()
  File "/workspace/retinanet/retinanet/main.py", line 237, in main
    worker(0, args, 1, model, state)
  File "/workspace/retinanet/retinanet/main.py", line 186, in worker
    rotated_bbox=args.rotated_bbox)
  File "/workspace/retinanet/retinanet/infer.py", line 48, in infer
    world, annotations, training=False)
  File "/workspace/retinanet/retinanet/data.py", line 197, in __init__
    augment_saturation=augment_saturation)
  File "/workspace/retinanet/retinanet/data.py", line 34, in __init__
    self.coco = COCO(annotations)
  File "/opt/conda/lib/python3.6/site-packages/pycocotools/coco.py", line 85, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/coco/annotations/instances_val2017.json'
root@bdc65ade86b1:/workspace# ll coco/annotations/
total 814884
drwxrwxr-x 2 1000 1000 4096 Jan 30 03:17 ./
drwxrwxr-x 4 1000 1000 4096 Jan 30 03:17 ../
-rw-rw-r-- 1 1000 1000 91865115 Sep 1 2017 captions_train2017.json
-rw-rw-r-- 1 1000 1000 3872473 Sep 1 2017 captions_val2017.json
-rw-rw-r-- 1 1000 1000 469785474 Sep 1 2017 instances_train2017.json
-rw-rw-r-- 1 1000 1000 19987840 Sep 1 2017 instances_val2017.json
-rw-rw-r-- 1 1000 1000 238884731 Sep 1 2017 person_keypoints_train2017.json
-rw-rw-r-- 1 1000 1000 10020657 Sep 1 2017 person_keypoints_val2017.json
root@bdc65ade86b1:/workspace# odtk infer retinanet_rn50fpn.pth --images coco/val2017/ --annotations coco/annotations/instances_val2017.json
Loading model from retinanet_rn50fpn.pth...
model: RetinaNet
backbone: ResNet50FPN
classes: 80, anchors: 9
Preparing dataset...
loader: pytorch
resize: 800, max: 1333
Traceback (most recent call last):
  File "/opt/conda/bin/odtk", line 11, in <module>
    load_entry_point('odtk', 'console_scripts', 'odtk')()
  File "/workspace/retinanet/retinanet/main.py", line 237, in main
    worker(0, args, 1, model, state)
  File "/workspace/retinanet/retinanet/main.py", line 186, in worker
    rotated_bbox=args.rotated_bbox)
  File "/workspace/retinanet/retinanet/infer.py", line 60, in infer
    verbosity=0)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
    name, param.type()))
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param backbones.ResNet50FPN.features.conv1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
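For context, the error message above is describing the required call order for apex AMP: the model must already live on a CUDA device when amp.initialize is called. A minimal sketch (nn.Linear is a stand-in for the actual RetinaNet model; amp.initialize, opt_level, and verbosity are real apex arguments):

import torch.nn as nn
from apex import amp

model = nn.Linear(4, 2)    # stand-in for the RetinaNet model
model = model.to('cuda')   # must happen BEFORE amp.initialize
model = amp.initialize(model, opt_level='O2', verbosity=0)

Here the parameters stay on the CPU because, as noted at the top of the thread, PyTorch in the container is not recognizing any CUDA device, so the move-to-GPU path is never taken; running nvidia-smi inside the container (as asked above) confirms whether --gpus all is actually exposing a GPU.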