facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Traced Model is giving error on CUDA #4580

Closed darkmatter18 closed 1 year ago

darkmatter18 commented 1 year ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. Full runnable code or full changes you made:

Export Command

python export.py --output ./output --export-method tracing --format torchscript --sample-image sample.jpg --run-eval

CFG

# imports assumed for this snippet (point_rend ships under detectron2.projects in pip installs)
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.projects.point_rend import add_pointrend_config

def setup_cfg(args):
    cfg = get_cfg()
    # cuda context is initialized before creating dataloader, so we don't fork anymore
    cfg.DATALOADER.NUM_WORKERS = 0
    add_pointrend_config(cfg)
    cfg.merge_from_list(args.opts)
    cfg.merge_from_file(model_zoo.get_config_file('COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml'))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    cfg.MODEL.DEVICE = 'cuda'
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml")
    cfg.freeze()
    return cfg

Main Code for inference

import cv2
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print("Device", device)

# load the traced model onto the selected device
model = torch.jit.load('./output/model.ts', map_location=device)
# `frame`: HWC uint8 BGR image; cv2.imread is assumed here to keep the snippet self-contained
frame = cv2.imread('sample.jpg')
# convert to a CHW float32 tensor on the same device as the model
t = torch.as_tensor(frame.astype("float32").transpose(2, 0, 1)).to(device)
output_json = model(t)
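
Worth noting: the traced export returns a flattened tuple of tensors (detectron2's TracingAdapter flattens the usual Instances outputs so they can be traced), so despite the variable name there is no JSON here. A quick sketch for inspecting the result once the call succeeds; the field order follows the outputs schema printed by the export script:

for i, out in enumerate(output_json):
    # tensors in schema order, e.g. boxes / classes / scores / keypoints
    print(i, tuple(out.shape), out.dtype, out.device)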
  2. What exact command you run:
  3. Full logs or other relevant observations:
    
    Traceback (most recent call last):
      File "main.py", line 39, in <module>
        output_json = model(t)
      File "/home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/__torch__/detectron2/export/flatten.py", line 26, in forward
        image_size = torch.stack([_1, _2])
        max_size, _3 = torch.max(torch.stack([image_size]), 0)
        _4 = torch.div(torch.add(max_size, CONSTANTS.c0), CONSTANTS.c1, rounding_mode="floor")
             ~~~~~~~~~ <--- HERE
        max_size0 = torch.mul(_4, CONSTANTS.c1)
        _5 = torch.sub(torch.select(max_size0, 0, -1), torch.select(image_size, 0, 1))

Traceback of TorchScript, original code (most recent call last):
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/structures/image_list.py(101): from_tensors
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py(229): preprocess_image
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py(203): inference
  export.py(123): inference
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1118): _slow_forward
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1130): _call_impl
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/jit/_trace.py(967): trace_module
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/jit/_trace.py(750): trace
  export.py(132): export_tracing
  export.py(232):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


## Expected behavior:

If there is no obvious crash in the "full logs" provided above,
please tell us the expected behavior.

If you expect a model to converge / work better, we do not help with such issues, unless
the model fails to reproduce the results in the detectron2 model zoo or proves the existence of a bug.

## Environment:

Paste the output of the following command:

sys.platform              linux
Python                    3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
numpy                     1.17.4
detectron2                0.6 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/detectron2
Compiler                  GCC 9.4
CUDA compiler             CUDA 11.3
detectron2 arch flags     6.1
DETECTRON2_ENV_MODULE
PyTorch                   1.12.0+cu113 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/torch
PyTorch debug build       False
GPU available             Yes
GPU 0                     NVIDIA GeForce MX230 (arch=6.1)
Driver version            515.65.01
CUDA_HOME                 /usr/local/cuda
Pillow                    9.2.0
torchvision               0.13.0+cu113 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/torchvision
torchvision arch flags    3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                    0.1.5.post20220512
iopath                    0.1.9
cv2                       4.2.0


PyTorch built with:



If your issue looks like an installation issue / environment issue,
please first check common issues in https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues
ppwwyyxx commented 1 year ago

The model has to be exported on CUDA in order to run on CUDA. See the first example in https://github.com/facebookresearch/detectron2/tree/main/tools/deploy#use
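
For context, a trace is bound to the device it was created on: tensors constructed inside forward() have their device evaluated at trace time and recorded in the graph, and map_location in torch.jit.load remaps saved storages but (as far as I know) not these recorded device arguments. A minimal, detectron2-independent sketch of the effect (the module M is purely illustrative):

import torch

class M(torch.nn.Module):
    def forward(self, x):
        # the device of this freshly created tensor is resolved at trace
        # time and baked into the traced graph
        idx = torch.arange(x.shape[0], device=x.device, dtype=x.dtype)
        return x + idx

traced = torch.jit.trace(M(), torch.zeros(4))  # traced with a CPU example input
# traced(torch.zeros(4, device="cuda"))        # RuntimeError: Expected all tensors
#                                              # to be on the same device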

duklin commented 1 year ago
./export_model.py --config-file ../../configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --output ./output --export-method tracing --format torchscript MODEL.WEIGHTS ../../../DLA_mask_rcnn_X_101_32x8d_FPN_3x.pth MODEL.DEVICE cuda

Even though I specify MODEL.DEVICE cuda, I still get the same error.

I notice (device=cpu) in a couple of places in the .txt files in the output folder. I am not sure whether that indicates the model was not exported on CUDA even though the device was specified; while the script was running, a process did occupy a portion of the GPU memory (per nvidia-smi).

Is it possible that the sample image should also be transferred to GPU memory?
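
One way to check where a device got baked in (a sketch; ./output/model.ts matches the export output used above) is to inspect the loaded module's serialized code for frozen device literals:

import torch

ts = torch.jit.load("./output/model.ts")
# TorchScript source of forward(); look for literals such as
# device=torch.device("cpu") that were frozen in at trace time
print(ts.code)
# ts.graph exposes the same information as prim::Constant nodes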