facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Traced Model is giving error on CUDA #4580

Closed darkmatter18 closed 1 year ago

darkmatter18 commented 1 year ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. Full runnable code or full changes you made:

Export Command

python export.py --output ./output --export-method tracing --format torchscript --sample-image sample.jpg --run-eval

CFG

# imports assumed for this snippet (point_rend ships under detectron2.projects in pip installs)
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.projects.point_rend import add_pointrend_config

def setup_cfg(args):
    cfg = get_cfg()
    # cuda context is initialized before creating dataloader, so we don't fork anymore
    cfg.DATALOADER.NUM_WORKERS = 0
    add_pointrend_config(cfg)
    cfg.merge_from_list(args.opts)
    cfg.merge_from_file(model_zoo.get_config_file('COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml'))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    cfg.MODEL.DEVICE = 'cuda'
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml")
    cfg.freeze()
    return cfg

Main Code for inference

import cv2
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print("Device", device)

# load the traced model onto the selected device
model = torch.jit.load('./output/model.ts', map_location=device)
# `frame`: HWC uint8 BGR image; cv2.imread is assumed here to keep the snippet self-contained
frame = cv2.imread('sample.jpg')
# convert to a CHW float32 tensor on the same device as the model
t = torch.as_tensor(frame.astype("float32").transpose(2, 0, 1)).to(device)
output_json = model(t)
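
Worth noting: the traced export returns a flattened tuple of tensors (detectron2's TracingAdapter flattens the usual Instances outputs so they can be traced), so despite the variable name there is no JSON here. A quick sketch for inspecting the result once the call succeeds; the field order follows the outputs schema printed by the export script:

for i, out in enumerate(output_json):
    # tensors in schema order, e.g. boxes / classes / scores / keypoints
    print(i, tuple(out.shape), out.dtype, out.device)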
  2. What exact command you run:
  3. Full logs or other relevant observations:
    
    Traceback (most recent call last):
      File "main.py", line 39, in <module>
        output_json = model(t)
      File "/home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/__torch__/detectron2/export/flatten.py", line 26, in forward
        image_size = torch.stack([_1, _2])
        max_size, _3 = torch.max(torch.stack([image_size]), 0)
        _4 = torch.div(torch.add(max_size, CONSTANTS.c0), CONSTANTS.c1, rounding_mode="floor")
             ~~~~~~~~~ <--- HERE
        max_size0 = torch.mul(_4, CONSTANTS.c1)
        _5 = torch.sub(torch.select(max_size0, 0, -1), torch.select(image_size, 0, 1))

Traceback of TorchScript, original code (most recent call last):
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/structures/image_list.py(101): from_tensors
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py(229): preprocess_image
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py(203): inference
  export.py(123): inference
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1118): _slow_forward
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1130): _call_impl
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/jit/_trace.py(967): trace_module
  /home/arkadip_bhattacharya/agtech/codebases/detection-on-cloud/venv/lib/python3.8/site-packages/torch/jit/_trace.py(750): trace
  export.py(132): export_tracing
  export.py(232):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


## Expected behavior:

If there is no obvious crash in the "full logs" provided above,
please tell us the expected behavior.

If you expect a model to converge / work better, we do not help with such issues, unless
the model fails to reproduce the results in the detectron2 model zoo or proves the existence of a bug.

## Environment:

Paste the output of the following command:

sys.platform              linux
Python                    3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
numpy                     1.17.4
detectron2                0.6 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/detectron2
Compiler                  GCC 9.4
CUDA compiler             CUDA 11.3
detectron2 arch flags     6.1
DETECTRON2_ENV_MODULE
PyTorch                   1.12.0+cu113 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/torch
PyTorch debug build       False
GPU available             Yes
GPU 0                     NVIDIA GeForce MX230 (arch=6.1)
Driver version            515.65.01
CUDA_HOME                 /usr/local/cuda
Pillow                    9.2.0
torchvision               0.13.0+cu113 @/home/arkadip_bhattacharya/.local/lib/python3.8/site-packages/torchvision
torchvision arch flags    3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                    0.1.5.post20220512
iopath                    0.1.9
cv2                       4.2.0


PyTorch built with:



If your issue looks like an installation issue / environment issue,
please first check common issues in https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues
ppwwyyxx commented 1 year ago

The model has to be exported on CUDA in order to run on CUDA. See the first example in https://github.com/facebookresearch/detectron2/tree/main/tools/deploy#use
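
For context, a trace is bound to the device it was created on: tensors constructed inside forward() have their device evaluated at trace time and recorded in the graph, and map_location in torch.jit.load remaps saved storages but (as far as I know) not these recorded device arguments. A minimal, detectron2-independent sketch of the effect (the module M is purely illustrative):

import torch

class M(torch.nn.Module):
    def forward(self, x):
        # the device of this freshly created tensor is resolved at trace
        # time and baked into the traced graph
        idx = torch.arange(x.shape[0], device=x.device, dtype=x.dtype)
        return x + idx

traced = torch.jit.trace(M(), torch.zeros(4))  # traced with a CPU example input
# traced(torch.zeros(4, device="cuda"))        # RuntimeError: Expected all tensors
#                                              # to be on the same device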

duklin commented 1 year ago
./export_model.py --config-file ../../configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --output ./output --export-method tracing --format torchscript MODEL.WEIGHTS ../../../DLA_mask_rcnn_X_101_32x8d_FPN_3x.pth MODEL.DEVICE cuda

Even though I specify MODEL.DEVICE cuda, I still get the same error.

I notice (device=cpu) in a couple of places in the .txt files in the output folder. I am not sure whether that indicates the model was not exported on CUDA even though the device was specified; while the script was running, a process did occupy a portion of the GPU memory (per nvidia-smi).

Is it possible that the sample image should also be transferred to GPU memory?
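
One way to check where a device got baked in (a sketch; ./output/model.ts matches the export output used above) is to inspect the loaded module's serialized code for frozen device literals:

import torch

ts = torch.jit.load("./output/model.ts")
# TorchScript source of forward(); look for literals such as
# device=torch.device("cpu") that were frozen in at trace time
print(ts.code)
# ts.graph exposes the same information as prim::Constant nodes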