error: subprocess-exited-with-error

seungyun1 commented 2 years ago

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug

Can you give me some advice?

The error is still occurring for me.

I tried to run `pip install -v -e .`, but I got the error shown below.

I don't know what the problem is. I'd appreciate it if anyone could tell me how to solve it.

I want to train the TransFusion model, but I don't know how to solve this problem.

Two GPUs:
- GPU 0: NVIDIA TITAN X
- GPU 1: NVIDIA TITAN RTX

Reproduction

What command or script did you run?

`pip install -v -e .` (directory: TransFusion)
`python tools/train.py configs/transfusion/detection/transfusion_nusc_voxel_LC.py` (directory: TransFusion)

Did you make any modifications on the code or config? Did you understand what you have modified?
What dataset did you use?

Environment

Please run `python mmdet3d/utils/collect_env.py` to collect the necessary environment information and paste it here.


sys.platform: linux
Python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA TITAN RTX
GPU 1: NVIDIA TITAN X (Pascal)
CUDA_HOME: /usr/local/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.2
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.3
OpenCV: 4.5.5
MMCV: 1.5.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMDetection: 2.24.1
MMSegmentation: 0.24.1
MMDetection3D: 1.0.0rc2+76e351a
spconv2.0: False

2. You may add additional information that may be helpful for locating the problem, such as
    - How you installed PyTorch [e.g., pip, conda, source]
    - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

**Error traceback**
If applicable, paste the error traceback here.



**Bug fix**
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

![error](https://user-images.githubusercontent.com/69844293/167810221-71cd9ae8-3675-45f1-bb4c-c6d7e4fbfd3c.png)

XuyangBai commented 2 years ago

Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?

seungyun1 commented 2 years ago

> Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?

Thanks to you, I was able to solve it.

Thank you very much.

seungyun1 commented 2 years ago

> Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?

One last question: where can I get 'fusion_voxel0075.pth'?

When I run `python tools/train.py configs/transfusion_nusc_voxel_LC.py`, I get this error: `OSError: checkpoints/fusion_voxel0075_R50.pth is not a checkpoint file`

Should I just set `load_from = None` in the config file?
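For readers hitting the same `OSError`: a message of this form usually means the path does not point to an existing file. The sketch below is a small pre-flight check to run before launching training. The helper name `resolve_load_from` is hypothetical and is not part of mmcv or TransFusion. Note that falling back to `None` means no pretrained weights are loaded through `load_from`.

```python
import os


def resolve_load_from(path):
    """Return `path` if it points to an existing file, otherwise None.

    Hypothetical helper for illustration; not an mmcv or TransFusion API.
    """
    if path and os.path.isfile(os.path.expanduser(path)):
        return path
    return None


# Path taken from the error message in this thread.
load_from = resolve_load_from("checkpoints/fusion_voxel0075_R50.pth")
if load_from is None:
    print("Checkpoint not found; training would start without pretrained weights.")
```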

XuyangBai commented 2 years ago

Glad you solved the compile issue. As for this question, I am afraid I cannot share the model checkpoints due to Huawei's policy, so you need to train both TransFusion-L and TransFusion yourself. Basically:

1. Train transfusion_nusc_voxel_L.py
2. Choose a 2D backbone. For the nuScenes dataset, you can directly use the model provided by mmdet3d. Then combine the pretrained TransFusion-L and the 2D backbone to get fusion_voxel0075.pth, which is used as the load_from key for TransFusion.
3. Train transfusion_nusc_voxel_LC.py

You can find the details in https://github.com/XuyangBai/TransFusion/blob/master/configs/nuscenes.md
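The "combine" step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's script: the prefix mapping (`backbone.` to `img_backbone.`, `neck.` to `img_neck.`) and the function names are assumptions, and the linked nuscenes.md remains the authoritative description.

```python
from collections import OrderedDict

# Assumed key-prefix mapping from a 2D detector checkpoint to the image
# branch of the fusion model; verify against your own checkpoints.
IMG_PREFIXES = {"backbone.": "img_backbone.", "neck.": "img_neck."}


def combine_state_dicts(lidar_sd, image_sd, prefix_map=IMG_PREFIXES):
    """Copy all LiDAR weights and add the renamed 2D backbone/neck weights."""
    combined = OrderedDict(lidar_sd)
    for key, value in image_sd.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                combined[new + key[len(old):]] = value
                break  # 2D keys matching no prefix (e.g. detection heads) are skipped
    return combined


def combine_checkpoint_files(lidar_path, image_path, out_path):
    """Merge two checkpoint files on disk (requires PyTorch)."""
    import torch  # imported lazily so combine_state_dicts stays dependency-free

    lidar = torch.load(lidar_path, map_location="cpu")
    image = torch.load(image_path, map_location="cpu")
    # Checkpoints saved by mmcv-style runners usually keep weights under "state_dict".
    merged = combine_state_dicts(lidar.get("state_dict", lidar),
                                 image.get("state_dict", image))
    torch.save({"state_dict": merged}, out_path)
```

The merged file would then be referenced by `load_from` in transfusion_nusc_voxel_LC.py. Printing a few keys from both input checkpoints first is a cheap way to confirm that the assumed prefixes match.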

seungyun1 commented 2 years ago

> Glad you solved the compile issue. As for this question, I am afraid I cannot share the model checkpoints due to Huawei's policy, so you need to train both TransFusion-L and TransFusion yourself. Basically:
>
> 1. Train transfusion_nusc_voxel_L.py
> 2. Choose a 2D backbone. For the nuScenes dataset, you can directly use the model provided by mmdet3d. Then combine the pretrained TransFusion-L and the 2D backbone to get fusion_voxel0075.pth, which is used as the load_from key for TransFusion.
> 3. Train transfusion_nusc_voxel_LC.py
>
> You can find the details in https://github.com/XuyangBai/TransFusion/blob/master/configs/nuscenes.md

Thank you very much for your kind reply!!

wangyd-0312 commented 2 years ago

> > Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?
>
> Thanks to you, I was able to solve it.
>
> Thank you very much.

Hello, I have the same error. How did you solve it? Thanks a lot!

xxp912 commented 2 years ago

> > > Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?
> >
> > Thanks to you, I was able to solve it. Thank you very much.
>
> Hello, I have the same error. How did you solve it? Thanks a lot!

Find the file, find the line that needs to be changed, and wrap `coors_id_argsort` in `{}`:

/TransFusion/mmdet3d/ops/voxel/src/scatter_points_cuda.cu

`//coors_map.index_put_(coors_id_argsort, coors_map_sorted);`
`coors_map.index_put_({coors_id_argsort}, coors_map_sorted);`
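For anyone who prefers not to edit the file by hand, the same one-line change can be applied with a short script. This is a minimal sketch that assumes the call in scatter_points_cuda.cu is spelled `index_put_` and takes a bare identifier as its first argument; the function names here are illustrative only.

```python
import re
from pathlib import Path

# Matches index_put_ calls whose first argument is a bare identifier, e.g.
# coors_map.index_put_(coors_id_argsort, coors_map_sorted);
_BARE_INDEX = re.compile(r"(\.index_put_\()\s*([A-Za-z_]\w*)\s*,")


def wrap_index_put_args(source: str) -> str:
    """Wrap the bare first argument of every index_put_ call in braces."""
    return _BARE_INDEX.sub(r"\1{\2},", source)


def patch_file(path: str) -> bool:
    """Patch the file in place and return True if anything changed."""
    file = Path(path)
    original = file.read_text()
    patched = wrap_index_put_args(original)
    if patched != original:
        file.write_text(patched)
    return patched != original
```

Calling `patch_file("mmdet3d/ops/voxel/src/scatter_points_cuda.cu")` from the TransFusion directory and then rerunning `pip install -v -e .` would mirror the manual fix. Calls that are already wrapped in braces are left unchanged, so running it twice is harmless.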

ToothlessBDG commented 1 year ago

> > > Seems to be a compilation error for scatter_points_cuda. Could you try the solutions here https://github.com/open-mmlab/mmdetection3d/issues/362?
> >
> > Thanks to you, I was able to solve it. Thank you very much.
>
> Hello, I have the same error. How did you solve it? Thanks a lot!

Hello, I also encountered the same problem. How do I get the checkpoint file fusion_model.pth after I have completed the TransFusion-L training? Please guide me. Thank you!

XuyangBai / TransFusion

error: subprocess-exited-with-error #16