Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

Error when python setup.py develop #16

Closed sky-fly97 closed 1 year ago

sky-fly97 commented 1 year ago

Hi, my PyTorch version used to be 1.8.1, and with it I was able to run python setup.py develop successfully. But the required version is greater than 1.9, so I created a new environment with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html and ran into problems compiling. The failure appears to be in pcdet/ops/ingroup_inds:

[2/2] /nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
FAILED: /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o
/nvme/yanxiangchao/perl5/drivers/cuda-11.1/bin/nvcc -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/TH -I/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/THC -I/nvme/yanxiangchao/perl5/drivers/cuda-11.1/include -I/nvme/yanxiangchao/anaconda3/envs/test/include/python3.9 -c -c /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.cu -o /nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/build/temp.linux-x86_64-cpython-39/pcdet/ops/ingroup_inds/src/ingroup_inds_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=ingroup_inds_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++14
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ParameterDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::SequentialImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleListImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleDictImpl]’:
/tmp/tmpxft_0000854a_00000000-6_ingroup_inds_kernel.cudafe1.stub.c:4:27: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerDecoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerEncoderImpl]’:
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/nvme/yanxiangchao/perl5/pretrain/openmdf_dsvt/openmdf/setup.py", line 34, in <module>
setup(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/nvme/yanxiangchao/anaconda3/envs/test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Haiyang-W commented 1 year ago

Please check your CUDA version, GPU model, and the matching torch version. We can compile successfully on V100, 3090, and A100 with CUDA >= 11.0.
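
As a rough way to run that check, the sketch below compares the CUDA release encoded in the torch wheel tag (for example `cu111`) with the release reported by `nvcc --version`. The helper names are hypothetical and are not part of DSVT or PyTorch. A mismatch is only one common source of extension build failures; nothing in this thread confirms it as the cause here.

```python
import re
import shutil
import subprocess


def cuda_from_torch_version(version):
    """Return the CUDA release encoded in a wheel tag, e.g. '1.10.0+cu111' -> '11.1'."""
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None  # CPU-only or untagged build
    digits = match.group(1)
    # The last digit is the minor version: cu111 -> 11.1, cu102 -> 10.2.
    return f"{digits[:-1]}.{digits[-1]}"


def cuda_from_nvcc_output(text):
    """Extract 'X.Y' from the 'release X.Y' line printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkits_match(torch_version, nvcc_text):
    """True if the wheel tag equals the nvcc release; None if either is unknown."""
    wheel = cuda_from_torch_version(torch_version)
    nvcc = cuda_from_nvcc_output(nvcc_text)
    if wheel is None or nvcc is None:
        return None
    return wheel == nvcc


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None:
        print("torch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0))
    # Assumption: the nvcc found on PATH is the one the build uses.
    # The extension build may pick a different toolkit via CUDA_HOME.
    nvcc = shutil.which("nvcc")
    if nvcc and torch is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print("nvcc release:", cuda_from_nvcc_output(out))
        print("wheel tag matches nvcc:", toolkits_match(torch.__version__, out))
```

Run it inside the environment used for `python setup.py develop`, so that it reports the same torch build the extension is compiled against.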

Doctor-James commented 1 year ago

I have the same problem. My version is torch==1.10.0+cu111, and I hit the same error. I also tried torch==1.9.1+cu111, and with that I was able to run python setup.py develop successfully. My GPU is a 3090 with CUDA 11.1.

sky-fly97 commented 1 year ago

Yes, I also tried torch==1.9.1+cu111 and the build works, but does training work for you? I get this error:

File "/nvme/anaconda3/envs/test/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Haiyang-W commented 1 year ago

One environment in which we build and run successfully is torch 1.9.1+cu111, CUDA >11.1, V100.

Haiyang-W commented 1 year ago

The cuDNN error does not seem to come from our code. There may be a problem with the torch 1.9.1 installation in your environment.
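
One way to test the installation independently of DSVT is to ask torch what it sees and run a tiny convolution, which forces cuDNN to initialise. The following is a minimal sketch under that assumption; `cudnn_status` is a hypothetical helper, and a passing result does not rule out other causes such as running out of GPU memory during training.

```python
def cudnn_status():
    """Collect cuDNN-related facts from the installed torch build, if any."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if info["cuda_available"]:
        # A small conv2d on the GPU exercises the same code path that
        # raised CUDNN_STATUS_NOT_INITIALIZED in the traceback above.
        try:
            x = torch.randn(1, 1, 8, 8, device="cuda")
            w = torch.randn(1, 1, 3, 3, device="cuda")
            torch.nn.functional.conv2d(x, w)
            torch.cuda.synchronize()
            info["conv2d"] = "ok"
        except RuntimeError as err:
            info["conv2d"] = f"failed: {err}"
    return info


if __name__ == "__main__":
    for key, value in cudnn_status().items():
        print(f"{key}: {value}")
```

If `conv2d` fails here as well, the problem lies in the torch/CUDA/cuDNN installation rather than in the DSVT code.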

chenshi3 commented 1 year ago

We built a new environment with torch==1.10.0+cu111 and torchvision==0.11.0+cu111, and it compiled successfully.

Haiyang-W commented 1 year ago

Has the problem been solved?

sky-fly97 commented 1 year ago

Yes, I can train it successfully on torch==1.9.1+cu111!

Haiyang-W commented 1 year ago

Great! Hope everything goes well!

Haiyang-W commented 1 year ago

The bug has been fixed here. It was caused by ninja and the torch extension build.