Open ccqlx opened 1 year ago
Hi, which CUDA version are you using? You can run nvcc -V to print it.
@czczup CUDA is 11.1
(ViT-Adapter-main) ccq@ccq:~/Data/ViT-Adapter-main$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
@ccqlx You can run test.py to check whether deformable attention was installed successfully.
Run it like this:
cd detection/ops/
python test.py
@czczup I gave it a try and it showed the following:
(ViT-Adapter-main) ccq@ccq:~/Data/ViT-Adapter-main/detection/ops$ python test.py
Traceback (most recent call last):
  File "test.py", line 12, in <module>
I think deformable attention was not compiled successfully. You can try this: in ms_deform_attn_func.py, replace line 11,
import MultiScaleDeformableAttention as MSDA
with
from mmcv.ops.multi_scale_deform_attn import ext_module as MSDA
and then run python test.py again.
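The swap above can also be written as a fallback, so that the locally compiled extension is used when it imports and mmcv's bundled extension otherwise. This is a minimal sketch, not the repository's code: `first_importable` is a hypothetical helper, and the two module paths are the ones quoted in this thread. The two extensions need not expose identical function signatures, so a successful import does not guarantee that test.py passes.

```python
import importlib

# Candidate modules, in order of preference. Both names come from this thread.
CANDIDATES = (
    "MultiScaleDeformableAttention",     # extension built from detection/ops
    "mmcv.ops.multi_scale_deform_attn",  # mmcv module exposing `ext_module`
)


def first_importable(names, importer=importlib.import_module):
    """Return (name, module) for the first name that imports, else raise ImportError."""
    errors = []
    for name in names:
        try:
            return name, importer(name)
        except ImportError as exc:  # also covers "undefined symbol" load failures
            errors.append(f"{name}: {exc}")
    raise ImportError("no deformable attention backend found:\n" + "\n".join(errors))


if __name__ == "__main__":
    try:
        name, module = first_importable(CANDIDATES)
        # For the mmcv module, the compiled ops live on its `ext_module` attribute.
        MSDA = getattr(module, "ext_module", module)
        print("using", name)
    except ImportError as exc:
        print(exc)
```

Printing which backend was picked makes it easier to tell later which extension a failing call actually went through.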
It showed the following:
(ViT-Adapter-main) ccq@ccq:~/Data/ViT-Adapter-main/detection/ops$ python test.py
Invoked with: tensor([[[[2.8306e-03, 9.4753e-03, 9.3084e-03, ..., 6.8304e-03, 8.3203e-03, 1.7387e-03], [8.2308e-03, 3.5872e-03, 3.3102e-03, ..., 5.9733e-03, 9.4355e-03, 4.3745e-03]],
[[3.6581e-03, 3.2056e-03, 1.2741e-03, ..., 8.7882e-03,
5.3017e-03, 6.8271e-04],
[4.7188e-03, 4.3066e-03, 1.9086e-03, ..., 4.8850e-03,
8.5183e-03, 2.6952e-03]],
[[4.9191e-03, 4.2033e-03, 2.2696e-03, ..., 8.6269e-03,
7.5990e-03, 5.2359e-03],
[8.2033e-03, 4.8610e-03, 9.2209e-04, ..., 1.3456e-03,
7.8573e-04, 6.9594e-05]],
...,
[[1.3591e-04, 9.7463e-03, 2.6549e-03, ..., 5.1311e-03,
8.3771e-03, 6.5187e-03],
[9.4567e-03, 2.1441e-03, 3.2841e-03, ..., 6.4269e-03,
1.2673e-03, 2.4591e-03]],
[[5.5316e-03, 8.9207e-03, 3.7639e-03, ..., 9.2534e-03,
5.9210e-04, 5.7246e-03],
[5.2976e-03, 4.3412e-03, 2.0720e-03, ..., 6.8974e-06,
9.2739e-03, 1.0133e-03]],
[[8.3681e-03, 3.6025e-03, 3.9561e-03, ..., 4.1047e-03,
6.9461e-03, 5.9292e-04],
[8.6039e-04, 4.2888e-03, 5.7870e-03, ..., 8.9754e-03,
8.3673e-03, 5.7373e-03]]]], device='cuda:0', dtype=torch.float64,
grad_fn=<CopyBackwards>), tensor([[6, 4],
[3, 2]], device='cuda:0'), tensor([ 0, 24], device='cuda:0'), tensor([[[[[[0.9039, 0.4670],
[0.9605, 0.1661]],
[[0.3754, 0.2202],
[0.8897, 0.0443]]],
[[[0.9926, 0.3067],
[0.1081, 0.2196]],
[[0.2653, 0.2301],
[0.0962, 0.9684]]]],
[[[[0.4655, 0.3431],
[0.4971, 0.2944]],
[[0.9427, 0.5881],
[0.7237, 0.7388]]],
[[[0.5605, 0.9126],
[0.4982, 0.3065]],
[[0.2988, 0.6454],
[0.5300, 0.7064]]]]]], device='cuda:0', dtype=torch.float64,
grad_fn=<CopyBackwards>), tensor([[[[[0.2759, 0.2690],
[0.1009, 0.3542]],
[[0.3140, 0.2090],
[0.2103, 0.2667]]],
[[[0.0828, 0.2179],
[0.3503, 0.3489]],
[[0.3138, 0.2989],
[0.0730, 0.3143]]]]], device='cuda:0', dtype=torch.float64,
grad_fn=<CopyBackwards>), tensor([[[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.]]], device='cuda:0',
dtype=torch.float64), 2
I think now you can try to run inference with a checkpoint
Hi, I had the same problem. Have you solved it?
@czczup Hello, I modified the code as suggested and ran python test.py, but I hit the same problem:
TypeError: ms_deform_attn_backward(): incompatible function arguments. The following argument types are supported:
- (value: at::Tensor, value_spatial_shapes: at::Tensor, value_level_start_index: at::Tensor, sampling_locations: at::Tensor, attention_weights: at::Tensor, grad_output: at::Tensor, grad_value: at::Tensor, grad_sampling_loc: at::Tensor, grad_attn_weight: at::Tensor, im2col_step: int) -> None
In this state I can load pretrained weights and run inference, but training fails as soon as backpropagation is needed. How can I fix this?
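The signature printed in the error lists ten parameters, including three gradient tensors (`grad_value`, `grad_sampling_loc`, `grad_attn_weight`), and returns `None`. That suggests the mmcv op fills preallocated buffers rather than returning gradients, which would explain why inference works while backpropagation fails. The sketch below adapts a call to that convention. It is an untested assumption, not a confirmed fix: `backward_with_buffers` is a hypothetical helper, the argument order is taken only from the error message, and `ext` stands for whatever module provides `ms_deform_attn_backward`.

```python
def backward_with_buffers(ext, value, spatial_shapes, level_start_index,
                          sampling_locations, attention_weights, grad_output,
                          im2col_step, zeros_like=None):
    """Call a buffer-filling backward op and return the three filled gradients."""
    if zeros_like is None:
        import torch  # imported lazily so the helper can be exercised without torch
        zeros_like = torch.zeros_like

    # Preallocate one output buffer per differentiable input.
    grad_value = zeros_like(value)
    grad_sampling_loc = zeros_like(sampling_locations)
    grad_attn_weight = zeros_like(attention_weights)

    # Positional order follows the signature printed in the TypeError.
    ext.ms_deform_attn_backward(
        value, spatial_shapes, level_start_index,
        sampling_locations, attention_weights, grad_output,
        grad_value, grad_sampling_loc, grad_attn_weight,
        im2col_step,
    )
    return grad_value, grad_sampling_loc, grad_attn_weight
```

Inside a custom `torch.autograd.Function`, the returned tuple would be handed back from `backward` together with `None` for the non-differentiable inputs.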
Have you solved this? I have the same problem.
Solved. I rebuilt the container and the virtual environment following the README. The CUDA versions have to match exactly for the installation to succeed.
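Since the fix reported here hinges on matching CUDA versions, one quick check is to compare the toolkit release reported by `nvcc -V` with the CUDA version PyTorch was built against (`torch.version.cuda`). The sketch below is illustrative: `nvcc_release` and `cuda_versions_match` are hypothetical helpers, and comparing only major.minor is an assumption about how strict the match needs to be.

```python
import re
import subprocess


def nvcc_release(nvcc_output):
    """Extract the 'release X.Y' version from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(nvcc_output, torch_cuda):
    """True when nvcc and PyTorch report the same major.minor CUDA version."""
    release = nvcc_release(nvcc_output)
    if release is None or torch_cuda is None:  # torch_cuda is None on CPU-only builds
        return False
    return release == ".".join(torch_cuda.split(".")[:2])


if __name__ == "__main__":
    try:
        import torch
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
        print("nvcc:", nvcc_release(out), "| torch:", torch.version.cuda,
              "| match:", cuda_versions_match(out, torch.version.cuda))
    except (ImportError, FileNotFoundError) as exc:
        print("cannot check:", exc)
```

If the two versions differ, rebuilding the extension in an environment where they agree is the route that worked for the poster above.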
Hello, after reinstalling the environment, did you manage to compile the deformable attention module on Windows, or did you use the precompiled deformable attention from mmcv?
I compiled it successfully on Linux. I have not tried Windows.
It showed the following:
(ViT-Adapter-main) ccq@ccq:~/Data/ViT-Adapter-main/detection/ops$ python test.py
True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
Traceback (most recent call last):
  File "test.py", line 109, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "test.py", line 96, in check_gradient_numerical
    gradok = gradcheck(
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1245, in gradcheck
    return _gradcheck_helper(**args)
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1258, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 930, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 974, in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 516, in _check_analytical_jacobian_attributes
    vjps1 = _compute_analytical_jacobian_rows(vjp_fn, output.clone())
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 608, in _compute_analytical_jacobian_rows
    grad_inputs = vjp_fn(grad_out_base)
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 509, in vjp_fn
    return torch.autograd.grad(output, diff_input_list, grad_output,
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/__init__.py", line 226, in grad
    return Variable._execution_engine.run_backward(
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/function.py", line 87, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/autograd/function.py", line 204, in wrapper
    outputs = fn(ctx, *args)
  File "/home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 236, in decorate_bwd
    return bwd(*args, **kwargs)
  File "/home/ccq/Data/ViT-Adapter-main/detection/ops/functions/ms_deform_attn_func.py", line 43, in backward
    MSDA.ms_deform_attn_backward(
TypeError: ms_deform_attn_backward(): incompatible function arguments. The following argument types are supported:
- (value: at::Tensor, value_spatial_shapes: at::Tensor, value_level_start_index: at::Tensor, sampling_locations: at::Tensor, attention_weights: at::Tensor, grad_output: at::Tensor, grad_value: at::Tensor, grad_sampling_loc: at::Tensor, grad_attn_weight: at::Tensor, im2col_step: int) -> None
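Hello, I ran into the same problem as you while reproducing the ViT-Adapter model, and it has been bothering me for days. Have you solved it? If so, could you tell me how?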
Hello, has anyone compiled the deformable attention module on Windows? Does the compilation fail there? I ran into the same problem.
Traceback (most recent call last):
  File "/home/ccq/Data/liu/ViT-Adapter-main/segmentation/train.py", line 11, in <module>
    import mmseg_custom  # noqa: F401,F403
  File "/home/ccq/Data/liu/ViT-Adapter-main/segmentation/mmseg_custom/__init__.py", line 3, in <module>
    from .models import *  # noqa: F401,F403
  File "/home/ccq/Data/liu/ViT-Adapter-main/segmentation/mmseg_custom/models/__init__.py", line 2, in <module>
    from .backbones import *  # noqa: F401,F403
  File "/home/ccq/Data/liu/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/__init__.py", line 2, in <module>
    from .beit_adapter import BEiTAdapter
  File "/home/ccq/Data/liu/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/beit_adapter.py", line 9, in <module>
    from detection.ops.modules import MSDeformAttn
  File "/home/ccq/Data/liu/ViT-Adapter-main/detection/ops/modules/__init__.py", line 9, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/home/ccq/Data/liu/ViT-Adapter-main/detection/ops/modules/ms_deform_attn.py", line 19, in <module>
    from ..functions import MSDeformAttnFunction
  File "/home/ccq/Data/liu/ViT-Adapter-main/detection/ops/functions/__init__.py", line 9, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/ccq/Data/liu/ViT-Adapter-main/detection/ops/functions/ms_deform_attn_func.py", line 11, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: /home/ccq/anaconda3/envs/ViT-Adapter-main/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIdEEPT_v
Hello, what causes this error, and how can I fix it? Thanks.
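The missing symbol follows the Itanium C++ name-mangling scheme and decodes to roughly `at::TensorBase::data_ptr<double>() const`, a function from PyTorch's ATen library. An undefined ATen symbol at import time commonly indicates that the extension was compiled against a different PyTorch build than the one currently installed, which is consistent with the fix reported in this thread (rebuilding in a clean environment with matching versions). The helper below is a partial decoder written for illustration, not a full demangler: it only extracts the length-prefixed name components, and `nested_name_parts` is a hypothetical name.

```python
import re


def nested_name_parts(mangled):
    """Return the length-prefixed components of an Itanium nested name (_ZN...)."""
    prefix = re.match(r"_ZNK?", mangled)  # _ZN = nested name, K = const member function
    if prefix is None:
        return []
    parts, i = [], prefix.end()
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])  # e.g. "10" in front of "TensorBase"
        parts.append(mangled[j:j + length])
        i = j + length
    return parts


symbol = "_ZNK2at10TensorBase8data_ptrIdEEPT_v"
print("::".join(nested_name_parts(symbol)))  # prints at::TensorBase::data_ptr
```

Seeing `at::` in the decoded name points at the PyTorch side of the build rather than at the ViT-Adapter sources.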