Linwei-Chen / FreqFusion

TPAMI: Frequency-aware Feature Fusion for Dense Image Prediction

Error after adapting it to an FPN structure: ext_module.carafe_backward( RuntimeError: CUDA error: invalid configuration argument #17

Open omnipotenttom opened 2 hours ago

omnipotenttom commented 2 hours ago

Hello authors, and thank you very much for your contribution!

Traceback (most recent call last):
  File "/root/MVSFormer/train_m.py", line 238, in <module>
    main(0,args, config)
  File "/root/MVSFormer/train_m.py", line 190, in main
    trainer.train()
  File "/root/MVSFormer/base/base_trainer.py", line 78, in train
    result = self._train_epoch(epoch)
  File "/root/MVSFormer/trainer/mvsformer_trainer.py", line 141, in _train_epoch
    loss.backward()
  File "/root/miniconda3/envs/mvsformer/lib/python3.10/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/root/miniconda3/envs/mvsformer/lib/python3.10/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/root/miniconda3/envs/mvsformer/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/envs/mvsformer/lib/python3.10/site-packages/torch/autograd/function.py", line 306, in apply
    return user_fn(self, *args)
  File "/root/mmcv/mmcv/ops/carafe.py", line 180, in backward
    ext_module.carafe_backward(
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
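The error message's own debugging hint can be followed without touching the training code. Because CUDA kernels launch asynchronously, the frame that reports the failure (carafe_backward here) is not guaranteed to be the kernel that actually failed. A minimal sketch, assuming training is started as `python train_m.py ...`: it relaunches the script with `CUDA_LAUNCH_BLOCKING=1` so every launch is synchronized and the stack trace points at the failing kernel. The helper name `relaunch_with_blocking_launches` is hypothetical. Note that `TORCH_USE_CUDA_DSA` is different: it is a build-time option and only has an effect in a PyTorch built from source with it enabled.

```python
import os
import subprocess
import sys


def relaunch_with_blocking_launches(script_args):
    """Re-run a Python script with synchronous CUDA kernel launches (hypothetical helper).

    CUDA_LAUNCH_BLOCKING only takes effect if it is set before CUDA is
    initialized, so it is passed to a fresh child process through its
    environment instead of being set inside the running training process.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    # Use the current interpreter so the child runs in the same conda environment.
    completed = subprocess.run([sys.executable, *script_args], env=env)
    return completed.returncode


# Hypothetical usage: relaunch_with_blocking_launches(["train_m.py", "--config", "..."])
```

Equivalently, the variable can be set directly on the shell command line in front of `python train_m.py`. Synchronous launches slow training down, so this is only meant for reproducing the failing iteration.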

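The input consists of features at four scales. Training runs for a few iterations and then fails with the error above. I followed the official mmcv documentation for a similar issue, "RuntimeError: CUDA error: invalid configuration argument": "This error may be caused by the poor performance of GPU. Try to decrease the value of THREADS_PER_BLOCK." After changing THREADS_PER_BLOCK and recompiling mmcv, the error still occurs.

Environment: NVIDIA L20 GPU, Python 3.10, torch 2.4.0 (the version required by the original project) + CUDA 12.1, mmcv 2.2.0. Do you have any suggestions?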
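Some context on why lowering THREADS_PER_BLOCK may not help. "invalid configuration argument" (cudaErrorInvalidConfiguration) means a kernel launch requested a grid or block shape the device can never run: more than 1024 threads per block, a grid or block dimension above its limit, or a dimension of zero. A zero-sized grid typically comes from a ceil-division block count over an empty input, and no thread count fixes that. The sketch below checks a launch configuration against the limits documented for compute capability 3.0 and later (the L20 falls in this range). `blocks_for`, `LaunchConfig` and `launch_config_problems` are illustrative assumptions, not mmcv's actual code. Logging the feature and mask shapes passed to CARAFE on the iteration that fails, and reasoning about them this way, may help narrow down whether a degenerate shape is involved.

```python
from dataclasses import dataclass

# Per-launch limits for compute capability >= 3.0 (CUDA C++ Programming Guide).
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)
MAX_GRID_DIM = (2**31 - 1, 65535, 65535)


@dataclass
class LaunchConfig:
    grid: tuple   # (x, y, z) number of blocks
    block: tuple  # (x, y, z) threads per block


def blocks_for(num_elements, threads_per_block):
    """Ceil-division block count, a common pattern in CUDA extensions (illustrative)."""
    return (num_elements + threads_per_block - 1) // threads_per_block


def launch_config_problems(cfg):
    """Return the reasons this launch would fail with 'invalid configuration argument'."""
    problems = []
    for axis, g, limit in zip("xyz", cfg.grid, MAX_GRID_DIM):
        if g < 1:
            problems.append(f"gridDim.{axis}={g} is not positive")
        elif g > limit:
            problems.append(f"gridDim.{axis}={g} exceeds {limit}")
    for axis, b, limit in zip("xyz", cfg.block, MAX_BLOCK_DIM):
        if b < 1:
            problems.append(f"blockDim.{axis}={b} is not positive")
        elif b > limit:
            problems.append(f"blockDim.{axis}={b} exceeds {limit}")
    threads = cfg.block[0] * cfg.block[1] * cfg.block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(f"{threads} threads per block exceeds {MAX_THREADS_PER_BLOCK}")
    return problems
```

For example, `blocks_for(0, 512)` and `blocks_for(0, 128)` both return 0, so an empty input yields an invalid grid whatever the thread count. Conversely, a block size derived from the channel count can exceed 1024 threads regardless of the THREADS_PER_BLOCK setting.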

omnipotenttom commented 2 hours ago

Some other modules in the code require torch 2.4, so downgrading may not be very practical (;´༎ຶД༎ຶ`)