eval.py error: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

stealth0414 commented 1 year ago

After training on my own dataset, I tried to run eval.py and got the following error:

Traceback (most recent call last):
  File "eval.py", line 193, in <module>
    main()
  File "eval.py", line 79, in main
    Eval(experiment, experiment_args, cmd=args, verbose=args['verbose']).eval(args['visualize'])
  File "eval.py", line 164, in eval
    model = self.init_model()
  File "eval.py", line 107, in init_model
    model = self.structure.builder.build(self.device)
  File "/hy-tmp/DB-yanhua/structure/builder.py", line 24, in build
    model = Model(self.model_args, device,
  File "/hy-tmp/DB-yanhua/structure/model.py", line 37, in __init__
    self.model = BasicModel(args)
  File "/hy-tmp/DB-yanhua/structure/model.py", line 15, in __init__
    self.backbone = getattr(backbones, args['backbone'])(**args.get('backbone_args', {}))
  File "/hy-tmp/DB-yanhua/backbones/resnet.py", line 310, in deformable_resnet50
    model.load_state_dict(model_zoo.load_url(
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 731, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 905, in _legacy_load
    return legacy_load(f)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set_(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

The function that fails, from backbones/resnet.py:

    def deformable_resnet50(pretrained=True, **kwargs):
        """Constructs a ResNet-50 model with deformable conv.

        Args:
            pretrained (bool): If True, returns a model pre-trained on ImageNet
        """
        model = ResNet(Bottleneck, [3, 4, 6, 3],
                       dcn=dict(modulated=True,
                                deformable_groups=1,
                                fallback_on_stride=False),
                       stage_with_dcn=[False, True, True, True],
                       **kwargs)
        if pretrained:
            # Downloads (or reads from cache) the ImageNet ResNet-50 weights
            model.load_state_dict(model_zoo.load_url(
                model_urls['resnet50']), strict=False)
        return model

Why does eval still need to load the ResNet-50 pretrained weights after training? Has anyone solved this? I also tried commenting out the if pretrained block, but then the metrics are all 0:

[INFO] [2023-03-17 18:04:29,041] precision : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] recall : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] fmeasure : 0.000000 (1)

Thanks.

YunlongGa commented 1 year ago

I ran into the same problem. Did you manage to solve it?

stealth0414 commented 1 year ago

I don't remember, but you could try training on top of the author's pretrained model.

SairaiL commented 1 year ago

It may be a PyTorch issue. I also hit this on Linux with version 11.3, but on my own computer with 10.2 there is no problem.

YunlongGa commented 1 year ago

OK, thank you.

MxxM-max commented 3 months ago

Did you manage to solve this? I'm running into the same problem.

Realtyxxx commented 1 month ago

I found a solution here, for everyone's reference: https://stackoverflow.com/questions/71643035/runtimeerror-attempted-to-set-the-storage-of-a-tensor-on-device-cuda0-to-a-s

Obezyan0941 commented 1 month ago

I ran into this issue, and this is how I worked around it. The error traces back to line 46 of resnet.py. During training and validation I changed the line to:

    pretrained_dict = model_zoo.load_url(url)

But that does not work for eval. During eval I change the line to:

    pretrained_dict = model_zoo.load_url(url, map_location=device)

I have no idea how to solve it completely, but this works fine for now. Hope it helps!
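One way to avoid editing that line by hand before each eval run might be a small wrapper that always forwards map_location. The sketch below is only an illustration of the workaround above, not code from the DB repository: the helper name load_pretrained and its loader parameter are hypothetical, and it assumes that passing map_location during training is also acceptable in your setup, which this thread does not confirm.

```python
def load_pretrained(url, device, loader=None):
    """Fetch pretrained weights and map every tensor onto `device`.

    `load_pretrained` and `loader` are hypothetical names used for
    illustration; they do not exist in the DB repository.
    """
    if loader is None:
        # Imported lazily so the helper can be imported without torch.
        from torch.utils import model_zoo
        loader = model_zoo.load_url
    # map_location is forwarded to torch.load, so the loaded storages
    # end up on `device` instead of the device they were saved from.
    return loader(url, map_location=device)
```

With such a helper, the line in resnet.py could read pretrained_dict = load_pretrained(url, device), where device is whatever device the current run uses (for example "cuda:0" during eval).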

MhLiao / DB

eval.py error: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match. #363