ayumiymk / aster.pytorch

ASTER in Pytorch
MIT License

Hello, I'm evaluating on the ic03_867 dataset and am currently getting an error #57

Closed: Heermosi closed this issue 3 years ago

Heermosi commented 3 years ago

Hello, during the forward pass into one of the layers I run into what looks like a shape-mismatch problem:

Traceback (most recent call last):
  File "main.py", line 229, in <module>
    main(args)
  File "main.py", line 182, in main
    evaluator.evaluate(test_loader, dataset=test_dataset, vis_dir=vis_dir)
  File "/aster/lib/evaluators.py", line 47, in evaluate
    output_dict = self._forward(input_dict)
  File "/aster/lib/evaluators.py", line 152, in _forward
    output_dict = self.model(input_dict)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/aster/lib/models/model_builder.py", line 76, in forward
    stn_img_feat, ctrl_points = self.stn_head(stn_input)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/aster/lib/models/stn_head.py", line 89, in forward
    img_feat = self.stn_fc1(x)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/recognition/lib/python3.7/site-packages/torch/nn/functional.py", line 1406, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCBlas.cu:259

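After modifying the code to print them, the shapes of bias, input, and weight are torch.Size([512]), torch.Size([867, 512]), and torch.Size([512, 512]) respectively. Since the operation computes input × weightᵀ + bias, the shapes do line up.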
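The shape argument can be checked with plain arithmetic: `F.linear` computes `input @ weight.T + bias`, so (867, 512) × (512, 512)ᵀ + (512,) gives (867, 512). Below is a minimal sketch of that check in Python. `linear_output_shape` is a hypothetical helper (not part of PyTorch or ASTER), and it only handles a 2-D input and a 1-D bias; real `torch.addmm` broadcasting is more permissive.

```python
# Hypothetical helper (not part of ASTER or PyTorch): computes the output shape
# of F.linear, i.e. torch.addmm(bias, input, weight.t()), and raises if invalid.
from typing import Tuple


def linear_output_shape(
    bias: Tuple[int, ...],
    inp: Tuple[int, ...],
    weight: Tuple[int, ...],
) -> Tuple[int, int]:
    """Return the shape of inp @ weight.T + bias for a 2-D inp.

    weight follows the nn.Linear convention: (out_features, in_features).
    Simplification: only a 1-D bias of shape (out_features,) is accepted.
    """
    if len(inp) != 2 or len(weight) != 2:
        raise ValueError("this sketch only handles 2-D input and weight")
    batch, in_features = inp
    out_features, weight_in = weight
    # (N, in) x (in, out): the inner dimensions must agree
    if in_features != weight_in:
        raise ValueError(
            f"inner dimensions differ: input has {in_features}, weight expects {weight_in}"
        )
    # a (out,) bias broadcasts across the N rows of the product
    if bias != (out_features,):
        raise ValueError(f"bias shape {bias} does not match out_features={out_features}")
    return (batch, out_features)


if __name__ == "__main__":
    # Shapes reported in the issue: bias [512], input [867, 512], weight [512, 512]
    print(linear_output_shape((512,), (867, 512), (512, 512)))  # (867, 512)
```

Because these shapes pass the check, the cuBLAS failure is unlikely to be a plain shape mismatch in `stn_fc1`.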

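Unfortunately, this .cu file doesn't seem to exist on my machine; it is probably a temporary file or one generated while PyTorch was being compiled. I'm currently using a 9700K CPU and a Titan RTX GPU. Do I have to switch to a different PyTorch version?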

Heermosi commented 3 years ago

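Solved. The cause was that the container environment used CUDA 10.1; reinstalling PyTorch and the CUDA toolkit fixed it. Their versions must match the CUDA version in the environment.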
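The fix boils down to making sure the CUDA version PyTorch was built with agrees with the CUDA toolkit installed in the container. Below is a minimal diagnostic sketch in Python. `torch.version.cuda`, `torch.__version__`, and `torch.cuda.is_available()` are real PyTorch attributes; the helper functions are hypothetical, and treating an equal major.minor version as "consistent" is an assumption of this sketch, not a rule stated in the thread.

```python
# Sketch: compare the CUDA version PyTorch was built against with the CUDA
# toolkit (nvcc) visible in the current environment or container.
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn strings like '10.1' or '10.1.243' into (10, 1)."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def nvcc_cuda_version(nvcc_output: str) -> Optional[str]:
    """Extract 'X.Y' from `nvcc --version` output, e.g. '... release 10.1, V10.1.243'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: str, toolkit_cuda: str) -> bool:
    """Assumption: equal major.minor versions count as 'consistent'."""
    return parse_major_minor(torch_cuda) == parse_major_minor(toolkit_cuda)


def main() -> None:
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    torch_cuda = torch.version.cuda  # None for CPU-only builds
    print("PyTorch", torch.__version__, "built with CUDA", torch_cuda)
    print("GPU available:", torch.cuda.is_available())

    nvcc = shutil.which("nvcc")
    if nvcc is None or torch_cuda is None:
        print("cannot compare: nvcc not on PATH or CPU-only PyTorch build")
        return
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    toolkit_cuda = nvcc_cuda_version(out)
    if toolkit_cuda is None:
        print("could not parse nvcc output")
    elif cuda_versions_match(torch_cuda, toolkit_cuda):
        print("OK: PyTorch CUDA", torch_cuda, "matches toolkit CUDA", toolkit_cuda)
    else:
        print("MISMATCH: PyTorch CUDA", torch_cuda, "vs toolkit CUDA", toolkit_cuda)


if __name__ == "__main__":
    main()
```

Run it inside the same container used for evaluation; a MISMATCH line corresponds to the situation described above, where PyTorch and the CUDA toolkit had to be reinstalled to agree.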