facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

Error when training ResNet-18 on InterHand2.6M #54

Closed: murphypei closed this issue 3 years ago

murphypei commented 3 years ago

I ran test.py successfully following the README, but I get an error when I try to train a ResNet-18 model.

I have already changed resnet_type in main/config.py. I think some other settings still need to be modified, but I couldn't find them. Can you help?

$ python train.py --gpu 0-3 --annot_subset human_annot                              
>>> Using GPU: 0,1,2,3
04-30 03:40:55 Creating train dataset...
Load annotation from  ../data/InterHand2.6M/annotations/human_annot
loading annotations into memory...
Done (t=10.21s)
creating index...
index created!
Get bbox and root depth from groundtruth annotation
Number of annotations in single hand sequences: 76445
Number of annotations in interacting hand sequences: 208271
04-30 03:42:11 Creating graph and optimizer...
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.7M/44.7M [00:00<00:00, 107MB/s]
Initialize resnet from model zoo
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main()
  File "train.py", line 60, in main
    loss = trainer.model(inputs, targets, meta_info, 'train')
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/model.py", line 45, in forward
    joint_heatmap_out, rel_root_depth_out, hand_type = self.pose_net(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/../common/nets/module.py", line 48, in forward
    joint_img_feat_1 = self.joint_deconv_1(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [2048, 256, 4, 4], expected input[16, 512, 8, 8] to have 2048 channels, but got 512 channels instead

mks0601 commented 3 years ago

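The error says the channel dimensions do not match: ResNet-18 (like ResNet-34) outputs 512-channel features, whereas ResNet-50/101/152 output 2048 channels. You should change every 2048 in common/nets/module.py to 512. Maybe I should change the code to set this automatically based on the ResNet type.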
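For reference, the mismatch can be reproduced in isolation. This is a minimal sketch that assumes the first layer of joint_deconv_1 is a ConvTranspose2d with kernel_size=4, stride=2, padding=1; only the weight shape [2048, 256, 4, 4] and the input shape [16, 512, 8, 8] come from the traceback. For ConvTranspose2d (with groups=1) the weight is laid out as [in_channels, out_channels, kH, kW], so its first dimension is the number of input channels the layer expects.

```python
import torch
import torch.nn as nn

# Feature map shaped like the one in the traceback: ResNet-18 output,
# batch 16, 512 channels, 8x8 spatial size.
img_feat = torch.randn(16, 512, 8, 8)

# Assumed layer settings (kernel 4, stride 2, padding 1); only the
# [in_channels, 256, 4, 4] weight shape is taken from the traceback.
deconv_2048 = nn.ConvTranspose2d(2048, 256, kernel_size=4, stride=2, padding=1, bias=False)
deconv_512 = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=False)

# Weight layout is [in_channels, out_channels, kH, kW].
print(deconv_2048.weight.shape)  # torch.Size([2048, 256, 4, 4])

try:
    deconv_2048(img_feat)  # layer sized for ResNet-50/101/152 features
except RuntimeError as err:
    print("channel mismatch:", err)

# After replacing 2048 with 512 the layer accepts ResNet-18 features;
# each 8x8 map is upsampled to (8 - 1) * 2 - 2 * 1 + 4 = 16.
out = deconv_512(img_feat)
print(out.shape)  # torch.Size([16, 256, 16, 16])
```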

murphypei commented 3 years ago

@mks0601 Thank you, it works. I think setting it automatically would be better.
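One way to make the channel count follow the backbone automatically, as suggested in this thread, is to derive it from resnet_type instead of hard-coding 2048. This is a minimal sketch assuming resnet_type holds the integer depth (18, 34, 50, 101, or 152); resnet_out_channels is a hypothetical helper, not part of the repository. It relies on the standard ResNet design: the last stage has 512 base channels, ResNet-18/34 use basic blocks (expansion 1), and ResNet-50/101/152 use bottleneck blocks (expansion 4).

```python
# Hypothetical helper (not in the InterHand2.6M code): maps a ResNet depth
# to the channel count of its final (layer4) feature map.
LAST_STAGE_PLANES = 512
# Basic blocks (ResNet-18/34) have expansion 1; bottleneck blocks
# (ResNet-50/101/152) have expansion 4.
BLOCK_EXPANSION = {18: 1, 34: 1, 50: 4, 101: 4, 152: 4}


def resnet_out_channels(resnet_type: int) -> int:
    """Return the number of channels produced by the ResNet backbone."""
    if resnet_type not in BLOCK_EXPANSION:
        raise ValueError(f"unsupported resnet_type: {resnet_type}")
    return LAST_STAGE_PLANES * BLOCK_EXPANSION[resnet_type]


if __name__ == "__main__":
    for depth in (18, 34, 50, 101, 152):
        print(f"resnet{depth}: {resnet_out_channels(depth)} channels")
```

Assuming the config object is imported as cfg, the hard-coded 2048 in common/nets/module.py could then be replaced by resnet_out_channels(cfg.resnet_type), so switching backbones only requires editing main/config.py.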