facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

Cannot reproduce paper numbers following instructions #41

Closed zc-alexfan closed 3 years ago

zc-alexfan commented 3 years ago

Hi, I followed the instructions in the repo to train and reproduce the InterHand performance. While the reported validation error for interacting hand pose is 18.58mm (Table 4), my reproduced number is 20mm. Do you know why there is a discrepancy? I didn't modify anything in the repo for this training run.

I saw a bug report earlier that the image sizes were swapped. Would that be the reason? Thanks.

Here are the commands I used for training and validation (I guess I should use epoch 19 for testing, as the numbering starts from 0):

python train.py --gpu 2 --annot_subset all
python test.py --gpu 0 --test_epoch 19 --test_set val --annot_subset machine_annot

I use the default config.py but with a batch size of 32 and gradient accumulation over 2 steps, which should be equivalent to a batch size of 64.

Evaluation start...
Handedness accuracy: 0.9831676136363636
MRRPE: 35.12182411345029

MPJPE for each joint: 
r_thumb4: 21.78, r_thumb3: 16.52, r_thumb2: 12.93, r_thumb1: 8.33, r_index4: 25.84, r_index3: 21.78, r_index2: 18.68, r_index1: 14.23, r_middle4: 25.96, r_middle3: 22.06, r_middle2: 19.23, r_middle1: 14.23, r_ring4: 24.27, r_ring3: 20.41, r_ring2: 17.57, r_ring1: 13.15, r_pinky4: 22.74, r_pinky3: 19.37, r_pinky2: 17.12, r_pinky1: 12.72, r_wrist: 0.00, l_thumb4: 22.68, l_thumb3: 17.41, l_thumb2: 13.32, l_thumb1: 8.35, l_index4: 25.08, l_index3: 20.58, l_index2: 17.80, l_index1: 13.81, l_middle4: 25.59, l_middle3: 21.60, l_middle2: 18.89, l_middle1: 14.08, l_ring4: 23.69, l_ring3: 19.87, l_ring2: 17.31, l_ring1: 13.56, l_pinky4: 23.56, l_pinky3: 20.01, l_pinky2: 17.20, l_pinky1: 13.07, l_wrist: 0.00, 
MPJPE for all hand sequences: 17.53

MPJPE for each joint: 
r_thumb4: 17.88, r_thumb3: 13.76, r_thumb2: 10.29, r_thumb1: 6.97, r_index4: 19.81, r_index3: 17.42, r_index2: 15.27, r_index1: 12.04, r_middle4: 22.19, r_middle3: 19.61, r_middle2: 16.82, r_middle1: 11.89, r_ring4: 21.16, r_ring3: 18.37, r_ring2: 15.21, r_ring1: 10.80, r_pinky4: 19.80, r_pinky3: 17.02, r_pinky2: 14.57, r_pinky1: 9.99, r_wrist: 0.00, l_thumb4: 19.56, l_thumb3: 15.51, l_thumb2: 11.46, l_thumb1: 7.24, l_index4: 19.82, l_index3: 16.34, l_index2: 14.42, l_index1: 11.36, l_middle4: 21.45, l_middle3: 18.39, l_middle2: 15.78, l_middle1: 11.76, l_ring4: 20.48, l_ring3: 17.29, l_ring2: 14.55, l_ring1: 11.26, l_pinky4: 20.50, l_pinky3: 17.41, l_pinky2: 14.73, l_pinky1: 10.81, l_wrist: 0.00, 
MPJPE for single hand sequences: 14.79

MPJPE for each joint: 
r_thumb4: 25.54, r_thumb3: 19.17, r_thumb2: 15.46, r_thumb1: 10.48, r_index4: 32.22, r_index3: 26.37, r_index2: 22.16, r_index1: 16.37, r_middle4: 30.13, r_middle3: 24.67, r_middle2: 21.72, r_middle1: 16.50, r_ring4: 27.56, r_ring3: 22.58, r_ring2: 19.97, r_ring1: 15.41, r_pinky4: 25.83, r_pinky3: 21.77, r_pinky2: 19.64, r_pinky1: 15.33, r_wrist: 0.00, l_thumb4: 26.10, l_thumb3: 19.45, l_thumb2: 15.32, l_thumb1: 10.11, l_index4: 31.30, l_index3: 25.39, l_index2: 21.47, l_index1: 16.41, l_middle4: 31.19, l_middle3: 25.41, l_middle2: 22.33, l_middle1: 16.54, l_ring4: 28.02, l_ring3: 22.87, l_ring2: 20.31, l_ring1: 16.01, l_pinky4: 27.30, l_pinky3: 22.88, l_pinky2: 19.85, l_pinky1: 15.46, l_wrist: 0.00, 
MPJPE for interacting hand sequences: 20.54
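(For reference, the MPJPE values in the logs above are mean per-joint position errors: the Euclidean distance in mm between predicted and ground-truth root-relative 3D joints, averaged over samples per joint and then over joints. A minimal sketch of the metric — the array names are hypothetical, this is not the repo's evaluation code:)

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error in mm.

    pred, gt: (N, J, 3) arrays of root-relative 3D joint
    coordinates in millimeters, N samples and J joints.
    Returns (per_joint, overall): a (J,) array and a scalar.
    """
    # Euclidean distance per (sample, joint), then average over samples
    per_joint = np.linalg.norm(pred - gt, axis=2).mean(axis=0)  # (J,)
    return per_joint, per_joint.mean()

# toy example: 2 samples, 3 joints, every coordinate off by 1mm
pred = np.zeros((2, 3, 3))
gt = np.ones((2, 3, 3))
per_joint, overall = mpjpe(pred, gt)  # each error is sqrt(3) mm
```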
mks0601 commented 3 years ago

Q1. "I use the default config.py but do a batch size of 32 with accumulate gradient of 2, which should be equivalent to batch size 64." <- Could you explain this in more detail?

Q2. "I saw a bug report earlier that the image sizes were swapped." <- When did you download the data and annotation files?

zc-alexfan commented 3 years ago

Q1. Here is the diff of my accumulated gradient code to your code:

diff --git a/main/config.py b/main/config.py
index 6c1f530..1322c5a 100644
--- a/main/config.py
+++ b/main/config.py
@@ -32,7 +32,7 @@ class Config:
     end_epoch = 20 if dataset == 'InterHand2.6M' else 50
     lr = 1e-4
     lr_dec_factor = 10
-    train_batch_size = 16
+    train_batch_size = 32

     ## testing config
     test_batch_size = 32
@@ -49,7 +49,7 @@ class Config:
     result_dir = osp.join(output_dir, 'result')

     ## others
-    num_thread = 40
+    num_thread = 8
     gpu_ids = '0'
     num_gpus = 1
     continue_train = False
diff --git a/main/train.py b/main/train.py
index 1036fef..0f3e1f9 100644
--- a/main/train.py
+++ b/main/train.py
@@ -44,7 +44,8 @@ def main():
     trainer = Trainer()
     trainer._make_batch_generator(args.annot_subset)
     trainer._make_model()
-    
+    optim_step = False
+
     # train
     for epoch in range(trainer.start_epoch, cfg.end_epoch):

@@ -56,13 +57,16 @@ def main():
             trainer.gpu_timer.tic()

             # forward
-            trainer.optimizer.zero_grad()
             loss = trainer.model(inputs, targets, meta_info, 'train')
             loss = {k:loss[k].mean() for k in loss}

             # backward
-            sum(loss[k] for k in loss).backward()
-            trainer.optimizer.step()
+            my_loss = sum(loss[k] for k in loss)/2
+            my_loss.backward()
+            if optim_step:
+                trainer.optimizer.step()
+                trainer.optimizer.zero_grad()
+            optim_step = not optim_step
             trainer.gpu_timer.toc()
             screen = [
                 'Epoch %d/%d itr %d/%d:' % (epoch, cfg.end_epoch, itr, trainer.itr_per_epoch),

Q2. I downloaded the dataset on Sep. 11, 2020.

I am pretty sure the accumulated gradient update is correct, as I followed the instructions here. I also have a gradient accumulation version of the code in PyTorch Lightning, which hasn't had any problems so far.
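(The key detail in the diff above is dividing each microbatch loss by 2 before `backward()`, so the accumulated gradient matches the mean-loss gradient of the full batch. A minimal self-contained check on a toy linear model — not the repo's InterNet — illustrating why that scaling makes accumulation over two half-batches equivalent to one full batch:)

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()  # 'mean' reduction, like a typical training loss
x, y = torch.randn(8, 4), torch.randn(8, 1)

# gradient of the mean loss over the full batch of 8
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# same gradient accumulated over two half-batches of 4:
# each microbatch loss is divided by the number of accumulation
# steps (2), so the sum of the scaled losses equals the full-batch mean
model.zero_grad()
for xb, yb in ((x[:4], y[:4]), (x[4:], y[4:])):
    (loss_fn(model(xb), yb) / 2).backward()
acc_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, acc_grad, atol=1e-6))
```

The one place the two regimes genuinely differ is batch statistics (e.g. BatchNorm), which see the microbatch, not the effective batch.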

Q3. Have you verified that your code reproduces the paper numbers by training a new model with the default settings? Just want to check.

zc-alexfan commented 3 years ago

I did a bit of follow-up using the latest version of your code and dataset:

The evaluation result of the H+M trained model is shown below; it does not show 18mm for interacting hands.

Evaluation start...
Handedness accuracy: 0.9835464015151515
MRRPE: 35.98906321887903

MPJPE for each joint: 
r_thumb4: 22.01, r_thumb3: 16.54, r_thumb2: 13.14, r_thumb1: 8.44, r_index4: 26.14, r_index3: 21.99, r_index2: 18.93, r_index1: 14.52, r_middle4: 26.37, r_middle3: 22.13, r_middle2: 19.53, r_middle1: 14.59, r_ring4: 24.76, r_ring3: 20.68, r_ring2: 17.99, r_ring1: 13.56, r_pinky4: 23.39, r_pinky3: 19.80, r_pinky2: 17.62, r_pinky1: 13.00, r_wrist: 0.00, l_thumb4: 22.30, l_thumb3: 17.06, l_thumb2: 13.23, l_thumb1: 8.32, l_index4: 24.67, l_index3: 20.17, l_index2: 17.47, l_index1: 13.67, l_middle4: 25.32, l_middle3: 21.17, l_middle2: 18.63, l_middle1: 13.84, l_ring4: 23.49, l_ring3: 19.67, l_ring2: 17.09, l_ring1: 13.36, l_pinky4: 23.58, l_pinky3: 19.85, l_pinky2: 17.23, l_pinky1: 12.98, l_wrist: 0.00, 
MPJPE for all hand sequences: 17.58

MPJPE for each joint: 
r_thumb4: 17.84, r_thumb3: 13.74, r_thumb2: 10.40, r_thumb1: 7.13, r_index4: 20.23, r_index3: 17.68, r_index2: 15.64, r_index1: 12.45, r_middle4: 22.57, r_middle3: 19.79, r_middle2: 17.17, r_middle1: 12.42, r_ring4: 21.58, r_ring3: 18.67, r_ring2: 15.69, r_ring1: 11.26, r_pinky4: 20.50, r_pinky3: 17.57, r_pinky2: 15.19, r_pinky1: 10.30, r_wrist: 0.00, l_thumb4: 18.94, l_thumb3: 14.95, l_thumb2: 11.09, l_thumb1: 7.13, l_index4: 19.73, l_index3: 16.11, l_index2: 14.18, l_index1: 11.16, l_middle4: 20.99, l_middle3: 17.85, l_middle2: 15.59, l_middle1: 11.46, l_ring4: 20.14, l_ring3: 16.91, l_ring2: 14.31, l_ring1: 11.02, l_pinky4: 20.50, l_pinky3: 17.24, l_pinky2: 14.70, l_pinky1: 10.63, l_wrist: 0.00, 
MPJPE for single hand sequences: 14.82

MPJPE for each joint: 
r_thumb4: 26.03, r_thumb3: 19.26, r_thumb2: 15.78, r_thumb1: 10.51, r_index4: 32.39, r_index3: 26.53, r_index2: 22.29, r_index1: 16.52, r_middle4: 30.59, r_middle3: 24.61, r_middle2: 21.96, r_middle1: 16.70, r_ring4: 28.12, r_ring3: 22.81, r_ring2: 20.32, r_ring1: 15.77, r_pinky4: 26.43, r_pinky3: 22.09, r_pinky2: 20.02, r_pinky1: 15.58, r_wrist: 0.00, l_thumb4: 25.99, l_thumb3: 19.31, l_thumb2: 15.53, l_thumb1: 10.19, l_index4: 30.52, l_index3: 24.78, l_index2: 21.05, l_index1: 16.34, l_middle4: 31.16, l_middle3: 25.11, l_middle2: 21.99, l_middle1: 16.37, l_ring4: 28.01, l_ring3: 22.88, l_ring2: 20.11, l_ring1: 15.85, l_pinky4: 27.34, l_pinky3: 22.75, l_pinky2: 19.92, l_pinky1: 15.45, l_wrist: 0.00, 
MPJPE for interacting hand sequences: 20.59
mks0601 commented 3 years ago

Oh I see. Sorry for the confusion: there is a testing result from InterNet trained on InterHand2.6M v0.0. The currently released version of InterHand2.6M (v0.0) is not the full InterHand2.6M, as described here. The testing results on InterHand2.6M v0.0 seem similar to your results. Hope this resolves your question.

mks0601 commented 3 years ago

Btw, is the gradient accumulation you described above applicable to the Adam optimizer? I think it is applicable to the SGD optimizer, but I'm not sure about Adam.
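(On this question: Adam's update is a deterministic function of the gradient handed to `step()`, so if the accumulated microbatch gradients sum to the full-batch gradient, the Adam step is identical too; only batch statistics such as BatchNorm differ. A toy check on a linear model, under that assumption:)

```python
import copy
import torch

torch.manual_seed(0)
base = torch.nn.Linear(4, 1)
m_full, m_acc = copy.deepcopy(base), copy.deepcopy(base)
opt_full = torch.optim.Adam(m_full.parameters(), lr=1e-3)
opt_acc = torch.optim.Adam(m_acc.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 1)

# one Adam step on the full batch of 8
opt_full.zero_grad()
loss_fn(m_full(x), y).backward()
opt_full.step()

# two half-batch backward passes (loss scaled by 1/2), then one Adam step
opt_acc.zero_grad()
for xb, yb in ((x[:4], y[:4]), (x[4:], y[4:])):
    (loss_fn(m_acc(xb), yb) / 2).backward()
opt_acc.step()

# the resulting weights match up to floating-point noise
print(torch.allclose(m_full.weight, m_acc.weight, atol=1e-6))
```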

zc-alexfan commented 3 years ago

Thanks for the quick response despite it being ICCV time. Yes, that matches the numbers, but your paper says "All reported frame numbers and experimental results in the paper are from the 5 fps configuration."

But the IH error for 5 FPS is 20.59mm according to the "testing result" zip file, while the number in the paper is 18.58mm.

I assume v0.0 means 5FPS.

mks0601 commented 3 years ago

v0.0 is not 5 fps. Let me clarify this. There are two versions:

- Full IH2.6M (not released yet because of the data inspection)
- IH2.6M v0.0 (released)

All numbers in the paper are from the full IH2.6M, which is not released because of the data inspection. Therefore, I additionally provided the training and testing results on v0.0, which is released.

zc-alexfan commented 3 years ago

I see, that makes sense. For the 30 FPS version of v0.0, only the annotations are released, right? My current understanding is that the released images are from the 5 FPS configuration.

mks0601 commented 3 years ago

"For the 30 FPS version of v0.0, only the annotations are released, right?" -> Correct.

quangdaist01 commented 2 years ago

I'm going to train InterNet on the whole InterHand2.6M dataset using Colab Pro. Does anyone here have an estimate of how long one epoch takes?