dqj5182 / CONTHO_RELEASE

[CVPR 2024] This repo is the official PyTorch implementation of Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer.

Batch_size #10

Open faithbotbbot opened 2 days ago

faithbotbbot commented 2 days ago

Thanks for releasing this amazing work! I still can't reproduce the results in the paper after 50 epochs, even after switching to PyTorch 1.10. If I train on a single GPU, do I need to change the batch size to 16*2=32? In addition, I found that changing the batch size greatly affects the performance of the model. Have you tried it? (screenshot attached)

dqj5182 commented 2 days ago

In my experience, batch size does not affect model performance much, and the default batch size on a single GPU should be enough to reach the numbers reported in our paper. Could you share specific numbers showing the performance differences between batch sizes?

Judging from the screen capture you attached, the model performance does not look too unreasonable. But I am concerned that you are not reaching the numbers reproduced in this issue. Judging from other issues, training results seem to vary considerably across environment settings. Which Python and CUDA versions are you using, so that I can reproduce your result? An older version of Python may also affect the training process.

faithbotbbot commented 1 day ago

batchsize 128

08-20 03:56:38 Work on GPU: 0,1
08-20 03:56:38 Args: Namespace(resume_training=False, gpu='0,1', dataset='behave', exp='', checkpoint='')
08-20 03:56:38 Cfg: {'cur_dir': '/wangzr/CONTHO_RELEASE/lib/core', 'root_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../', 'data_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../data', 'DATASET': {'name': 'BEHAVE', 'workers': 4, 'random_seed': 123, 'bbox_expand_ratio': 1.3, 'obj_set': 'behave'}, 'MODEL': {'input_img_shape': [512, 512], 'input_body_shape': [256, 256], 'input_hand_shape': [256, 256], 'img_feat_shape': [8, 8, 8], 'weight_path': ''}, 'TRAIN': {'batch_size': 128, 'shuffle': True, 'begin_epoch': 1, 'end_epoch': 50, 'warmup_epoch': 3, 'scheduler': 'step', 'lr': 0.0001, 'min_lr': 1e-06, 'lr_step': [30], 'lr_factor': 0.1, 'optimizer': 'adam', 'momentum': 0, 'weight_decay': 0, 'beta1': 0.5, 'beta2': 0.999, 'print_freq': 10, 'loss_names': ['contact', 'vert', 'edge', 'param', 'coord', 'hand_bbox'], 'contact_loss_weight': 1.0, 'smpl_vert_loss_weight': 1.0, 'obj_vert_loss_weight': 1.0, 'smpl_edge_loss_weight': 1.0, 'smpl_pose_loss_weight': 1.0, 'smpl_shape_loss_weight': 1.0, 'obj_pose_loss_weight': 1.0, 'obj_trans_loss_weight': 1.0, 'smpl_3dkp_loss_weight': 1.0, 'smpl_2dkp_loss_weight': 1.0, 'pos_2dkp_loss_weight': 1.0, 'hand_bbox_loss_weight': 1.0}, 'AUG': {'scale_factor': 0.2, 'rot_factor': 30, 'shift_factor': 0, 'color_factor': 0.2, 'blur_factor': 0, 'flip': False}, 'TEST': {'batch_size': 32, 'shuffle': False, 'do_eval': True, 'eval_metrics': ['contact_est_p', 'contact_est_r', 'cd_human', 'cd_object', 'contact_rec_p', 'contact_rec_r'], 'print_freq': 10, 'contact_thres': 0.05}, 'CAMERA': {'focal': [1000, 1000], 'princpt': [256.0, 256.0], 'depth_factor': 4.4, 'obj_depth_factor': 4.4}, 'output_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38', 'graph_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38/graph', 'vis_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38/vis', 'res_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38/results', 'log_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38/log', 'checkpoint_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/exp_08-20_12:56:38/checkpoints'}
08-20 03:56:47 # of model parameters: 82802022
08-20 03:56:47 ==> Preparing TRAIN Dataloader...
08-20 03:57:10 # of TRAIN BEHAVE data: 45380
08-20 03:57:11 # of model parameters: 82802022
08-20 03:57:11 ==> Preparing TEST Dataloader...
08-20 03:57:15 # of TEST BEHAVE data: 4129
08-20 03:57:15 ===> Start training...
08-20 04:12:43 Epoch1 Loss: total: 0.0211 contact: 0.0040 vert: 0.0029 edge: 0.0001 param: 0.0110 coord: 0.0027 hand_bbox: 0.0004
08-20 04:22:17 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.402 / Recall: 0.000
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 11.97 / Object: 25.23
Contact from reconstruction Precision: 0.268 / Recall: 0.204
08-20 04:36:17 Epoch2 Loss: total: 0.0160 contact: 0.0035 vert: 0.0020 edge: 0.0000 param: 0.0080 coord: 0.0022 hand_bbox: 0.0003
08-20 04:48:48 Epoch3 Loss: total: 0.0145 contact: 0.0033 vert: 0.0018 edge: 0.0000 param: 0.0070 coord: 0.0021 hand_bbox: 0.0002
08-20 05:01:21 Epoch4 Loss: total: 0.0134 contact: 0.0031 vert: 0.0016 edge: 0.0000 param: 0.0063 coord: 0.0020 hand_bbox: 0.0002
08-20 05:14:44 Epoch5 Loss: total: 0.0125 contact: 0.0030 vert: 0.0016 edge: 0.0000 param: 0.0057 coord: 0.0020 hand_bbox: 0.0002
08-20 05:27:10 Epoch6 Loss: total: 0.0118 contact: 0.0030 vert: 0.0015 edge: 0.0000 param: 0.0051 coord: 0.0019 hand_bbox: 0.0002
08-20 05:41:29 Epoch7 Loss: total: 0.0113 contact: 0.0029 vert: 0.0015 edge: 0.0000 param: 0.0048 coord: 0.0019 hand_bbox: 0.0002
08-20 05:55:37 Epoch8 Loss: total: 0.0110 contact: 0.0028 vert: 0.0014 edge: 0.0000 param: 0.0047 coord: 0.0019 hand_bbox: 0.0002
08-20 06:10:03 Epoch9 Loss: total: 0.0108 contact: 0.0028 vert: 0.0014 edge: 0.0000 param: 0.0045 coord: 0.0019 hand_bbox: 0.0002
08-20 06:24:17 Epoch10 Loss: total: 0.0105 contact: 0.0027 vert: 0.0013 edge: 0.0000 param: 0.0044 coord: 0.0019 hand_bbox: 0.0002
08-20 06:38:38 Epoch11 Loss: total: 0.0103 contact: 0.0027 vert: 0.0013 edge: 0.0000 param: 0.0042 coord: 0.0019 hand_bbox: 0.0002
08-20 06:47:25 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.733 / Recall: 0.381
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 8.02 / Object: 13.46
Contact from reconstruction Precision: 0.465 / Recall: 0.365
08-20 07:01:47 Epoch12 Loss: total: 0.0101 contact: 0.0027 vert: 0.0013 edge: 0.0000 param: 0.0041 coord: 0.0018 hand_bbox: 0.0002
08-20 07:16:02 Epoch13 Loss: total: 0.0099 contact: 0.0026 vert: 0.0012 edge: 0.0000 param: 0.0040 coord: 0.0018 hand_bbox: 0.0002
08-20 07:30:14 Epoch14 Loss: total: 0.0098 contact: 0.0026 vert: 0.0012 edge: 0.0000 param: 0.0040 coord: 0.0018 hand_bbox: 0.0002
08-20 07:44:32 Epoch15 Loss: total: 0.0097 contact: 0.0026 vert: 0.0012 edge: 0.0000 param: 0.0039 coord: 0.0018 hand_bbox: 0.0002
08-20 07:58:41 Epoch16 Loss: total: 0.0096 contact: 0.0026 vert: 0.0012 edge: 0.0000 param: 0.0038 coord: 0.0018 hand_bbox: 0.0002
08-20 08:12:41 Epoch17 Loss: total: 0.0094 contact: 0.0025 vert: 0.0012 edge: 0.0000 param: 0.0037 coord: 0.0018 hand_bbox: 0.0002
08-20 08:26:53 Epoch18 Loss: total: 0.0093 contact: 0.0025 vert: 0.0011 edge: 0.0000 param: 0.0037 coord: 0.0018 hand_bbox: 0.0002
08-20 08:40:59 Epoch19 Loss: total: 0.0092 contact: 0.0025 vert: 0.0011 edge: 0.0000 param: 0.0036 coord: 0.0018 hand_bbox: 0.0002
08-20 08:55:06 Epoch20 Loss: total: 0.0091 contact: 0.0025 vert: 0.0011 edge: 0.0000 param: 0.0036 coord: 0.0018 hand_bbox: 0.0002
08-20 09:09:12 Epoch21 Loss: total: 0.0090 contact: 0.0024 vert: 0.0011 edge: 0.0000 param: 0.0035 coord: 0.0018 hand_bbox: 0.0001
08-20 09:17:54 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.765 / Recall: 0.395
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 7.36 / Object: 12.55
Contact from reconstruction Precision: 0.522 / Recall: 0.349
08-20 09:31:49 Epoch22 Loss: total: 0.0089 contact: 0.0024 vert: 0.0011 edge: 0.0000 param: 0.0035 coord: 0.0018 hand_bbox: 0.0001
08-20 09:45:42 Epoch23 Loss: total: 0.0089 contact: 0.0024 vert: 0.0011 edge: 0.0000 param: 0.0034 coord: 0.0018 hand_bbox: 0.0001
08-20 09:59:41 Epoch24 Loss: total: 0.0088 contact: 0.0024 vert: 0.0010 edge: 0.0000 param: 0.0034 coord: 0.0018 hand_bbox: 0.0001
08-20 10:13:39 Epoch25 Loss: total: 0.0087 contact: 0.0024 vert: 0.0010 edge: 0.0000 param: 0.0034 coord: 0.0018 hand_bbox: 0.0001
08-20 10:27:16 Epoch26 Loss: total: 0.0086 contact: 0.0023 vert: 0.0010 edge: 0.0000 param: 0.0033 coord: 0.0018 hand_bbox: 0.0001
08-20 10:41:06 Epoch27 Loss: total: 0.0085 contact: 0.0023 vert: 0.0010 edge: 0.0000 param: 0.0033 coord: 0.0018 hand_bbox: 0.0001
08-20 10:55:12 Epoch28 Loss: total: 0.0085 contact: 0.0023 vert: 0.0010 edge: 0.0000 param: 0.0033 coord: 0.0018 hand_bbox: 0.0001
08-20 11:09:20 Epoch29 Loss: total: 0.0084 contact: 0.0023 vert: 0.0010 edge: 0.0000 param: 0.0032 coord: 0.0018 hand_bbox: 0.0001
08-20 11:23:34 Epoch30 Loss: total: 0.0083 contact: 0.0023 vert: 0.0010 edge: 0.0000 param: 0.0032 coord: 0.0018 hand_bbox: 0.0001
08-20 11:37:29 Epoch31 Loss: total: 0.0083 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0032 coord: 0.0017 hand_bbox: 0.0001
08-20 11:45:58 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.760 / Recall: 0.463
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 6.79 / Object: 11.52
Contact from reconstruction Precision: 0.541 / Recall: 0.378
08-20 11:59:57 Epoch32 Loss: total: 0.0082 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0031 coord: 0.0017 hand_bbox: 0.0001
08-20 12:13:47 Epoch33 Loss: total: 0.0081 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0031 coord: 0.0017 hand_bbox: 0.0001
08-20 12:27:44 Epoch34 Loss: total: 0.0081 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0031 coord: 0.0017 hand_bbox: 0.0001
08-20 12:41:38 Epoch35 Loss: total: 0.0080 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0030 coord: 0.0017 hand_bbox: 0.0001
08-20 12:55:38 Epoch36 Loss: total: 0.0080 contact: 0.0022 vert: 0.0009 edge: 0.0000 param: 0.0030 coord: 0.0017 hand_bbox: 0.0001
08-20 13:09:44 Epoch37 Loss: total: 0.0079 contact: 0.0021 vert: 0.0009 edge: 0.0000 param: 0.0030 coord: 0.0017 hand_bbox: 0.0001
08-20 13:24:02 Epoch38 Loss: total: 0.0078 contact: 0.0021 vert: 0.0009 edge: 0.0000 param: 0.0030 coord: 0.0017 hand_bbox: 0.0001
08-20 13:38:06 Epoch39 Loss: total: 0.0078 contact: 0.0021 vert: 0.0009 edge: 0.0000 param: 0.0029 coord: 0.0017 hand_bbox: 0.0001
08-20 13:51:59 Epoch40 Loss: total: 0.0077 contact: 0.0021 vert: 0.0008 edge: 0.0000 param: 0.0029 coord: 0.0017 hand_bbox: 0.0001
08-20 14:06:03 Epoch41 Loss: total: 0.0077 contact: 0.0021 vert: 0.0008 edge: 0.0000 param: 0.0029 coord: 0.0017 hand_bbox: 0.0001
08-20 14:14:26 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.728 / Recall: 0.576
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 6.16 / Object: 10.48
Contact from reconstruction Precision: 0.548 / Recall: 0.440
08-20 14:28:45 Epoch42 Loss: total: 0.0077 contact: 0.0021 vert: 0.0008 edge: 0.0000 param: 0.0029 coord: 0.0017 hand_bbox: 0.0001
08-20 14:42:59 Epoch43 Loss: total: 0.0076 contact: 0.0021 vert: 0.0008 edge: 0.0000 param: 0.0029 coord: 0.0017 hand_bbox: 0.0001
08-20 14:57:11 Epoch44 Loss: total: 0.0076 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0028 coord: 0.0017 hand_bbox: 0.0001
08-20 15:11:11 Epoch45 Loss: total: 0.0075 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0028 coord: 0.0017 hand_bbox: 0.0001
08-20 15:25:21 Epoch46 Loss: total: 0.0075 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0028 coord: 0.0017 hand_bbox: 0.0001
08-20 15:39:10 Epoch47 Loss: total: 0.0074 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0028 coord: 0.0017 hand_bbox: 0.0001
08-20 15:53:16 Epoch48 Loss: total: 0.0074 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0027 coord: 0.0017 hand_bbox: 0.0001
08-20 16:07:11 Epoch49 Loss: total: 0.0074 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0027 coord: 0.0017 hand_bbox: 0.0001
08-20 16:21:18 Epoch50 Loss: total: 0.0073 contact: 0.0020 vert: 0.0008 edge: 0.0000 param: 0.0027 coord: 0.0017 hand_bbox: 0.0001
08-20 16:29:38 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.754 / Recall: 0.511
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 5.97 / Object: 10.16
Contact from reconstruction Precision: 0.560 / Recall: 0.434

batch size 16

09-14 13:51:27 Work on GPU: 0,1
09-14 13:51:27 Args: Namespace(resume_training=False, gpu='0,1', dataset='behave', exp='bs16base', checkpoint='')
09-14 13:51:27 Cfg: {'cur_dir': '/wangzr/CONTHO_RELEASE/lib/core', 'root_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../', 'data_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../data', 'DATASET': {'name': 'BEHAVE', 'workers': 4, 'random_seed': 123, 'bbox_expand_ratio': 1.3, 'obj_set': 'behave'}, 'MODEL': {'input_img_shape': [512, 512], 'input_body_shape': [256, 256], 'input_hand_shape': [256, 256], 'img_feat_shape': [8, 8, 8], 'weight_path': ''}, 'TRAIN': {'batch_size': 16, 'shuffle': True, 'begin_epoch': 1, 'end_epoch': 50, 'warmup_epoch': 3, 'scheduler': 'step', 'lr': 0.0001, 'min_lr': 1e-06, 'lr_step': [30], 'lr_factor': 0.1, 'optimizer': 'adam', 'momentum': 0, 'weight_decay': 0, 'beta1': 0.5, 'beta2': 0.999, 'print_freq': 10, 'loss_names': ['contact', 'vert', 'edge', 'param', 'coord', 'hand_bbox'], 'contact_loss_weight': 1.0, 'smpl_vert_loss_weight': 1.0, 'obj_vert_loss_weight': 1.0, 'smpl_edge_loss_weight': 1.0, 'smpl_pose_loss_weight': 1.0, 'smpl_shape_loss_weight': 1.0, 'obj_pose_loss_weight': 1.0, 'obj_trans_loss_weight': 1.0, 'smpl_3dkp_loss_weight': 1.0, 'smpl_2dkp_loss_weight': 1.0, 'pos_2dkp_loss_weight': 1.0, 'hand_bbox_loss_weight': 1.0}, 'AUG': {'scale_factor': 0.2, 'rot_factor': 30, 'shift_factor': 0, 'color_factor': 0.2, 'blur_factor': 0, 'flip': False}, 'TEST': {'batch_size': 32, 'shuffle': False, 'do_eval': True, 'eval_metrics': ['contact_est_p', 'contact_est_r', 'cd_human', 'cd_object', 'contact_rec_p', 'contact_rec_r'], 'print_freq': 10, 'contact_thres': 0.05}, 'CAMERA': {'focal': [1000, 1000], 'princpt': [256.0, 256.0], 'depth_factor': 4.4, 'obj_depth_factor': 4.4}, 'output_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base', 'graph_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base/graph', 'vis_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base/vis', 'res_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base/results', 'log_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base/log', 'checkpoint_dir': '/wangzr/CONTHO_RELEASE/lib/core/../../experiment/bs16base/checkpoints'}
09-14 13:51:41 # of model parameters: 82802022
09-14 13:51:41 ==> Preparing TRAIN Dataloader...
09-14 13:52:00 # of TRAIN BEHAVE data: 45380
09-14 13:52:01 # of model parameters: 82802022
09-14 13:52:01 ==> Preparing TEST Dataloader...
09-14 13:52:05 # of TEST BEHAVE data: 4129
09-14 13:52:05 ===> Start training...
09-14 14:11:50 Epoch1 Loss: total: 0.1360 contact: 0.0296 vert: 0.0181 edge: 0.0003 param: 0.0673 coord: 0.0182 hand_bbox: 0.0025
09-14 14:21:39 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.650 / Recall: 0.066
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 9.61 / Object: 17.75
Contact from reconstruction Precision: 0.384 / Recall: 0.256
09-14 14:41:28 Epoch2 Loss: total: 0.1028 contact: 0.0253 vert: 0.0128 edge: 0.0003 param: 0.0469 coord: 0.0158 hand_bbox: 0.0017
09-14 15:00:58 Epoch3 Loss: total: 0.0920 contact: 0.0238 vert: 0.0116 edge: 0.0003 param: 0.0394 coord: 0.0153 hand_bbox: 0.0015
09-14 15:20:26 Epoch4 Loss: total: 0.0868 contact: 0.0230 vert: 0.0109 edge: 0.0003 param: 0.0361 coord: 0.0151 hand_bbox: 0.0014
09-14 15:39:48 Epoch5 Loss: total: 0.0829 contact: 0.0223 vert: 0.0103 edge: 0.0003 param: 0.0338 coord: 0.0149 hand_bbox: 0.0014
09-14 15:59:18 Epoch6 Loss: total: 0.0797 contact: 0.0217 vert: 0.0097 edge: 0.0003 param: 0.0320 coord: 0.0147 hand_bbox: 0.0013
09-14 16:18:48 Epoch7 Loss: total: 0.0774 contact: 0.0212 vert: 0.0093 edge: 0.0003 param: 0.0306 coord: 0.0147 hand_bbox: 0.0013
09-14 16:38:20 Epoch8 Loss: total: 0.0754 contact: 0.0207 vert: 0.0090 edge: 0.0003 param: 0.0296 coord: 0.0146 hand_bbox: 0.0013
09-14 16:57:47 Epoch9 Loss: total: 0.0737 contact: 0.0204 vert: 0.0086 edge: 0.0003 param: 0.0286 coord: 0.0145 hand_bbox: 0.0012
09-14 17:17:46 Epoch10 Loss: total: 0.0721 contact: 0.0200 vert: 0.0083 edge: 0.0003 param: 0.0278 coord: 0.0144 hand_bbox: 0.0012
09-14 17:37:17 Epoch11 Loss: total: 0.0708 contact: 0.0197 vert: 0.0081 edge: 0.0003 param: 0.0271 coord: 0.0144 hand_bbox: 0.0012
09-14 17:46:31 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.763 / Recall: 0.467
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 6.30 / Object: 11.34
Contact from reconstruction Precision: 0.569 / Recall: 0.356
09-14 18:06:32 Epoch12 Loss: total: 0.0696 contact: 0.0194 vert: 0.0079 edge: 0.0003 param: 0.0265 coord: 0.0143 hand_bbox: 0.0012
09-14 18:26:10 Epoch13 Loss: total: 0.0684 contact: 0.0192 vert: 0.0077 edge: 0.0003 param: 0.0259 coord: 0.0143 hand_bbox: 0.0011
09-14 18:46:22 Epoch14 Loss: total: 0.0674 contact: 0.0189 vert: 0.0075 edge: 0.0003 param: 0.0254 coord: 0.0143 hand_bbox: 0.0011
09-14 19:06:45 Epoch15 Loss: total: 0.0664 contact: 0.0186 vert: 0.0073 edge: 0.0003 param: 0.0248 coord: 0.0142 hand_bbox: 0.0011
09-14 19:26:47 Epoch16 Loss: total: 0.0654 contact: 0.0184 vert: 0.0072 edge: 0.0003 param: 0.0243 coord: 0.0142 hand_bbox: 0.0011
09-14 19:46:56 Epoch17 Loss: total: 0.0645 contact: 0.0181 vert: 0.0070 edge: 0.0002 param: 0.0239 coord: 0.0141 hand_bbox: 0.0011
09-14 20:06:14 Epoch18 Loss: total: 0.0638 contact: 0.0179 vert: 0.0069 edge: 0.0002 param: 0.0235 coord: 0.0141 hand_bbox: 0.0011
09-14 20:26:08 Epoch19 Loss: total: 0.0630 contact: 0.0178 vert: 0.0068 edge: 0.0002 param: 0.0231 coord: 0.0141 hand_bbox: 0.0011
09-14 20:46:47 Epoch20 Loss: total: 0.0623 contact: 0.0176 vert: 0.0067 edge: 0.0002 param: 0.0228 coord: 0.0141 hand_bbox: 0.0011
09-14 21:06:55 Epoch21 Loss: total: 0.0616 contact: 0.0173 vert: 0.0066 edge: 0.0002 param: 0.0224 coord: 0.0140 hand_bbox: 0.0010
09-14 21:15:52 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.764 / Recall: 0.469
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 5.82 / Object: 9.95
Contact from reconstruction Precision: 0.574 / Recall: 0.425
09-14 21:35:38 Epoch22 Loss: total: 0.0610 contact: 0.0172 vert: 0.0065 edge: 0.0002 param: 0.0221 coord: 0.0140 hand_bbox: 0.0010
09-14 21:55:07 Epoch23 Loss: total: 0.0604 contact: 0.0170 vert: 0.0064 edge: 0.0002 param: 0.0218 coord: 0.0140 hand_bbox: 0.0010
09-14 22:14:30 Epoch24 Loss: total: 0.0599 contact: 0.0169 vert: 0.0063 edge: 0.0002 param: 0.0215 coord: 0.0140 hand_bbox: 0.0010
09-14 22:33:53 Epoch25 Loss: total: 0.0593 contact: 0.0167 vert: 0.0062 edge: 0.0002 param: 0.0212 coord: 0.0139 hand_bbox: 0.0010
09-14 22:53:18 Epoch26 Loss: total: 0.0588 contact: 0.0166 vert: 0.0061 edge: 0.0002 param: 0.0210 coord: 0.0139 hand_bbox: 0.0010
09-14 23:12:41 Epoch27 Loss: total: 0.0583 contact: 0.0164 vert: 0.0060 edge: 0.0002 param: 0.0207 coord: 0.0139 hand_bbox: 0.0010
09-14 23:32:17 Epoch28 Loss: total: 0.0577 contact: 0.0163 vert: 0.0060 edge: 0.0002 param: 0.0204 coord: 0.0139 hand_bbox: 0.0010
09-14 23:51:48 Epoch29 Loss: total: 0.0572 contact: 0.0161 vert: 0.0059 edge: 0.0002 param: 0.0202 coord: 0.0139 hand_bbox: 0.0010
09-15 00:11:05 Epoch30 Loss: total: 0.0567 contact: 0.0160 vert: 0.0058 edge: 0.0002 param: 0.0200 coord: 0.0139 hand_bbox: 0.0010
09-15 00:30:22 Epoch31 Loss: total: 0.0563 contact: 0.0158 vert: 0.0058 edge: 0.0002 param: 0.0198 coord: 0.0138 hand_bbox: 0.0009
09-15 00:39:16 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.762 / Recall: 0.541
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 5.45 / Object: 9.04
Contact from reconstruction Precision: 0.601 / Recall: 0.449
09-15 00:58:58 Epoch32 Loss: total: 0.0559 contact: 0.0157 vert: 0.0057 edge: 0.0002 param: 0.0196 coord: 0.0138 hand_bbox: 0.0009
09-15 01:18:18 Epoch33 Loss: total: 0.0555 contact: 0.0156 vert: 0.0056 edge: 0.0002 param: 0.0194 coord: 0.0138 hand_bbox: 0.0009
09-15 01:37:31 Epoch34 Loss: total: 0.0551 contact: 0.0155 vert: 0.0056 edge: 0.0002 param: 0.0192 coord: 0.0138 hand_bbox: 0.0009
09-15 01:57:04 Epoch35 Loss: total: 0.0547 contact: 0.0153 vert: 0.0055 edge: 0.0002 param: 0.0190 coord: 0.0138 hand_bbox: 0.0009
09-15 02:16:31 Epoch36 Loss: total: 0.0544 contact: 0.0152 vert: 0.0055 edge: 0.0002 param: 0.0188 coord: 0.0138 hand_bbox: 0.0009
09-15 02:35:56 Epoch37 Loss: total: 0.0540 contact: 0.0151 vert: 0.0054 edge: 0.0002 param: 0.0186 coord: 0.0138 hand_bbox: 0.0009
09-15 02:55:28 Epoch38 Loss: total: 0.0536 contact: 0.0150 vert: 0.0054 edge: 0.0002 param: 0.0184 coord: 0.0137 hand_bbox: 0.0009
09-15 03:14:50 Epoch39 Loss: total: 0.0532 contact: 0.0149 vert: 0.0053 edge: 0.0002 param: 0.0182 coord: 0.0137 hand_bbox: 0.0009
09-15 03:34:08 Epoch40 Loss: total: 0.0528 contact: 0.0148 vert: 0.0053 edge: 0.0002 param: 0.0181 coord: 0.0137 hand_bbox: 0.0009
09-15 03:53:19 Epoch41 Loss: total: 0.0525 contact: 0.0146 vert: 0.0052 edge: 0.0002 param: 0.0179 coord: 0.0137 hand_bbox: 0.0009
09-15 04:02:11 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.775 / Recall: 0.508
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 5.39 / Object: 9.37
Contact from reconstruction Precision: 0.590 / Recall: 0.485
09-15 04:24:50 Epoch42 Loss: total: 0.0522 contact: 0.0145 vert: 0.0052 edge: 0.0002 param: 0.0177 coord: 0.0137 hand_bbox: 0.0009
09-15 04:49:03 Epoch43 Loss: total: 0.0519 contact: 0.0144 vert: 0.0051 edge: 0.0002 param: 0.0176 coord: 0.0137 hand_bbox: 0.0009
09-15 05:12:22 Epoch44 Loss: total: 0.0516 contact: 0.0143 vert: 0.0051 edge: 0.0002 param: 0.0175 coord: 0.0137 hand_bbox: 0.0009
09-15 05:32:39 Epoch45 Loss: total: 0.0512 contact: 0.0142 vert: 0.0050 edge: 0.0002 param: 0.0173 coord: 0.0136 hand_bbox: 0.0009
09-15 05:54:11 Epoch46 Loss: total: 0.0509 contact: 0.0141 vert: 0.0050 edge: 0.0002 param: 0.0172 coord: 0.0136 hand_bbox: 0.0009
09-15 06:14:48 Epoch47 Loss: total: 0.0506 contact: 0.0140 vert: 0.0050 edge: 0.0002 param: 0.0170 coord: 0.0136 hand_bbox: 0.0009
09-15 06:34:58 Epoch48 Loss: total: 0.0504 contact: 0.0139 vert: 0.0049 edge: 0.0002 param: 0.0169 coord: 0.0136 hand_bbox: 0.0008
09-15 06:54:56 Epoch49 Loss: total: 0.0500 contact: 0.0138 vert: 0.0049 edge: 0.0002 param: 0.0168 coord: 0.0136 hand_bbox: 0.0008
09-15 07:19:53 Epoch50 Loss: total: 0.0497 contact: 0.0137 vert: 0.0048 edge: 0.0002 param: 0.0166 coord: 0.0136 hand_bbox: 0.0008
09-15 07:42:53 Epoch51 Loss: total: 0.0495 contact: 0.0136 vert: 0.0048 edge: 0.0002 param: 0.0165 coord: 0.0136 hand_bbox: 0.0008
09-15 07:51:36 Finished Evaluation!
-------- Evaluation Results (Contact estimation) ------
Precision: 0.768 / Recall: 0.512
---------- Evaluation Results (Reconstruction) --------
Chamfer Distance Human: 5.26 / Object: 8.76
Contact from reconstruction Precision: 0.599 / Recall: 0.477

My setup: one A100 80G, cuda11.4, python3.19.9, pytorch1.10.1+cu113. I don't know why there is such a big deviation. Maybe it is caused by different GPUs?
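For anyone who wants to compare the two runs side by side, here is a minimal sketch that pulls the evaluation blocks out of log text like the one pasted above. The regular expression and the `parse_evals` helper are my own illustration, not part of the repo; the metric keys are borrowed from the `eval_metrics` list in the printed Cfg, and the two sample strings are the last evaluation block of each run.

```python
import re

# Metric names taken from 'eval_metrics' in the printed Cfg.
KEYS = ("contact_est_p", "contact_est_r", "cd_human", "cd_object",
        "contact_rec_p", "contact_rec_r")

NUM = r"(\d+\.\d+)"
# Hypothetical pattern written against the log format shown in this thread.
EVAL_RE = re.compile(
    r"\(Contact estimation\)\s*-+\s*Precision:\s*" + NUM + r"\s*/\s*Recall:\s*" + NUM
    + r".*?Chamfer Distance Human:\s*" + NUM + r"\s*/\s*Object:\s*" + NUM
    + r".*?Contact from reconstruction Precision:\s*" + NUM + r"\s*/\s*Recall:\s*" + NUM,
    re.DOTALL,
)


def parse_evals(log_text):
    """Return one dict of metrics per evaluation block found in log_text."""
    return [dict(zip(KEYS, map(float, m.groups())))
            for m in EVAL_RE.finditer(log_text)]


# Last evaluation block of each run, copied from the logs above.
LAST_BS128 = (
    "-------- Evaluation Results (Contact estimation) ------ "
    "Precision: 0.754 / Recall: 0.511 "
    "---------- Evaluation Results (Reconstruction) -------- "
    "Chamfer Distance Human: 5.97 / Object: 10.16 "
    "Contact from reconstruction Precision: 0.560 / Recall: 0.434"
)
LAST_BS16 = (
    "-------- Evaluation Results (Contact estimation) ------ "
    "Precision: 0.768 / Recall: 0.512 "
    "---------- Evaluation Results (Reconstruction) -------- "
    "Chamfer Distance Human: 5.26 / Object: 8.76 "
    "Contact from reconstruction Precision: 0.599 / Recall: 0.477"
)

final_128 = parse_evals(LAST_BS128)[-1]
final_16 = parse_evals(LAST_BS16)[-1]

# Print each metric for both runs plus the signed difference (bs16 - bs128).
for key in KEYS:
    diff = final_16[key] - final_128[key]
    print(f"{key:14s} bs128={final_128[key]:6.3f} bs16={final_16[key]:6.3f} diff={diff:+.3f}")
```

Passing a whole log file's contents to `parse_evals` should return every evaluation block in order, so the same loop can be used to track the metrics across epochs.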

dqj5182 commented 1 day ago

I am not an expert in the fundamentals of model training.

But after some research, I found a relevant discussion that you may find helpful. In summary, training with a large batch size can lead to significant degradation in performance. You can also refer to the paper mentioned in that discussion, "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima".

Personally, I think this issue is especially relevant to our training setting, since the dataset (e.g., BEHAVE) is quite small, and training a model on such a small dataset may lead to large variations in results between experiments.
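One concrete difference between the two runs is the number of parameter updates. The sketch below works it out from the numbers in the logs (45380 training samples, `end_epoch` 50, batch sizes 16 and 128). It assumes that `batch_size` in the Cfg is the number of samples consumed per optimizer step and that there is one step per batch; I have not checked whether the data loader drops the last incomplete batch, so `drop_last` is a hypothetical switch. With the same learning rate and epoch count, the batch-128 run takes roughly 8 times fewer steps, which may contribute to the gap but is not a confirmed explanation.

```python
import math


def optimizer_steps(num_samples, batch_size, epochs, drop_last=False):
    """Return (steps per epoch, total steps), assuming one step per batch."""
    if drop_last:
        per_epoch = num_samples // batch_size
    else:
        per_epoch = math.ceil(num_samples / batch_size)
    return per_epoch, per_epoch * epochs


NUM_TRAIN = 45380  # "# of TRAIN BEHAVE data" in the logs
EPOCHS = 50        # 'end_epoch' in the printed Cfg

for bs in (16, 128):
    per_epoch, total = optimizer_steps(NUM_TRAIN, bs, EPOCHS)
    print(f"batch_size={bs:3d}: {per_epoch} steps/epoch, {total} steps in total")
```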

Please feel free to ask further questions if you have any more concerns. I am always open to discussions.