Finetuning, problem reproducing results for TotalCapture

Xinyu-Yi / TransPose

A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

https://xinyu-yi.github.io/TransPose/

GNU General Public License v3.0

Finetuning, problem reproducing results for TotalCapture #42

Closed by PuckelTrick 3 months ago

PuckelTrick commented 1 year ago

Hi,

I can reproduce your results on the DIP-IMU dataset after fine-tuning, but I am far off on the TotalCapture dataset. What did you do to achieve those results? For fine-tuning I tried different learning rates (1e-3, 1e-4, 1e-5) and early stopping with patience 0 to 3. The data is processed with your scripts, and the TotalCapture data is the version from the DIP authors. To put it in numbers: for SIP error / angular error I even beat the numbers from your paper on DIP-IMU, but I always get something around 25 (SIP error) / 15 (angular error) on TotalCapture.

So how did you finetune your model to achieve your results?

Xinyu-Yi commented 1 year ago

I remember that I just used a lower learning rate and trained the network for several epochs.
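The recipe discussed above (a lower learning rate, a few epochs, optional early stopping with a small patience) can be sketched without committing to any framework. This is only an illustration, not the author's training code: `EarlyStopping`, `finetune`, and the three callbacks (`train_one_epoch`, `validate`, `get_state`) are hypothetical names, and the learning rate itself would be set inside whatever optimizer `train_one_epoch` uses.

```python
import copy


class EarlyStopping:
    """Track the best validation loss and signal when to stop."""

    def __init__(self, patience):
        self.patience = patience          # non-improving epochs tolerated
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)  # snapshot of best weights
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs > self.patience


def finetune(train_one_epoch, validate, get_state, max_epochs=10, patience=2):
    """Run a short fine-tuning loop (all three callbacks are hypothetical)."""
    stopper = EarlyStopping(patience)
    for _ in range(max_epochs):
        train_one_epoch()                 # one pass over the fine-tuning data
        if stopper.step(validate(), get_state()):
            break
    return stopper.best_loss, stopper.best_state
```

With `patience=0` the loop stops at the first epoch that fails to improve the validation loss, which matches the lower end of the range tried in the question.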

Junlin-Yin commented 1 year ago

As for finetuning, there's one point I want to confirm.

The three stages are pre-trained separately, but do you fine-tune them together using only the 6D rotation loss (formula no. 3 in the paper)? I ask because the DIP-IMU ground truth doesn't include joint positions.

Jaceyxy commented 1 year ago

@Junlin-Yin Hello, may I ask whether fine-tuning on the DIP-IMU dataset means that the dataset is split into two parts, one for training and the other for testing? Specifically, are s_09 and s_10 used for testing?

Xinyu-Yi commented 1 year ago

> As for finetuning, there's one point I want to confirm.
>
> The three stages are pre-trained separately, but do you fine-tune them together using only the 6D rotation loss (formula no. 3 in the paper)? I ask because the DIP-IMU ground truth doesn't include joint positions.

I fine-tune them separately. All three networks use a root-centered coordinate frame, so no translation is needed here.
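To illustrate why a root-centered frame removes the need for global translation, here is a minimal sketch. It assumes "root-centered" means joint positions are expressed relative to the root joint; it deliberately ignores any rotation into the root's orientation, and `root_center` is a hypothetical helper, not a function from the repository.

```python
def root_center(joints, root_index=0):
    """Express 3D joint positions relative to the root joint.

    `joints` is a sequence of (x, y, z) tuples in a global frame.
    """
    rx, ry, rz = joints[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def translate(joints, offset):
    """Shift every joint by the same global offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in joints]


pose = [(1.0, 1.0, 0.0), (1.5, 2.0, 0.0), (0.5, 2.0, 0.5)]
moved = translate(pose, (10.0, -2.0, 4.0))

# The root-centered pose is unchanged by where the subject stands.
same = root_center(pose) == root_center(moved)
```

Because the same offset is subtracted out again, a target built this way carries no information about where the subject is in the room.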

Xinyu-Yi commented 1 year ago

> @Junlin-Yin Hello, may I ask whether fine-tuning on the DIP-IMU dataset means that the dataset is split into two parts, one for training and the other for testing? Specifically, are s_09 and s_10 used for testing?

Maybe s08 is used for validation.
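The split discussed in this thread could be written down as follows. This is a sketch under assumptions: the subject folder names `s_01` to `s_10` are assumed, s_09 and s_10 as test subjects come from the question above, and s_08 as the validation subject is only stated tentatively in the reply. `split_subjects` is a hypothetical helper.

```python
def split_subjects(subjects, val=("s_08",), test=("s_09", "s_10")):
    """Partition subject names into train / val / test lists.

    The default val subject is tentative (see the reply above).
    """
    held_out = set(val) | set(test)
    return {
        "train": [s for s in subjects if s not in held_out],
        "val": [s for s in subjects if s in val],
        "test": [s for s in subjects if s in test],
    }


# Assumed naming: ten subjects, s_01 through s_10.
subjects = ["s_%02d" % i for i in range(1, 11)]
split = split_subjects(subjects)
```

Splitting by subject rather than by sequence keeps every recording of a held-out person out of the fine-tuning data.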