Xinyu-Yi / TransPose

A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units
https://xinyu-yi.github.io/TransPose/
GNU General Public License v3.0

Training your model to reproduce the results. #35

Closed PuckelTrick closed 3 months ago

PuckelTrick commented 1 year ago

Hi,

Unfortunately I am unable to reproduce your results from the paper; can you maybe guide me to where the error lies? As you did not make your training script available, I rebuilt your model using TensorFlow. The network layout is identical to the networks described in your paper, the models are trained with the same data (using your data synthesis scripts), and everything else is also done exactly as described in your paper (Adam with lr 1e-3, noise added to the data where and as you described).

The only difference is that I only trained the pose estimation part, which shouldn't matter, as you describe training each network separately and the pose estimation part takes no input from the position estimation part. I also used a smaller batch size due to hardware constraints, which again shouldn't have such a large effect (different runs ranged in batch size from 1 to 32).

Even then, I actually get better results when training the models jointly instead of separately, but still nothing close to your reported values: e.g., when fine-tuned on DIP-IMU, SIP/angular errors of 18.98 / 11.87 instead of your reported 13.97 / 7.62 (measured, of course, with your provided evaluation script).

Could you tell me what I missed, or please provide your training scripts?

Xinyu-Yi commented 1 year ago

Hi, you should not directly use the data synthesized by my code as supervision. It is all in the global frame, while the pose estimation pipeline should be trained in the root frame. Please refer to the paper for details.

PuckelTrick commented 1 year ago

OK, I did that as described in the paper for the accelerations and orientations, but in fact missed it for the target positions. Just to be clear: are the full joint positions also root-relative (I did not find this specifically mentioned for them in the paper)?

Xinyu-Yi commented 1 year ago

Yes. Leaf/full joint positions/rotations are all expressed in (relative to) the root frame.

PuckelTrick commented 1 year ago

OK, the last thing unclear to me is the input to the second and third models (leaf positions -> full positions, and full positions -> pose). You say that you train the models separately, but are the leaf/full positions concatenated to the input the (noised) ground-truth data during training, or the (noised) output of the respective previous model? I am testing both now, but it would be good to know in order to narrow down possible problems.

Xinyu-Yi commented 1 year ago

We use the ground-truth data (with Gaussian noise added) for the leaf/full joint positions during training. I think this may not lead to a significant difference.
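
For what it's worth, a minimal sketch of that training setup (the function name and the noise level are placeholders of mine, not code from this repo; use the noise parameters from the paper):

```python
import torch

# Sketch: the leaf/full joint position inputs to the second and third
# networks are the ground-truth values plus Gaussian noise, rather than
# the previous network's predictions. `sigma` is a placeholder value.
def noisy_ground_truth(positions, sigma=0.04):
    return positions + sigma * torch.randn_like(positions)

# e.g. input to the second network during training:
# x = torch.cat([imu_features, noisy_ground_truth(leaf_pos_gt)], dim=-1)
```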

Jaceyxy commented 1 year ago

Hi, for the positions relative to the root joint: does that mean taking the root joint as the origin and computing the other joints' coordinates in that frame? And is the rotation relative to the root joint the Calibration and Normalization mentioned in the paper?

Xinyu-Yi commented 1 year ago

> Hi, for the positions relative to the root joint: does that mean taking the root joint as the origin and computing the other joints' coordinates in that frame? And is the rotation relative to the root joint the Calibration and Normalization mentioned in the paper?

Exactly: the position and orientation of a joint are expressed in the root coordinate frame.

TowoC commented 1 year ago

> Yes. Leaf/full joint positions/rotations are all expressed in (relative to) the root frame.

Hello @PuckelTrick and @Xinyu-Yi

I saw that Xinyu says the position/rotation all need to transform to root frame.

And looking at the AMASS preprocessing in preprocess.py, the output orientation (out_pose) is in the local frame (forward kinematics is not applied).

The frame definitions are unclear to me. Does that mean all outputs of the AMASS preprocessing are in the global frame except for the orientation? I would also like to know how to transform the orientation/position from the local frame to the root frame. I have seen that the orientation/position in the DIP-IMU dataset are also in the local frame. It would be a big help to know how to do this transform.

Thanks a lot

Jaceyxy commented 1 year ago

@Xinyu-Yi Hi, how should I use the AMASS dataset? As I understand it, the synthesized rotation matrices are the bones' rotations relative to the global coordinate frame. How do I convert them into rotations relative to the body (root) coordinate frame? (I cannot obtain the transformation from the global frame to the body frame.)

Jaceyxy commented 1 year ago

@PuckelTrick @Xinyu-Yi Hi, can I know the length of the time series used during training? Thanks a lot!

JaggerZr commented 1 year ago

> Unfortunately I am unable to reproduce your results from the paper […] Could you tell me what I missed, or please provide your training scripts?

Hi, I am reproducing this project too. I find that the AMASS sequences are as long as 1900 frames. Do you cut them into shorter pieces during training?

GUIMINLONG commented 11 months ago

> @PuckelTrick @Xinyu-Yi Hi, can I know the length of the time series used during training? Thanks a lot!

Hi Jaceyxy, did you figure out this problem? I am also confused about it.

Xinyu-Yi commented 9 months ago

Sorry for the late response. The sequences are split into short sequences of length 300.
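
A minimal sketch of such a split (how a final chunk shorter than 300 frames is handled is not specified here; this version simply keeps it):

```python
def split_into_chunks(seq, length=300):
    # seq: array of shape (T, ...); returns consecutive, non-overlapping
    # chunks of at most `length` frames each.
    return [seq[i:i + length] for i in range(0, len(seq), length)]
```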

Xinyu-Yi commented 9 months ago

To change a global rotation into the root coordinate frame, left-multiply it by the inverse of the root rotation: R_rel = R_root^{-1} R_global. To change a global position into the root coordinate frame, subtract the root's global position and then left-multiply by the inverse of the root rotation: p_rel = R_root^{-1} (p_global - p_root). This should be easy since the global rotation of the root is known. We do not need a calibration when using the AMASS dataset.
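
For reference, a minimal PyTorch sketch of these two transforms (the tensor shapes and function name are assumptions for illustration, not code from this repo):

```python
import torch

def global_to_root(R_global, p_global, R_root, p_root):
    # R_global: (J, 3, 3) global joint rotation matrices
    # p_global: (J, 3)    global joint positions
    # R_root:   (3, 3)    global root rotation;  p_root: (3,) root position
    R_root_inv = R_root.t()                     # a rotation's inverse is its transpose
    R_rel = R_root_inv.matmul(R_global)         # R_root^{-1} R_global for each joint
    p_rel = (p_global - p_root).matmul(R_root)  # row-vector form of R_root^{-1} (p - p_root)
    return R_rel, p_rel
```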