brjathu / PHALP

Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". (CVPR 2022 Oral)

Question about relational transformer model #9

Closed xiexh20 closed 1 year ago

xiexh20 commented 1 year ago

Dear authors,

Thanks for the great work and releasing the code! I have a few questions regarding the transformer design choices:

  1. Why is there a division by 10 in this code? It looks like a normalization factor, but I could not find it in the original relational model implementation. Could you explain how you chose this value?
  2. In your previous work T3DP, you used a vanilla transformer that computes attention directly, while here you use a more advanced relational model. Was there a specific consideration behind this design change? Do you have any experiments showing that the current design is better? Thank you very much for your time!
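For context, the two designs being compared can be contrasted in a minimal sketch. This is a hypothetical illustration with made-up names and shapes, not the T3DP or PHALP code: vanilla self-attention mixes tokens via a softmax over dot products, while a relational-style update runs a small MLP over every token pair and aggregates the resulting messages.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention over n tokens (hypothetical sketch)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

def relational_update(X, W_pair, W_out):
    """Relational-style update (hypothetical sketch): MLP over all (i, j)
    token pairs, then mean-aggregation of messages per token."""
    n, d = X.shape
    # Concatenate features of every ordered pair (i, j) -> shape (n*n, 2d)
    pairs = np.concatenate([np.repeat(X, n, axis=0), np.tile(X, (n, 1))], axis=1)
    msgs = np.maximum(pairs @ W_pair, 0.0)        # one ReLU layer per pair
    agg = msgs.reshape(n, n, -1).mean(axis=1)     # aggregate incoming messages
    return agg @ W_out
```

Both updates map n token features to n token features; the relational variant spends extra computation on explicit pairwise terms rather than a single softmax-weighted sum.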

Best, Xianghui

brjathu commented 1 year ago

Hi Xianghui,

Thanks for your interest in our work. For the first question, we set this value empirically so that each update changes the pose by only a small amount. For the second question, this was a change due to legacy code, and we indeed did not observe a performance boost from it.
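For readers landing on this thread: a minimal sketch of the kind of scaled residual update being described. The function name and values are hypothetical, not the PHALP source; the point is that dividing the predicted delta by a constant keeps each pose update small.

```python
# Hypothetical sketch (not the PHALP source): a residual pose update where the
# network's raw prediction is divided by a constant so each step only nudges
# the pose slightly.
def apply_scaled_residual(pose, delta, scale=10.0):
    """Return pose + delta / scale, elementwise over parameter lists."""
    return [p + d / scale for p, d in zip(pose, delta)]

pose = [0.0, 0.5, -0.5]    # current pose parameters (illustrative)
delta = [1.0, -2.0, 4.0]   # raw network prediction (illustrative)
new_pose = apply_scaled_residual(pose, delta)
```

With `scale=10.0` the update moves each parameter by only a tenth of the raw prediction, which matches the stated intent of changing the pose by a small amount per step.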

Thanks, Jathushan

xiexh20 commented 1 year ago

Thanks a lot for your help!