Hi, thanks for open-sourcing the code for your paper. Leveraging temporal convolutions is a great idea, and the results are promising. However, I want to point out an issue with the back-projection idea: it was already explored for 3D human pose estimation in two previous works (and maybe others that I'm not aware of):
Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision (link) at CVPR 2017.
Can 3D Pose be Learned from 2D Projections Alone? (link) by @dylandrover et al. at ECCVW 2018.
It'd be nice to mention those great works in your preprint.
Thanks for your interest and the references! We will discuss the differences from these papers in our next revision. From a quick glance at the papers, it looks like they use GANs, whereas we do not.
Thanks!