Results on UBC data are noisy

Yes, we do observe noisy T-pose results, but the results look better at the training poses and at poses close to those in the input video. We suspect this is because highly dynamic clothing is hard to generalize to novel poses given the very limited observations in UBC. Also note that in-the-wild UBC sequences with dynamic clothing are far more challenging than People-Snapshot and ZJU-MoCap: the estimated poses are inaccurate and quite noisy, the pose distribution is narrow, and each video provides only a few side-view frames. That said, the baseline performs even worse. We hope our first small step reveals some new challenges for this in-the-wild problem.

One more thing you may notice in our code: I intentionally left the SD guidance and the real video-fitting steps in the same file. Our very early results suggest that a hybrid of SD guidance and real-video fitting goes a long way toward addressing this issue, but I haven't had time to implement it in the code release.
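For anyone who wants to experiment with this, here is a minimal sketch of how the two objectives could be mixed. It is not code from this repository: the function names, the weighted-sum form, the warm-up schedule, and the default weight are all assumptions made for illustration. The idea is to fit the real video alone at first, then gradually add a guidance term (evaluated on novel poses) with a small weight.

```python
def guidance_weight(step, total_steps, w_max=0.1, warmup_frac=0.5):
    """Hypothetical schedule (an assumption, not from the repo):
    zero during warm-up, then a linear ramp up to w_max."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        return 0.0
    ramp = (step - warmup) / max(1, total_steps - warmup)
    return w_max * min(1.0, ramp)


def hybrid_loss(fit_loss, guidance_loss, step, total_steps, w_max=0.1):
    """Weighted sum of the real-video fitting loss and the guidance loss.

    fit_loss:      scalar loss on the observed training frames
    guidance_loss: scalar guidance loss on novel (unobserved) poses
    Both inputs are placeholders for whatever the training loop computes.
    """
    w = guidance_weight(step, total_steps, w_max)
    return fit_loss + w * guidance_loss


if __name__ == "__main__":
    total = 100
    for step in (0, 50, 75, 100):
        print(step, hybrid_loss(1.0, 2.0, step, total))
```

The warm-up keeps the guidance term from interfering before the real-video fit has converged. Whether a ramp like this is the right choice, and what weight works, would need to be checked empirically.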

JiahuiLei / GART
