Closed: asimniazi63 closed this issue 3 years ago
Hi,
Yes, the system at the time this code was released was quite susceptible to changes in relative person size, because the training inputs were cropped to a bounding box around the synthetic silhouettes/joints (with a small random bbox scaling factor for augmentation).
The standard way to deal with this is to also crop any test inputs to a bounding box around the detected silhouette/joints before 3D prediction, mimicking the training preprocessing, and then un-crop after prediction, but it seems I forgot to implement that in the code released here 😄. I'll get around to it when I've got some time.
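The crop/un-crop step described above could be sketched roughly as follows. This is a minimal illustration, not code from this repo: the function names (`silhouette_bbox`, `crop`, `uncrop_joints`) and the scale factor are my own, and it assumes a binary silhouette mask is available at test time.

```python
import numpy as np

def silhouette_bbox(mask, scale=1.2):
    """Square bounding box around a binary silhouette, enlarged by `scale`.

    Returns (y0, x0, size) in the original image frame. The scale factor
    (1.2 here, purely illustrative) leaves some margin around the person,
    roughly matching a training-time bbox scaling augmentation.
    """
    ys, xs = np.nonzero(mask)
    cy = (ys.min() + ys.max()) / 2.0
    cx = (xs.min() + xs.max()) / 2.0
    extent = max(ys.max() - ys.min(), xs.max() - xs.min())
    size = int(round(extent * scale))
    y0 = int(round(cy - size / 2.0))
    x0 = int(round(cx - size / 2.0))
    return y0, x0, size

def crop(image, y0, x0, size):
    """Crop a square window, zero-padding where the box leaves the image."""
    h, w = image.shape[:2]
    out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    ys, ye = max(y0, 0), min(y0 + size, h)
    xs, xe = max(x0, 0), min(x0 + size, w)
    out[ys - y0:ye - y0, xs - x0:xe - x0] = image[ys:ye, xs:xe]
    return out

def uncrop_joints(joints_2d, y0, x0):
    """Map predicted (x, y) joints from the crop frame back to the
    original image frame by adding the crop offset."""
    return joints_2d + np.array([x0, y0], dtype=joints_2d.dtype)
```

The network would then be run on the fixed-size crop (resized to the training input resolution), so the person always occupies a similar fraction of the input regardless of their distance from the camera.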
You could also try widening the random bbox scaling range in data augmentation to make the network more robust to scale variation, but the test-time solution makes more sense and will probably work better.
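For the augmentation alternative, the idea is simply to sample the bbox scale factor from a wider range during training. A minimal sketch, assuming one scale factor is sampled per example; the range endpoints here are hypothetical, not the values used in this repo:

```python
import numpy as np

def sample_bbox_scale(rng, low=0.8, high=1.4):
    """Sample a random bounding-box scale factor for augmentation.

    Widening (low, high) beyond a narrow range like (0.95, 1.05)
    exposes the network to larger variation in apparent person size,
    at the cost of a harder training distribution.
    """
    return rng.uniform(low, high)

rng = np.random.default_rng(42)
scales = [sample_bbox_scale(rng) for _ in range(5)]
```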
I have tried reconstructing the same person at various distances from the camera, and the results vary wildly. I would appreciate a brief explanation of why this happens, and any solution for handling different distances in your approach. Examples: