benjiebob / SMALify

This repository contains an implementation for performing 3D animal (quadruped) reconstruction from a monocular image or video. The system adapts the pose (limb positions) and shape (animal type/height/weight) parameters for the SMAL deformable quadruped model, as well as camera parameters until the projected SMAL model aligns with 2D keypoints and silhouette segmentations extracted from the input frame(s).

non-issue question: Is this a correct result? #23

Closed. nightfarrow closed this issue 3 years ago.

nightfarrow commented 3 years ago

Hi, I ran python smal_fitter/optimize_to_joints.py on the Stanford image. The result I got was: (st10_ep0.ply previewed in an online PLY viewer) I'm just wondering, is this the correct result? The legs are kind of stuck together, which looks visually a bit strange. It's incredible software for what it does, but I just want to verify that this is how the result is supposed to look? Thanks!!

benjiebob commented 3 years ago

Hey, I see you opened this issue and then closed it. Did you resolve this somehow?

nightfarrow commented 3 years ago

Reopened now, as it is not entirely resolved. I do see that there is a section on the project page titled "Improving performance and general tips and tricks" that looks like the solution to making the dog's pose more natural. However,

# OPTIMIZER - You may need to adjust these depending on the sequence.
OPT_WEIGHTS = [
    [25.0, 10.0, 7.5, 5.0], # Joint
    [0.0, 500.0, 5000.0, 5000.0], # Sil Reproj
    [0.0, 1.0, 1.0, 1.0], # Betas
    [0.0, 1.0, 1.0, 1.0], # Pose
    [0.0, 100.0, 100.0, 100.0], # Limits TODO!
    [0.0, 0.1, 0.1, 0.1], # Splay
    [500.0, 100.0, 100.0, 100.0], # Temporal
    [150, 400, 600, 800], # Num iterations
    [5e-3, 5e-3, 5e-4, 1e-4]] # Learning Rate

^ I'm not sure what number range is appropriate for each. For example, I don't know whether I can try putting "500" as one of the # Splay numbers or not. Is there any way to know the minimum and maximum sensible values for each of these OPT_WEIGHTS?

It's also somewhat unclear which of the loss terms listed on the project page (2D Keypoint Reprojection, 3D Shape Prior, 3D Pose Prior, 2D Silhouette, and Temporal) lines up with which row here (Joint, Sil Reproj, Betas, Pose, Limits, Splay, Temporal, Num iterations, Learning Rate). I think 'Temporal' lines up with '# Temporal', '3D Pose Prior' with '# Pose', and '2D Silhouette' with '# Sil Reproj'. I am unsure which row corresponds to '2D Keypoint Reprojection' or '3D Shape Prior'.

So, basically, I think I've figured out that I'm supposed to adjust OPT_WEIGHTS to make the dog's pose more natural, but I don't yet know how to fiddle with the OPT_WEIGHTS properly.

benjiebob commented 3 years ago

I can explain all this... however, to begin with, could you please let me know which Stanford image you ran on? If you could zip the full output directory and email it to me at bjb56@cam.ac.uk, I can take a closer look.

benjiebob commented 3 years ago

OK - scratch my last comment. I can reproduce the issue.

As a quick primer, the OPT_WEIGHTS variable contains scalar weights that multiply the various parts of the loss function, increasing or decreasing the effect each part has on the solution. Think of it a bit like this:

Loss = A * KeypointAlignment + B * SilhouetteOverlap + C * TemporalSmoothness + D * PosePrior ... 

The values A, B, C, D here are set using OPT_WEIGHTS. What you are watching as the system runs is an optimizer trying to minimize this loss function. You can imagine that by fiddling with the weights (e.g. multiplying D by 100), you change how much the optimizer "cares" about minimizing that part of the loss (e.g. PosePrior) relative to the other terms. Therefore, there is no real 'range' for these values (although setting a negative or complex weight would be a bit weird); what really matters is the balance we set between them.
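As a rough illustration only (the names below are hypothetical, not the actual SMALify code), the weighted sum might look something like this in Python:

def total_loss(losses, weights):
    # losses:  dict of already-computed loss terms, e.g.
    #          {"joint": ..., "sil": ..., "pose_prior": ..., "temporal": ...}
    # weights: dict with the same keys giving the scalar weight per term
    #          (these play the role of A, B, C, D above)
    return sum(weights[name] * value for name, value in losses.items())

Scaling one weight up simply makes violations of that term more "expensive", so the optimizer will sacrifice a little of the other terms to reduce it.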

Of course, the ideal solution would be to find values A,B,C,D which work all the time for all sequences. The existing values in OPT_WEIGHTS were my attempt at doing this, at least until you've found a good counterexample -- which I thank you for!

To briefly explain the OPT_WEIGHTS construction: each row corresponds to a different part of the loss function, and the four columns correspond to the 4 optimization stages (this essentially allows the balance of A, B, C, D to be different in each of the 4 stages of the run).
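As a sketch (with hypothetical variable names; the real code may index things differently), you can think of one stage's weights as a column slice of OPT_WEIGHTS:

LOSS_ROWS = ["joint", "sil_reproj", "betas", "pose", "limits",
             "splay", "temporal", "num_iterations", "learning_rate"]

for stage in range(4):  # the 4 optimization stages (columns)
    stage_weights = {name: row[stage] for name, row in zip(LOSS_ROWS, OPT_WEIGHTS)}
    # e.g. stage_weights["pose"] is the pose prior weight used during this stage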

For this specific issue, which has to do with how 'anatomically plausible' the 3D fit is, I would recommend boosting the values in the 4th row (with the pose comment). For example, you could try running with the OPT_WEIGHTS set like this, and observe the effect:

OPT_WEIGHTS = [
    [25.0, 10.0, 7.5, 5.0], # Joint
    [0.0, 500.0, 5000.0, 5000.0], # Sil Reproj
    [0.0, 1.0, 1.0, 1.0], # Betas
    [0.0, 100.0, 100.0, 100.0], # Pose
    [0.0, 100.0, 100.0, 100.0], # Limits TODO!
    [0.0, 0.1, 0.1, 0.1], # Splay
    [500.0, 100.0, 100.0, 100.0], # Temporal
    [150, 400, 600, 800], # Num iterations
    [5e-3, 5e-3, 5e-4, 1e-4]] # Learning Rate

Of course, balancing these terms is a bit of an "art" and can be a bit time consuming. It's sometimes best to use a 'grid search' in which you run the system many, many times with different settings so you can find the best solution. If you do find a better set of weights, even if only for this image, please do let me know! If I get time around my thesis writing (d'oh) I'll try to look at it myself.
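If you go down the grid-search route, a minimal sketch might look like the following (run_fit is a placeholder for however you launch the optimizer with a given set of weights, not an existing SMALify function):

import copy
import itertools

pose_weights = [1.0, 10.0, 100.0]
limit_weights = [10.0, 100.0, 1000.0]

for pose_w, limit_w in itertools.product(pose_weights, limit_weights):
    weights = copy.deepcopy(OPT_WEIGHTS)
    weights[3] = [0.0, pose_w, pose_w, pose_w]     # Pose row
    weights[4] = [0.0, limit_w, limit_w, limit_w]  # Limits row
    # run_fit(weights) is a placeholder: run the optimization with these
    # weights, then compare the resulting fits (visually or via the losses).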

As a final comment, it's important to keep in mind that trying to determine the 3D structure of an object from only a single 2D image is a hard, inherently ambiguous problem. In fact, the leg issue you observe is only apparent when you rotate the 3D model; it isn't visible from the original camera angle, which is all the optimizer has to work with. We try to overcome this ambiguity with pose priors (think of these as some very basic knowledge of dog anatomy learnt from artist data) which aim to keep the output 3D models realistic, but it won't always be perfect.

I hope this helps a tad :)

nightfarrow commented 3 years ago

Thank you! That works perfectly. I really appreciate the explanations.

benjiebob commented 3 years ago

No problem! If you have achieved a better fit, do you mind uploading an image to this issue? I suspect it might help others understand the effect of the weighting scheme.

nightfarrow commented 3 years ago

Sure! This is the output now. It looks near identical to the source image and looks natural from all angles. (Attached images: Dog1, Dog2, Dog3.)

benjiebob commented 3 years ago

Great -- awesome stuff! Thanks for your help in debugging this.