Open LeKiet258 opened 1 year ago
Hi @LeKiet258 ! Thank you for your interest. Ground truth is not needed for inference; however, you may need to process your own data following the instructions here. You can simply leave the gt value as 0 if you don't have it. If you want to run inference with visualization, you can also refer to mmhuman3d, which supports SmoothNet.
Thanks for your fast reply! MMHuman3d doesn't support my model of choice yet, which is HybrIK, so I have to clone HybrIK separately to produce the output and then run SmoothNet on that output, hence my question. As for setting the gt value to 0, did you mean setting the pose and shape parameters? (In my case, I use the SMPL body representation.)
In this repo, we have not provided an inference pipeline, so you can use the evaluation pipeline for testing. Because the evaluation process needs ground truth to calculate metrics, we recommend adding ground truth during data processing. However, since you only need the inference part, you can set the ground-truth value to any placeholder (e.g., 0). You may still need to modify the code to obtain the smoothed output pose from the whole pipeline.
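To make the "placeholder ground truth" idea concrete, here is a minimal sketch of packing detector output with zeroed ground truth. The array shape (frames, joints, 3) and the dict keys `pred`/`gt` are hypothetical placeholders, not SmoothNet's actual data format; follow the repo's data-processing scripts for the real layout.

```python
import numpy as np

# Hypothetical detector (e.g., HybrIK) output: (frames, joints, 3) 3D keypoints.
rng = np.random.default_rng(0)
pred = rng.normal(size=(300, 17, 3)).astype(np.float32)

# Placeholder ground truth: all zeros, since metrics are irrelevant for pure inference.
gt = np.zeros_like(pred)

# The key names below are illustrative only; match them to the repo's data pipeline.
sample = {"pred": pred, "gt": gt}
```

With such a sample, the evaluation pipeline can run end to end; the reported metrics are meaningless, but the smoothed poses it produces along the way are the output you want.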
@LeKiet258 Could you share your repo? I am also trying SmoothNet with HybrIK without success.
@juxuan27 an inference pipeline would be very useful for people trying your work. Please consider adding one.
Hi @nick008a ,
Thanks for the kind suggestion! We will add it in March.
@ailingzengzzz thank you for the reply!! It would be even better if it could be used with something like HybrIK.
@nick008a I haven't successfully combined HybrIK with SmoothNet yet, so I swapped HybrIK for another model in order to get it running. Sorry :(
@LeKiet258 @nick008a @ailingzengzzz I have a similar question. In the 2D code provided, the dataset preparation for evaluation actually uses a middle-position datum as the normalization reference: https://github.com/cure-lab/SmoothNet/blob/be3f9ce43399aa7c6c3c5400f2d5fdcfd958cfc4/lib/dataset/h36m_dataset.py#L164
How can I do that in a real-time situation where I only have, say, 16 frames? I also noticed it normalizes with a magic number, H36M_IMageshape, which is 1000, but when keypoints are detected on arbitrary videos the image shapes can vary. Does that matter or not?
Hi @lucasjinreal ,
I'm sorry for the late reply. I checked this version and found that I did not provide a general normalization. You can use the following functions to normalize keypoints from arbitrary videos, feed the normalized keypoint positions into SmoothNet, and then denormalize the smoothed keypoints back into image coordinates.
import numpy as np
import torch

def normalize_screen_coordinates(X, w, h):
    # Normalize so that [0, w] is mapped to [-1, 1], while preserving the aspect ratio
    if X.shape[-1] == 3:  # input is a 3D pose
        X_norm = X[..., :2] / w * 2 - [1, h / w]
        # scale the depth channel by 1000
        X_out = np.concatenate((X_norm, X[..., 2:3] / 1000), -1)
    else:
        assert X.shape[-1] == 2
        X_out = X / w * 2 - [1, h / w]
    return X_out

def image_coordinates(X, w, h):
    # Reverse camera frame normalization
    if X.shape[-1] == 3:  # input is a 3D pose
        X_norm = X[..., :2].clone()  # clone to avoid modifying X in place (use .copy() for NumPy)
        X_norm[..., :1] = (X_norm[..., :1] + 1) * w / 2
        X_norm[..., 1:2] = (X_norm[..., 1:2] + h / w) * w / 2
        X_out = torch.cat([X_norm, X[..., 2:3] * 1000], -1)
    else:
        assert X.shape[-1] == 2
        X_out = (X + [1, h / w]) * w / 2
    return X_out
May I ask what method you used?
Hi, I want to run SmoothNet on an arbitrary video, but it seems like ground truth for that video is required even for the inference phase, right?