YoungSeng opened this issue 1 year ago
Hi, thanks for the great work!
Regarding your approach:
I am wondering if this is a bug in `sample.py` when smoothing the transitions here:
As you have commented yourself, the shape of the variable `last_poses` is `(1, model.njoints, 1, args.n_seed)`, so `len(last_poses)` is always 1 (the batch size). I think `len(last_poses)` should be replaced with `np.size(last_poses, axis=-1)`, which is `args.n_seed` (30 frames by default). That way it blends the first frames of the new prediction with the last frames of the previous prediction, something like this:
```python
n = np.size(last_poses, axis=-1)  # args.n_seed, the number of overlapping frames
for j in range(n):
    prev = last_poses[..., j]  # frame j of the previous clip's tail
    nxt = sample[..., j]       # frame j of the new clip's head
    # linear crossfade: the weight on the previous clip decays while the weight on the new clip grows
    sample[..., j] = prev * (n - j) / (n + 1) + nxt * (j + 1) / (n + 1)
```
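To sanity-check the weights, here is a tiny self-contained demo on dummy arrays (the shapes and `n_seed = 30` below are illustrative assumptions, not taken from the repo):

```python
import numpy as np

n_seed = 30
last_poses = np.zeros((1, 3, 1, n_seed))  # stand-in for the previous clip's tail
sample = np.ones((1, 3, 1, n_seed + 60))  # stand-in for the new clip

n = np.size(last_poses, axis=-1)          # 30, not len(last_poses) == 1
for j in range(n):
    sample[..., j] = last_poses[..., j] * (n - j) / (n + 1) \
                     + sample[..., j] * (j + 1) / (n + 1)

# The overlap ramps smoothly from ~0 (previous clip) to ~1 (new clip):
print(sample[0, 0, 0, :n].round(2))
```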
Am I right? I would appreciate your feedback. Thanks a lot!
Yes, when I reproduced it later I remembered that there was a minor problem in this region, but it didn't seem to have much effect on the results. Also, the effective length of `last_poses` is not 1 but `n_seed`: in its shape `(1, model.njoints, 1, args.n_seed)`, the first 1 is the batch size and the second 1 only expands the dimensions and has no real meaning.
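As a quick check of the two quantities being discussed (the `njoints` value here is arbitrary, just for illustration):

```python
import numpy as np

last_poses = np.zeros((1, 256, 1, 30))  # (batch, njoints, dummy dim, n_seed)
print(len(last_poses))                  # 1  -> the batch dimension
print(np.size(last_poses, axis=-1))     # 30 -> n_seed, the number of seed frames
```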
The segments we trained on are all 4 s long, and it is difficult to generalize to gestures of arbitrary length by positional encoding alone. MDM-based models that need time awareness (arbitrarily long inference) therefore require a smooth transition between the generated segments; a sketch of that pattern follows below. The following practices can be referred to:
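For readers landing here, this is a minimal sketch of that windowed long-form inference, assuming a hypothetical `generate_clip(seed)` wrapper around one diffusion sampling call (the function name, shapes, and 120-frame clip length are illustrative, not from this repo):

```python
import numpy as np

def generate_long_sequence(generate_clip, n_windows: int, n_seed: int) -> np.ndarray:
    """Chain fixed-length clips into one long motion with crossfaded seams.

    `generate_clip(seed)` is a hypothetical stand-in for one diffusion sampling
    call; it returns an array of shape (1, njoints, 1, clip_len) whose first
    n_seed frames cover the same time span as the previous clip's tail.
    """
    out = generate_clip(None)                            # first window: no seed
    for _ in range(n_windows - 1):
        seed = out[..., -n_seed:]                        # tail of what we have so far
        clip = generate_clip(seed)
        w = (np.arange(n_seed) + 1.0) / (n_seed + 1.0)   # ramp 0 -> 1 across the overlap
        blended = seed * (1.0 - w) + clip[..., :n_seed] * w
        out = np.concatenate([out[..., :-n_seed], blended, clip[..., n_seed:]], axis=-1)
    return out

# Toy usage: a dummy sampler that returns random 120-frame clips
rng = np.random.default_rng(0)
long_motion = generate_long_sequence(
    lambda seed: rng.normal(size=(1, 3, 1, 120)), n_windows=4, n_seed=30)
print(long_motion.shape)  # (1, 3, 1, 390): 120 + 3 * (120 - 30)
```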