electronicarts / character-motion-vaes

Character Controllers using Motion VAEs
BSD 3-Clause "New" or "Revised" License
254 stars 39 forks

How to generate mocap.npz? #1

Open Minotaur-CN opened 3 years ago

Minotaur-CN commented 3 years ago

Hi, how do I generate mocap.npz? It doesn't seem easy to me. Can you give a clue how to generate mocap.npz from a public mocap dataset?

The train_mvae.py script assumes the mocap data to be at environments/mocap.npz. The original training data is not included in this repo, but it can be easily extracted from other public datasets.

Thanks very much!

Best

wangshub commented 3 years ago

+1, could you provide some details and format description about the mocap data? Thanks!

@fabiozinno @belinghy

OOF-dura commented 3 years ago

+1, could you provide some details and format description about the mocap data? Thanks!

wangshub commented 3 years ago

@Minotaur-CN @OOF-dura

The raw data format is not complex; just read the code below.

belinghy commented 3 years ago

Below is some information about the data format. We also note the length of each mocap sequence (as mentioned above at L141), so that we don't sample invalid transitions during training. If the mocap clip is one long continuous sequence, then there is no reason to do this.

    0-3 : root delta x, delta y, delta facing
   3-69 : joint coordinates (22 * 3 = 66)
 69-135 : joint velocities in Cartesian coordinate in previous root frame (22 * 3 = 66)
135-267 : 6D joint orientations, i.e. first two columns of rotation matrix (22 * 6 = 132)
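For concreteness, here is a minimal sketch of how a 267-dim frame splits into the components above (the function name and NumPy usage are mine, not from the repo):

```python
import numpy as np

def split_pose_frame(frame):
    """Split 267-dim pose vectors into their components (layout as above)."""
    assert frame.shape[-1] == 267
    root_delta = frame[..., 0:3]                          # root delta x, delta y, delta facing
    joint_pos = frame[..., 3:69].reshape(-1, 22, 3)       # 22 joints * xyz coordinates
    joint_vel = frame[..., 69:135].reshape(-1, 22, 3)     # Cartesian velocities in previous root frame
    joint_rot6 = frame[..., 135:267].reshape(-1, 22, 6)   # first two columns of each rotation matrix
    return root_delta, joint_pos, joint_vel, joint_rot6
```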

For extracting training data from mocap datasets, I think fairmotion might be helpful. Based on the examples I have seen, though I haven't tested it, it should be something like below. Root deltas need some more processing; essentially, find the displacement vector and rotate it by the current facing direction of the character. Same thing for positions and velocities: they should be projected into character space to make learning easier.

from fairmotion.data import bvh

motion = bvh.load(BVH_FILENAME)

positions = motion.positions(local=False)  # world-space joint positions, (frames, joints, 3)
velocities = positions[1:] - positions[:-1]  # per-frame finite differences, (frames - 1, joints, 3)
orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)  # first two columns of each 3x3 rotation
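As a sketch of the root-delta processing described above ("find the displacement vector and rotate it by the current facing direction") — this is not code from the repo; the function name, the radians convention, and the x/y ground-plane layout are my assumptions:

```python
import numpy as np

def root_deltas(root_xy, facing):
    """Per-frame root displacement expressed in the previous frame's facing frame.

    root_xy: (frames, 2) root position projected onto the ground plane
    facing:  (frames,) facing angle in radians
    Returns (frames - 1, 3): dx, dy in the previous facing frame, plus delta facing.
    """
    disp = root_xy[1:] - root_xy[:-1]          # world-space displacement per frame
    prev = facing[:-1]
    cos, sin = np.cos(-prev), np.sin(-prev)    # rotate by -facing into character space
    dx = cos * disp[:, 0] - sin * disp[:, 1]
    dy = sin * disp[:, 0] + cos * disp[:, 1]
    d_facing = facing[1:] - facing[:-1]
    d_facing = (d_facing + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
    return np.stack([dx, dy, d_facing], axis=1)
```

With this convention, moving forward always produces the same delta regardless of the character's world heading, which is the point of projecting into character space.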
Realdr4g0n commented 3 years ago

@belinghy can you tell me more about how to get the root deltas?

I think a sample, formula, or code snippet would help.

ameliacode commented 2 years ago

I may have misunderstood the whole process, but since there isn't any sample of mocap.npz, I assume mocap.npz should look like this:

mocap.npz
- "data": a list of 267 floats per frame (the root delta info is described in the paper's "pose representation" section)
- "end_indices": 267? (the length of each mocap sequence?)

It seems the mocap data has to include exactly 22 joints, so extracting from other public datasets may not work, as BVH files and other mocap data out there may have a different number of joints. Even after discarding irrelevant joints from the data, the joint index order is another issue, as you can see in mocap_env.py :(

Therefore, I think there are two ways to solve this:

I wasn't able to find which mocap database this project used, and it wasn't in the paper :(

belinghy commented 2 years ago

Your understanding of the format is correct, except that end_indices marks the end of each mocap clip. It depends on the number of mocap clips you have, so it is not necessarily 267. For example, if there are two clips of lengths 10 and 15, then end_indices = np.cumsum([10, 15]) - 1 = [9, 24].
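Putting that together, building the file might look something like this (a sketch assuming the format discussed in this thread; the clip contents here are placeholders):

```python
import numpy as np

# Two hypothetical clips of 10 and 15 frames, 267 features per frame.
clips = [np.zeros((10, 267)), np.zeros((15, 267))]

data = np.concatenate(clips, axis=0)                   # (25, 267): all frames stacked
end_indices = np.cumsum([len(c) for c in clips]) - 1   # last frame index of each clip: [9, 24]

np.savez("mocap.npz", data=data, end_indices=end_indices)
```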

As you've noted, mocap_env.py could definitely be refactored. I think the only things to change if you are using a different input format are these lines and these lines. The second reference only applies to the `0-3 : root delta x, delta y, delta facing` part. Am I missing anything else?

ameliacode commented 2 years ago

So, as mentioned above, if I get this right, end_indices might contain a single integer value if the input clip is one long continuous sequence. However, I still don't get what "length" means in this case. Is it a frame number, or is some other unit used?

belinghy commented 2 years ago

Yes, it's a frame number. end_indices contains a single integer value if there is exactly one input clip, i.e., one long continuous sequence.

edentliang commented 2 years ago

Hi, I have some confusion about `135-267 : 6D joint orientations, i.e. first two columns of rotation matrix (22 * 6 = 132)` and `orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)`. In your case, is the z-axis the world up vector, and are the 6D joint orientations the orientations of the other two directions?

Furthermore, can you provide some examples for `0-3 : root delta x, delta y, delta facing`? I am a bit confused about the definition of these variables.

Thank you

ameliacode commented 2 years ago

Maybe this will help: https://arxiv.org/pdf/2103.14274.pdf : see pose representation for root information.

I think the paper and the code differ slightly in which up vector they use. Overall, the root delta has to include the two components of the root position projected onto the ground plane plus the root facing direction, and the joint orientations have to encode a rotation matrix through its relative forward and upward vectors.

Gabriel-Bercaru commented 2 years ago

Hello,

@belinghy, when reshaping the rotation components returned by fairmotion (`orientations = motion.rotations(local=False)[..., :, :2].reshape(-1, 22, 6)`), how are the vector components laid out? Considering the first joint in the first frame, where the 6 entries on the third dimension contain the 3 components of each of the first 2 columns of the rotation matrix, would they be laid out as in the first version below or as in the second one?

version 1:

orientations[0, 0, 0] = comp1_x
orientations[0, 0, 1] = comp1_y
orientations[0, 0, 2] = comp1_z
orientations[0, 0, 3] = comp2_x
orientations[0, 0, 4] = comp2_y
orientations[0, 0, 5] = comp2_z

version 2:

orientations[0, 0, 0] = comp1_x
orientations[0, 0, 1] = comp2_x
orientations[0, 0, 2] = comp1_y
orientations[0, 0, 3] = comp2_y
orientations[0, 0, 4] = comp1_z
orientations[0, 0, 5] = comp2_z
belinghy commented 2 years ago

Hi @Gabriel-Bercaru, I'm not sure what fairmotion's convention is. Are you rendering the character using joint orientations? If not, for the purpose of neural network input, the order shouldn't matter.

Gabriel-Bercaru commented 2 years ago

Hello, indeed, for the input training data it doesn't really matter, but I was trying to render a mesh over a trained model.

As far as I have seen, rigging makes use of the joint orientations, and to get them I should convert those 6D orientation vectors to either Euler angles or quaternions.
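For rendering, one way to recover full rotation matrices from the 6D vectors is Gram-Schmidt orthogonalization, as in Zhou et al., "On the Continuity of Rotation Representations in Neural Networks". This is a sketch, not code from the repo; it assumes the column-interleaved layout produced by the fairmotion snippet above:

```python
import numpy as np

def rot6d_to_matrix(d6):
    """Recover 3x3 rotation matrices from 6D representations.

    d6: (..., 6), laid out as in the fairmotion snippet, i.e. the first two
    columns of the rotation matrix interleaved row by row.
    """
    m = d6.reshape(*d6.shape[:-1], 3, 2)
    a, b = m[..., 0], m[..., 1]                        # the two stored column vectors
    c1 = a / np.linalg.norm(a, axis=-1, keepdims=True)  # normalize first column
    b = b - np.sum(c1 * b, axis=-1, keepdims=True) * c1  # remove component along c1
    c2 = b / np.linalg.norm(b, axis=-1, keepdims=True)
    c3 = np.cross(c1, c2)                              # third column completes the basis
    return np.stack([c1, c2, c3], axis=-1)
```

From the resulting matrices, standard conversions to Euler angles or quaternions (e.g. via scipy.spatial.transform.Rotation) can be applied for rigging.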

belinghy commented 2 years ago

The way it's indexed, e.g., [..., :, :2], should correspond to version 2.
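A quick check with a labeled matrix makes the ordering concrete (a sketch, not repo code; rows are x/y/z components, columns are the two stored basis vectors):

```python
import numpy as np

# One frame, one joint: entry ij is row i (component), column j (vector).
R = np.array([[[[11, 12, 13],
                [21, 22, 23],
                [31, 32, 33]]]])            # shape (1, 1, 3, 3)

six = R[..., :, :2].reshape(-1, 1, 6)
print(six[0, 0])  # [11 12 21 22 31 32] -> comp1_x, comp2_x, comp1_y, comp2_y, comp1_z, comp2_z
```

The row-major reshape interleaves the two columns component by component, which matches version 2.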