DegardinBruno / Kinetic-GAN

Code for the paper "Generative Adversarial Graph Convolutional Networks for Human Action Synthesis", WACV 2022

Data Preprocessing Human3.6M + Adaptation for different skeleton #3

Open StevRamos opened 2 years ago

StevRamos commented 2 years ago

How did you preprocess the Human3.6M dataset? I would like to replicate npy and pkl files that you provide. Do you have a code of these? Thanks in advance!

DegardinBruno commented 2 years ago

Hi @StevRamos, thanks for your interest in Kinetic-GAN! For consistency, we used the same data as previous methods. The authors of SA-GCN ("Structure-Aware Human Action Generation") provided us with the data obtained by the other methods as well. Their GitHub: https://github.com/PingYu-iris/SA-GCN

We just rearranged it to be easier to use! Let me know if you have any further questions.

StevRamos commented 2 years ago

Thanks for the prompt response! I will review it.

I would like to use your model to generate new videos of sign language (for data augmentation purposes). The problem is that my dataset is a set of videos. I recently learned a bit about GNNs, so as I understand it, each node has features. It would be really helpful if you could tell me whether it is possible to obtain (replicate) these features for the nodes of each video in my dataset (sign language videos), or whether I need other tools to make that possible, and what these features represent.

You did amazing work! Thanks for making the code public!

DegardinBruno commented 2 years ago

Thank you very much! Btw, the content/shape of each dataset is N x C x T x V (x M), where N is the number of samples, C the number of coordinates, T the number of temporal instances (frames), and V the number of joints. M is usually 1 if there is a fifth dimension.
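A minimal sketch of that layout, assuming hypothetical sizes (100 samples, 3D coordinates, 64 frames, 25 joints as in NTU RGB+D, 1 skeleton per sample):

```python
import numpy as np

# Hypothetical example sizes, only to illustrate the N x C x T x V x M layout.
N, C, T, V, M = 100, 3, 64, 25, 1
data = np.zeros((N, C, T, V, M), dtype=np.float32)

# x-coordinate of joint 7 at frame 10 of sample 0:
x = data[0, 0, 10, 7, 0]
```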

Yes, absolutely, great idea! You can even build your own conditional model with Kinetic-GAN to generate specific words and letters; you just need to extract the 2D or 3D hand pose estimation first. After that, you will need to define/change the adjacency matrix (a V x V matrix, where V is the number of joints in the hand, with a 1 for connected joints and 0 otherwise) by changing the connected joints in the data (check the graph_ntu.py file). Then, you define/change the upsampling and downsampling path (also in graph_ntu.py). There are comments there showing how to visualise the upsampling paths just by running that code!
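As an illustration of the V x V adjacency construction (a toy 5-joint chain, not a real hand graph, which would list all hand joints and their bone connections):

```python
import numpy as np

# Toy skeleton: 5 joints in a chain, e.g. wrist plus four finger joints.
V = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # (parent, child) joint pairs

A = np.zeros((V, V), dtype=np.float32)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected: connections are symmetric

# Self-links are commonly added so each joint also attends to itself in a GCN.
A_with_self = A + np.eye(V, dtype=np.float32)
```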

StevRamos commented 2 years ago

Thank you very much @DegardinBruno. That helps me a lot! So the information I need is the coordinates of each joint (at each timestep). I will get into the code. I think it is promising!

Just to clarify, I have some questions.

  1. Should all the frames in the video have the same number of joints?
  2. What do you mean by local and global movement?
  3. What does the dimension resolution level L (paper) mean? (I think you refer to it in this issue as M)

Again, thanks in advance!

DegardinBruno commented 2 years ago
> 1. Should all the frames in the video have the same number of joints?

Yes, at this point, Kinetic-GAN only supports a fixed number of joints through all frames.

> 2. What do you mean by local and global movement?

Check our video at 0:27. With local movement, the skeleton is normalized relative to a root joint; with global movement, on the other hand, the skeleton moves freely without constraints.
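A minimal sketch of root-joint normalization (the local-movement case), assuming a sequence shaped (C, T, V) and that joint 0 is the root; the actual preprocessing in the repo may differ:

```python
import numpy as np

def to_local(seq, root=0):
    # Subtract the root joint's trajectory from every joint, so the skeleton
    # stays centred at the origin while the limbs still move relative to it.
    return seq - seq[:, :, root:root + 1]

seq = np.random.randn(3, 64, 25).astype(np.float32)  # (C, T, V)
local = to_local(seq)
# The root joint now sits exactly at the origin in every frame.
```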

> 3. What does the dimension resolution level L (paper) mean? (I think you refer to it in this issue as M)

As you can see in Figure 4 (paper), we define our upsampling path with four levels, where level 1 is a single point from the latent space and level 4 is the complete skeleton from the respective dataset.

M represents something different! In NTU RGB+D, some data samples contain 2 skeletons; that's where M comes from. However, Kinetic-GAN does not yet support action interaction between two skeletons.

StevRamos commented 2 years ago

Hi @DegardinBruno, I have been using your model, as I told you months ago. It worked! But now I would like to use it with another graph structure. When I tried this time, I got an error; basically, it is caused by the assertion (assert len(self.center) == self.lvls). That's why I want to understand the idea behind the algorithm shown in https://github.com/DegardinBruno/Kinetic-GAN/blob/b5d8d4d926b23236ab74e1d8ab348c72841c2482/models/init_gan/graph_ntu.py#L56-L89. If you could explain the idea to me with pseudo-code, I would appreciate it very much. Thanks in advance!

Stev

DegardinBruno commented 2 years ago

Hey @StevRamos, great!!

It would be best if you changed neighbor_base to the connections of your skeleton structure. Uncomment the lines before the assertions to visualise your graph levels!

> If you could explain the idea to me with pseudo-code, I would appreciate it very much.

We are basically removing edges while keeping at least one parent in the graph for the next level, because you can't just remove arbitrary edges: the graph would become inconsistent.
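An illustrative sketch of that idea (not the exact graph_ntu.py algorithm): coarsen the skeleton level by level, re-attaching each kept joint to its nearest kept ancestor so every node in the next level still has a parent path back to the root.

```python
def coarsen(edges, keep):
    """edges: list of (parent, child) joint pairs; keep: set of joints kept
    at the next level. Returns the edges of the coarser graph, re-attaching
    each kept child to its nearest kept ancestor."""
    parent = {c: p for p, c in edges}

    def kept_ancestor(j):
        # Walk up the tree until we reach a joint that survives coarsening.
        while j not in keep and j in parent:
            j = parent[j]
        return j

    new_edges = set()
    for p, c in edges:
        if c in keep:
            a = kept_ancestor(p)
            if a != c:
                new_edges.add((a, c))
    return sorted(new_edges)

# Toy chain 0-1-2-3-4, keeping joints {0, 2, 4} at the next level:
print(coarsen([(0, 1), (1, 2), (2, 3), (3, 4)], {0, 2, 4}))
# → [(0, 2), (2, 4)]
```

Note how joint 1 is dropped but joint 2 is reconnected to joint 0, so the coarser graph stays a connected tree rather than falling apart.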

hendrikTpl commented 1 year ago

Hi @DegardinBruno, thanks for providing this code. Btw, I am working on human interaction generation; as you said, interaction is not supported yet. Would you please guide me and provide some notes on how to make this possible? Recently I was working on HIR (recognition only); now I want to use your code and model to generate skeleton data (data augmentation) for a small dataset. It would be great, and I would much appreciate your help. Thanks!