JeremyCJM / DiffSHEG

[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
https://jeremycjm.github.io/proj/DiffSHEG/
BSD 3-Clause "New" or "Revised" License

dataset process #11

Closed: abcdvzz closed this issue 2 months ago

abcdvzz commented 3 months ago

Can anyone please tell me how to preprocess the dataset?

JeremyCJM commented 3 months ago

We extract equal-length motion and audio clips from BEAT (34 frames) and SHOW (88 frames) via a sliding window whose step size is half the window size, and then save them in LMDB format. You can refer to the original BEAT GitHub repo for the preprocessing code.
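
For illustration, a minimal sketch of that sliding-window clipping (window and step sizes as described above; the sequence length and feature dimension are placeholders):

```python
import numpy as np

def sliding_window_clips(frames, window, step=None):
    """Cut a (num_frames, feat_dim) array into equal-length clips."""
    if step is None:
        step = window // 2  # step defaults to half the window size
    starts = range(0, len(frames) - window + 1, step)
    return np.stack([frames[s:s + window] for s in starts])

motion = np.zeros((1000, 128))  # placeholder (num_frames, feat_dim)
beat_clips = sliding_window_clips(motion, window=34)  # BEAT -> (57, 34, 128)
show_clips = sliding_window_clips(motion, window=88)  # SHOW -> (21, 88, 128)
```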

abcdvzz commented 3 months ago

Thank you so much for your quick reply. However, I am still confused about how to process the TalkSHOW dataset. Is it the same as BEAT? The repo you linked is empty. Could you please tell me where the script is?

JeremyCJM commented 3 months ago

It seems they removed the code from that repo. Here is their anonymous code release for CaMN: https://github.com/beat2022dataset/beat; you can find the LMDB caching code in beat.py. I cached the motion of the SHOW dataset in the same way as BEAT.
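
For reference, a rough sketch of the LMDB caching step (the key scheme and pickle serialization here are assumptions; see beat.py in the repo above for the actual format):

```python
import lmdb
import pickle
import numpy as np

# Placeholder clips standing in for the sliding-window output.
motion_clips = [np.zeros((34, 128)) for _ in range(4)]
audio_clips = [np.zeros((34, 80)) for _ in range(4)]

env = lmdb.open("beat_cache.lmdb", map_size=1 << 36)
with env.begin(write=True) as txn:
    for idx, (m, a) in enumerate(zip(motion_clips, audio_clips)):
        # One record per paired motion/audio clip, keyed by sample index.
        txn.put(str(idx).encode("ascii"), pickle.dumps({"motion": m, "audio": a}))
env.close()
```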

abcdvzz commented 3 months ago

Thank you so much for your quick reply. For the BEAT dataset, there are multiple data formats: wav, bvh, json, and txt. I assume these files are all used in https://github.com/beat2022dataset/beat/blob/main/dataloaders/beat.py. However, the SHOW dataset only has wav and pkl files; how should I handle this? Thank you.

JeremyCJM commented 3 months ago

You can use TalkSHOW's dataloader to load the data, and then cache it into LMDB.
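
As a hypothetical sketch of that route (the pkl path and key names below are illustrative assumptions; check TalkSHOW's dataloader for the actual field names):

```python
import pickle
import numpy as np

with open("show_data/speaker/sequence.pkl", "rb") as f:  # placeholder path
    data = pickle.load(f)

# Assumed keys for illustration only: concatenate the per-frame SMPL-X
# parameters into one holistic motion vector per frame.
motion = np.concatenate(
    [data["expression"], data["jaw_pose"], data["body_pose"], data["hand_pose"]],
    axis=-1,
)
```

The resulting per-sequence arrays can then be windowed and written to LMDB exactly as in the BEAT sketch above.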

abcdvzz commented 3 months ago

So basically, for the SHOW dataset, we only need some of the dicts from the pkl files plus HuBERT and mel-spectrogram features extracted from the wavs?

abcdvzz commented 3 months ago

Hi, can you please tell me what the speaker_id is? I noticed that in your code it needs '-20', which confuses me.

abcdvzz commented 3 months ago

So we don't need to normalize the audio for the SHOW dataset, right?

JeremyCJM commented 2 months ago

Hi @abcdvzz,

> So basically, for the SHOW dataset, we only need some of the dicts from the pkl files plus HuBERT and mel-spectrogram features extracted from the wavs?

Yes, you can extract the data you want from the SHOW data loaded with their dataloader, and then cache it in LMDB format.
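
A sketch of extracting the two audio features mentioned above (the HuBERT checkpoint, sampling rate, and mel parameters are illustrative choices, not necessarily the ones used in DiffSHEG):

```python
import librosa
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

wav, sr = librosa.load("clip.wav", sr=16000)  # placeholder path

# Mel spectrogram via librosa.
melspec = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=80)

# HuBERT features via HuggingFace Transformers.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
with torch.no_grad():
    inputs = extractor(wav, sampling_rate=sr, return_tensors="pt")
    hubert_feats = hubert(inputs.input_values).last_hidden_state  # (1, T', 768)
```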

> Hi, can you please tell me what the speaker_id is? I noticed that in your code it needs '-20', which confuses me.

Since the original IDs in the SHOW dataset start from 20, I subtract 20 so that the IDs start from 0.
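
In code terms, something like:

```python
# SHOW speaker ids start at 20; shift them to start from 0.
speaker_id = raw_speaker_id - 20
```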

> So we don't need to normalize the audio for the SHOW dataset, right?

Yes. I did not normalize the audio.