abcdvzz closed this issue 2 months ago
We extract equal-length motion and audio clips from BEAT (34 frames) and SHOW (88 frames) via a sliding window, where the step size is half the window size, and then save them in lmdb format. You can refer to the original BEAT GitHub repo for the preprocessing code.
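For concreteness, the sliding-window extraction described above can be sketched as follows. This is a minimal illustration, not the authors' exact preprocessing code; the motion array shape and the `extract_clips` helper are assumptions.

```python
# Sketch of sliding-window clip extraction: window of `window` frames,
# step size = half the window size (50% overlap between adjacent clips).
import numpy as np

def extract_clips(motion: np.ndarray, window: int) -> list:
    """Slide a window over the frame axis; motion is a (T, D) array."""
    step = window // 2
    clips = []
    for start in range(0, len(motion) - window + 1, step):
        clips.append(motion[start:start + window])
    return clips

# Example: 100 frames of toy 3-D "motion" with the 34-frame BEAT window.
motion = np.zeros((100, 3))
clips = extract_clips(motion, window=34)
print(len(clips), clips[0].shape)  # 4 (34, 3)
```

The same function would apply to SHOW with `window=88`, and to the audio features after aligning them to the motion frame rate.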
Thank you so much for your quick reply. However, I am still confused about how to process the TalkSHOW dataset. Is it the same as BEAT? The repo you attached is empty. Could you please tell me where the script is?
It seems they removed the code from that repo. Here is their anonymous code for CaMN: https://github.com/beat2022dataset/beat, and you can find the lmdb caching code in beat.py. I cached the motion of the SHOW dataset in the same way as BEAT.
Thank you so much for your quick reply. For the BEAT dataset, there are multiple data formats, such as wav, bvh, json, and txt. I assume these files are all used in https://github.com/beat2022dataset/beat/blob/main/dataloaders/beat.py. However, for the SHOW dataset, there are only wav and pkl files; how should I handle this? Thank you.
You can use TalkSHOW's dataloader to load the data and then cache it into lmdb.
So basically, for the SHOW dataset, we only need some of the dicts from the pkl files plus the HuBERT and mel-spectrogram features extracted from the wavs?
Hi, can you please tell me what the speaker_id is? I noticed that your code subtracts 20 from it ('-20'), which confuses me.
So we don't need to normalize the audio for the SHOW dataset, right?
Hi @abcdvzz,
So basically, for the SHOW dataset, we only need some of the dicts from the pkl files plus the HuBERT and mel-spectrogram features extracted from the wavs?
Yes, you can extract the data you want from SHOW using their dataloader and then cache it into lmdb format.
Hi, can you please tell me what the speaker_id is? I noticed that your code subtracts 20 from it ('-20'), which confuses me.
Since the original IDs in the SHOW dataset start from 20, I subtract 20 so that the IDs start from 0.
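In other words, the offset just remaps the raw IDs to a zero-based range (the raw ID values here are illustrative):

```python
# SHOW speaker IDs start at 20; subtracting 20 makes them zero-based,
# which is what embedding layers and one-hot encodings expect.
raw_ids = [20, 21, 22, 23]
speaker_ids = [i - 20 for i in raw_ids]
print(speaker_ids)  # [0, 1, 2, 3]
```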
So we don't need to normalize the audio for the SHOW dataset, right?
Yes. I did not normalize the audio.
Can anyone please tell me how to preprocess the dataset?