NVlabs / Dancing2Music


Training data structure #11

Open HsiaoYingLu opened 3 years ago

HsiaoYingLu commented 3 years ago

Hi @HsinYingLee

Could you share how the audio and video data are encoded into the provided training set? I went through data.py, but that file only reads data that has already been encoded. If I wanted to build my own custom dataset, what format should I follow? Also, what is the relation among the indexes of the .npy filenames in these folders: data////aud, data////unit, data////unitseq3, data////unitseq4?

Thank you!
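While waiting for an answer, one way to reverse-engineer the format is simply to load the provided .npy files and print their shapes and dtypes. The snippet below is a generic NumPy sketch, not the authors' tooling; the dummy array and filename stand in for a real file such as one from the aud folder:

```python
# Sketch: inspect a provided .npy file to infer its format.
# The dummy array below is illustrative, not the dataset's actual layout.
import numpy as np

# Stand-in for one file from the dataset, e.g. data/.../aud/0001.npy
dummy = np.random.rand(30, 28).astype(np.float32)
np.save("0001.npy", dummy)

arr = np.load("0001.npy")
print(arr.shape, arr.dtype)  # shape and dtype reveal the encoding layout
```

Running this over a few files in each folder (aud, unit, unitseq3, unitseq4) and comparing shapes should at least show which arrays are per-clip features and which are sequences.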

zhuhaozh commented 3 years ago

@HsiaoYingLu Hi, have you figured out how to prepare the dataset?

HsiaoYingLu commented 3 years ago

@zhuhaozh Unfortunately, no. Without the raw data and a clarification of the data format, I cannot work out how the author prepared the dataset. It also remains unknown whether aud.npy and unit.npy files with the same filename correspond to each other.
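One quick sanity check for the correspondence question is to compare filename stems across the folders. This is a generic pathlib sketch with dummy folders and names, not the authors' structure:

```python
# Sketch: find filename stems shared between the aud/ and unit/ folders,
# to see which .npy files might pair up. Folder and file names are illustrative.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())
for sub, names in {"aud": ["0", "1", "2"], "unit": ["1", "2", "3"]}.items():
    d = root / sub
    d.mkdir()
    for n in names:
        (d / f"{n}.npy").touch()

aud_stems = {p.stem for p in (root / "aud").glob("*.npy")}
unit_stems = {p.stem for p in (root / "unit").glob("*.npy")}
shared = sorted(aud_stems & unit_stems)
print(shared)  # stems present in both folders
```

If the intersection is large and the leftover files cluster in one folder, that would suggest same-stem files are paired and the extras are unpaired clips.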

HsiaoYingLu commented 3 years ago

Hi @HsinYingLee

Thanks for the demo code update.

I still have some questions:

  1. I found that in composition training, only the first 30 values in each *_fps.npy file are used. What is the purpose of the other .npy files in the aud folder?

  2. There are different numbers of .npy files in the aud, unit, unitseq3, and unitseq4 folders. What is the correspondence between the .npy files in these folders?

  3. In the composition training phase, the model is always trained on pairs consisting of 3 dance units and 30-dim audio features. Hence, at test time, 1 music clip corresponds to 3 generated dance units, which makes the output video 3 times longer than the input audio. How should I fix this?

  4. To fix the length problem, I tried using the modulate function, but the generated video looked fast-forwarded for the first few seconds. Any thoughts on this?

Any thoughts would be helpful, thanks!
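For the length mismatch in points 3 and 4, an alternative to modulate is to resample the whole generated pose sequence uniformly over time, so the speed change is spread across the clip instead of concentrated at the start. This is a minimal NumPy sketch under assumed shapes (90 frames, 14 joints, 2 coordinates), not the repository's actual post-processing:

```python
# Sketch: shrink a generated pose sequence to a target frame count by
# linear interpolation over time, as one possible fix for the 3x length mismatch.
# The (90, 14, 2) shape is an illustrative assumption, not the repo's format.
import numpy as np

def resample_frames(seq: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a (T, J, C) sequence to (target_len, J, C)."""
    t_src = np.linspace(0.0, 1.0, num=seq.shape[0])
    t_dst = np.linspace(0.0, 1.0, num=target_len)
    flat = seq.reshape(seq.shape[0], -1)  # (T, J*C): one curve per coordinate
    out = np.stack(
        [np.interp(t_dst, t_src, flat[:, k]) for k in range(flat.shape[1])],
        axis=1,
    )
    return out.reshape(target_len, *seq.shape[1:])

generated = np.random.rand(90, 14, 2)       # e.g. 3 dance units, too long for the audio
resampled = resample_frames(generated, 30)  # uniform slowdown, no fast-forward burst
print(resampled.shape)  # (30, 14, 2)
```

Because the resampling is uniform, the first and last poses are preserved and the tempo change is constant throughout, which should avoid the fast-forward artifact at the beginning.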

WeiyuDu commented 3 years ago

Hi @HsinYingLee,

I was wondering the same thing. Could you kindly elaborate? Thank you so much!