facebookresearch / multiface

Hosts the Multiface dataset, which is a multi-view dataset of multiple identities performing a sequence of facial expressions.

How to use audio files aligned with tracked mesh? #31

Closed chen8750 closed 1 year ago

chen8750 commented 1 year ago

Thanks for your excellent work! What do the audio file names mean, and how do I align the audio with the tracked mesh files?

alexanderrichard commented 1 year ago

Hi, the audio files are named after the sentences the participants were saying. They all start with SEN (for "sentence") and then have the words spoken in that sentence in the filename (for example "A good morrow to you, my boy").

They are cut so that they fit the sequence length of the tracked meshes. So, if you concatenate all tracked meshes of a sentence (which are at 30fps) in numerical order, the audio will have the same duration as that sequence. In other words, every 1600 samples from the audio file (48kHz) correspond to a frame.
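The frame-to-sample mapping described above can be sketched roughly as follows. This is only an illustration, assuming the audio has already been loaded as a 1-D sequence of samples and the tracked meshes are sorted in numerical order; the function names are hypothetical and not part of the Multiface code.

```python
SAMPLE_RATE = 48_000  # Hz, audio sample rate stated above
FPS = 30              # tracked-mesh frame rate stated above
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 48000 / 30 = 1600


def frame_to_sample_range(frame_index):
    """Return (start, end) sample indices for a mesh frame.

    frame_index is the 0-based position of the frame in the
    numerically sorted sequence; end is exclusive.
    (Hypothetical helper, not from the Multiface repo.)
    """
    start = frame_index * SAMPLES_PER_FRAME
    return start, start + SAMPLES_PER_FRAME


def split_audio_by_frame(samples, num_frames):
    """Slice a 1-D sample sequence into one chunk per mesh frame."""
    chunks = []
    for i in range(num_frames):
        start, end = frame_to_sample_range(i)
        chunks.append(samples[start:end])
    return chunks
```

For example, the third mesh in a sentence (index 2) would line up with samples 3200 up to, but not including, 4800. If the audio length is not an exact multiple of 1600, the last chunk may come out shorter, so it is worth checking the sample count against the number of mesh files.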