facebookresearch / audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio

Data acquisition and processing #58

vothebao closed 5 months ago

vothebao commented 6 months ago

Hi! Thank you very much for the amazing work! Could you explain the data acquisition process in more detail? For example, how many cameras the capture dome requires, how the cameras were placed, etc. My other question is how to process the raw audio to get .npy files like those in your dataset. Also, does the data processing step require only the frontal view of the video, or does it require the full multiview capture?

Thank you!

evonneng commented 5 months ago

Hi! Thank you for your interest. The data acquisition directly follows from this repository. The technical report for that codebase provides details on the number and positioning of cameras, etc. For the body pose, we used a similar setup to collect the dataset.

For the raw audio, we fetched it from the dome. Unfortunately, I don't know much more about the details of the actual audio capture. From those .mp4-style files, I just dumped the audio directly into .npy files, so no actual audio processing happened there.
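A minimal sketch of what such a direct dump could look like, assuming the `ffmpeg` command-line tool is installed and NumPy is available. The thread does not show the script that was actually used: the function names `extract_wav` and `wav_to_npy`, the 48 kHz mono settings, and the float32 scaling to [-1, 1) are all assumptions made for illustration.

```python
import subprocess
import wave

import numpy as np


def extract_wav(video_path: str, wav_path: str, sample_rate: int = 48000) -> None:
    """Decode the audio track of a video file to 16-bit PCM mono WAV.

    Hypothetical helper: shells out to the ffmpeg CLI, which must be on PATH.
    The 48 kHz default is an assumption, not a value confirmed in this thread.
    """
    subprocess.run(
        [
            "ffmpeg", "-y",            # overwrite the output file if it exists
            "-i", video_path,          # input container (e.g. an .mp4)
            "-vn",                     # drop the video stream
            "-ac", "1",                # downmix to mono
            "-ar", str(sample_rate),   # resample to the target rate
            "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
            wav_path,
        ],
        check=True,
    )


def wav_to_npy(wav_path: str, npy_path: str) -> np.ndarray:
    """Dump the samples of a 16-bit PCM WAV file into a .npy array.

    Returns an array of shape (num_samples, num_channels), dtype float32.
    No filtering or feature extraction is applied, only int16 -> float scaling.
    """
    with wave.open(wav_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    audio = samples.astype(np.float32) / 32768.0  # scale to [-1, 1)
    np.save(npy_path, audio)
    return audio
```

Usage would be along the lines of `extract_wav("capture.mp4", "capture.wav")` followed by `wav_to_npy("capture.wav", "capture_audio.npy")`, where the file names are placeholders.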

Lastly, the data processing step requires the full multiview to do the reconstruction. To get a 3D avatar, we render from many sides and do reconstruction across different viewpoints.
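The thread does not say which reconstruction method was used. As a generic illustration of why several calibrated views are needed, here is a sketch of linear (DLT) triangulation, a standard textbook technique rather than the pipeline behind this dataset. The function name `triangulate` and the toy cameras are hypothetical.

```python
import numpy as np


def triangulate(projections, pixels):
    """Recover one 3D point from its 2D observations in several views.

    projections: sequence of 3x4 camera projection matrices (assumed known
                 from calibration).
    pixels:      sequence of (u, v) image coordinates, one per camera.
    Uses the direct linear transform: each view contributes two linear
    constraints on the homogeneous point, solved in a least-squares sense.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


# Toy example: two cameras one unit apart along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point = np.array([0.5, 0.2, 4.0])
homog = np.append(point, 1.0)
obs = [(P @ homog)[:2] / (P @ homog)[2] for P in (P1, P2)]

recovered = triangulate([P1, P2], obs)
```

A single view gives only two equations for three unknowns, so depth is unconstrained. Each additional view adds two more constraints, which is the basic reason a frontal-only video is not enough for this kind of reconstruction.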

Hope this helps!

vothebao commented 5 months ago

Thank you very much for your response! I still have a couple of points I'd like to clarify:

  1. The body pose and facial expression extraction step needs the full multiview recorded video, right?
  2. When do you plan to release the video from the dataset and the processing code?

evonneng commented 5 months ago

Regarding 1: yep, the extraction for both face and body needs the full multiview recorded video, which unfortunately we will not be releasing at the moment. Regarding 2: sadly, that is still up in the air. At the moment, we are releasing only the photoreal renderings and the 3D pose estimates provided in this repo. If anything changes, I will update the repo to reflect this.