Closed by vothebao 5 months ago
Hi! Thank you for your interest. The data acquisition directly follows the setup from this repository. The technical report for that codebase provides details such as the number of cameras, their positioning, etc. For the body pose, we used a similar setup to collect the dataset.
For the raw audio, we fetched this from the dome. Unfortunately, I don't know much more about the details of the actual audio capture. From the .mp4 files, I just dumped the audio directly into .npy files, so no actual audio processing happened there.
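A minimal sketch of what that dump might look like. This is not the repo's actual script: it assumes `ffmpeg` is available and that mono 16-bit PCM at a hypothetical 48 kHz sample rate is an acceptable intermediate; the function names are illustrative.

```python
import subprocess

import numpy as np


def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1, 1]."""
    samples = np.frombuffer(pcm_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0


def dump_audio(mp4_path: str, npy_path: str, sample_rate: int = 48000) -> None:
    # ffmpeg decodes the audio track to raw mono 16-bit PCM on stdout;
    # no filtering or feature extraction, just a container change.
    cmd = [
        "ffmpeg", "-i", mp4_path,
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),
        "-",
    ]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    np.save(npy_path, pcm16_to_float32(raw))
```

The point is that the .npy is just the decoded waveform, with nothing else done to it.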
Lastly, the data processing step requires the full multiview capture to do the reconstruction. To get a 3D avatar, we render from many sides and reconstruct across the different viewpoints.
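To illustrate why a single frontal view is not enough, here is a generic multi-view triangulation sketch (linear DLT), not the repo's reconstruction pipeline: recovering a 3D point requires its 2D observations from at least two calibrated cameras. The camera matrices and observations below are hypothetical.

```python
import numpy as np


def triangulate(projections, points2d):
    """Triangulate one 3D point from 2D observations in several views.

    projections: list of 3x4 camera projection matrices P_i
    points2d:    list of (u, v) pixel observations, one per view
    Stacks the DLT constraints u*P[2]-P[0] = 0 and v*P[2]-P[1] = 0
    and solves the homogeneous system via SVD.
    """
    rows = []
    for P, (u, v) in zip(projections, points2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]               # null-space vector = homogeneous 3D point
    return X[:3] / X[3]      # dehomogenize
```

With only one camera, the system is underdetermined (depth is unobservable), which is why the full multiview recording is needed for reconstruction.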
Hope this helps!
Thank you very much for your response! I still have a few points that need clarification:
Regarding 1: yes, extraction for both the face and the body needs the full multiview recorded video, which unfortunately we will not be releasing at the moment. Regarding 2: sadly, that is still up in the air. At the moment, we are only releasing the photoreal renderings and 3D pose estimates provided in this repo. If anything changes, I will update the repo to reflect this.
Hi! Thank you very much for the amazing work! Could you explain the data acquisition process in more detail? For example, the number of cameras required for the capture domes, how the cameras were placed, etc. My other question is how to process the raw audio to get .npy files like those in your dataset. Also, does the data processing step require just the frontal view of the video, or does it require the full multiview?
Thank you!