facebookresearch / real-acoustic-fields

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Synchronizing Visual Data with RIR #5

Closed · anton-jeran closed this issue 5 months ago

anton-jeran commented 5 months ago

Hi,

Thank you for releasing the real-world RIR dataset :)

I am trying to map each RGB image to its corresponding RIR, but I may be missing some details.

The RGB images for the empty room from the Eyeful Tower dataset have the following naming convention: [screenshot of image file names]

The Real Acoustic Fields RIR dataset folder is organized as follows: [screenshot of folder structure]

For example, please let me know how to find the corresponding RGB image for RIR folder "001011".

Thanks,

Anton

IFICL commented 5 months ago

Hi, we would like to clarify that the visual data and RIRs were not recorded simultaneously or at the same positions. As we describe in the paper, we separated the audio and visual collection: we recorded the RIR data without the camera rig, and recorded the multi-view images without the audio recording devices. After collecting both, we aligned them into the same coordinate system so that locations can be shared across the two modalities.

To obtain the corresponding visual image for an RIR at a specific location, you will need to train a NeRF or 3DGS model that fits the scene and then render images from novel viewpoints. You can use nerfstudio for this.
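For instance, here is a rough sketch of the rendering step using nerfstudio's Python API (recent versions expose `Model.get_outputs_for_camera`). The config path, the RIR position file name, and the camera intrinsics below are placeholders for illustration, not part of the released data:

```python
# Rough sketch: render an RGB view at an RIR microphone location with a
# scene fitted by `ns-train nerfacto`. The config path, the RIR position
# file name, and the intrinsics are placeholders, not the dataset spec.
from pathlib import Path

import numpy as np
import torch
from nerfstudio.cameras.cameras import Cameras, CameraType
from nerfstudio.utils.eval_utils import eval_setup

# Load the trained pipeline from the config written by `ns-train`.
_, pipeline, _, _ = eval_setup(Path("outputs/emptyroom/nerfacto/config.yml"))

# Hypothetical: microphone position of RIR folder "001011" in the shared
# audio-visual coordinate system.
mic_pos = np.loadtxt("RAF/001011/mic_position.txt")  # assumed file layout

# 3x4 camera-to-world pose (batched) whose translation is the RIR location.
# Identity rotation here; pick a viewing direction suited to your use case.
c2w = torch.eye(4)[:3].unsqueeze(0)
c2w[0, :3, 3] = torch.from_numpy(mic_pos).float()

camera = Cameras(
    camera_to_worlds=c2w,
    fx=1000.0, fy=1000.0, cx=512.0, cy=512.0,  # placeholder intrinsics
    width=1024, height=1024,
    camera_type=CameraType.PERSPECTIVE,
).to(pipeline.device)

with torch.no_grad():
    outputs = pipeline.model.get_outputs_for_camera(camera)
rgb = outputs["rgb"].cpu().numpy()  # (H, W, 3) view rendered at the RIR spot
```

Note that an RIR position alone does not define a camera orientation, so you would still need to choose a viewing direction at each microphone location.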

anton-jeran commented 5 months ago

Thank you for the quick reply and clarification.

It would be a great help if you could share the paired images you have already generated for your experiments, e.g., for AV-NeRF training and evaluation.

The abstract mentions that "The dataset includes high-quality and densely captured room impulse response data paired with multi-view images".

IFICL commented 5 months ago

We are unable to share the generated images; we recommend that you generate them yourself under your own setup. Sorry for the confusion: "paired" here does not mean a one-to-one correspondence between RIRs and images.

anton-jeran commented 5 months ago

Sorry for troubling you again.

Does the visual data include camera positions? I cannot find them in the "Data Organization" section linked here: https://github.com/facebookresearch/EyefulTower?tab=readme-ov-file#data-organization. Without camera positions, we can't train a NeRF.

Also, do you think the coordinate systems of the visual and audio data are aligned, i.e., does the origin (0,0,0) in the audio data refer to the same location in the environment as in the visual data? Since two different teams collected the data, their coordinate systems may not be aligned.

IFICL commented 5 months ago

Can you check cameras.json? I assume the camera position and pose information should be in there.
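Something along these lines should let you inspect them (a rough sketch; the key names "KRT", "T", and "cameraId" are guesses, so adjust them to whatever the file actually contains):

```python
# Rough sketch: list camera centers from Eyeful Tower's cameras.json.
# Key names ("KRT", "T", "cameraId") are assumptions; check the file.
import json

import numpy as np

with open("emptyroom/cameras.json") as f:
    meta = json.load(f)

for cam in meta["KRT"]:
    # Assume "T" is a 4x4 world-to-camera matrix stored row-major;
    # the camera center in world coordinates is then -R^T t.
    T = np.array(cam["T"]).reshape(4, 4)
    R, t = T[:3, :3], T[:3, 3]
    print(cam["cameraId"], -R.T @ t)
```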

I highly recommend you go over our paper to get a better picture of our dataset. We used optical tracking systems and synchronized the coordinates of the audio and visual collection so that they are aligned.
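If you want to convince yourself of the alignment, a quick sanity check is to compare the spatial extents of the camera centers and the microphone positions; if the coordinates are shared, both point sets should span the same room-scale volume. A minimal sketch (the two .npy files are placeholders; collect the positions however your copy of the data is laid out, e.g., via the cameras.json snippet above):

```python
# Sanity check: bounding boxes of camera centers and RIR microphone
# positions should overlap if both live in the same coordinate system.
import numpy as np

cam_centers = np.load("camera_centers.npy")    # (N, 3), from cameras.json
mic_positions = np.load("mic_positions.npy")   # (M, 3), from the RIR folders

for name, pts in [("cameras", cam_centers), ("microphones", mic_positions)]:
    print(f"{name}: min={pts.min(axis=0)}, max={pts.max(axis=0)}")
```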

anton-jeran commented 5 months ago

Thanks for all the information.

isrish commented 5 months ago

If you're training a NeRF model, I highly recommend using nerfstudio. The visual data is organized for easy integration. You can follow the steps described in https://docs.nerf.studio/quickstart/existing_dataset.html.

For example:

```bash
# Download a few room-scale scenes from the EyefulTower dataset at different resolutions:
ns-download-data eyefultower --capture-name emptyroom --resolution-name jpeg_1k jpeg_2k
```
anton-jeran commented 5 months ago

Thanks :)