
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Ziyang Chen, Shengyi Qian, Andrew Owens
University of Michigan, Ann Arbor
ICCV 2023


This repository contains the official codebase for *Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation* (ICCV 2023). [Project Page](https://ificl.github.io/SLfM/)

(Figure: SLfM illustration)

Environment

To set up the environment, run:

```bash
conda env create -f environment.yml
conda activate SLfM
```
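
As a quick sanity check (a hedged sketch: we assume the environment ships PyTorch, which we have not verified against `environment.yml`), you can confirm the install from Python:

```python
# Sanity check for the freshly created environment.
# Assumption: PyTorch is among the dependencies in environment.yml.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```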

Datasets

LibriSpeech

We use speech samples from this dataset to render binaural audio. The data can be downloaded from OpenSLR (https://www.openslr.org/12). Please see Dataset/LibriSpeech for more processing details.
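
As a rough illustration of this preprocessing step (a sketch only: the split choice and clip length below are assumptions, and the actual logic lives in Dataset/LibriSpeech):

```python
# Gather LibriSpeech utterances and trim them to fixed-length clips
# before rendering. The split and clip length are illustrative assumptions.
from pathlib import Path
import torchaudio

ROOT = Path("LibriSpeech/train-clean-360")  # hypothetical split choice
CLIP_SECONDS = 3.0                          # assumed clip length

for flac in sorted(ROOT.rglob("*.flac")):
    wav, sr = torchaudio.load(str(flac))    # LibriSpeech ships 16 kHz mono FLAC
    n = int(CLIP_SECONDS * sr)
    if wav.shape[1] >= n:
        clip = wav[:, :n]                   # keep the first fixed-length segment
        # ... save `clip` to your processed-data directory
```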

Free Music Archive (FMA)

We use audio samples from this dataset to render binaural audio. The data can be downloaded from the official FMA GitHub repo (https://github.com/mdeff/fma). Please see Dataset/Free-Music-Archive for more processing details.

HM3D-SS

We use the SoundSpaces 2.0 platform and the Habitat-Matterport 3D dataset to create our audio-visual dataset, HM3D-SS. Please follow the installation guide from the SoundSpaces repository (https://github.com/facebookresearch/sound-spaces).

We provide the code for generating the dataset under Dataset/AI-Habitat. To create the HM3D-SS dataset, run:

```bash
cd Dataset/AI-Habitat
# Please check the bash files before running; they require you to specify the output directory.
sh ./multi-preprocess.sh
sh ./multi-postprocess.sh
```
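
Conceptually, each rendered clip comes from convolving a dry source signal with a binaural room impulse response (RIR) simulated for a given listener pose. The sketch below shows that core operation in isolation; the file names are hypothetical, and the actual pipeline is driven by the bash scripts above:

```python
# Render binaural audio by convolving a mono source with a two-channel RIR.
# "speech_clip.wav" and "binaural_rir.wav" are hypothetical file names.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

source, sr = sf.read("speech_clip.wav")    # mono source (e.g., a LibriSpeech clip)
rir, sr_rir = sf.read("binaural_rir.wav")  # shape (T, 2): left/right impulse responses
assert sr == sr_rir

left = fftconvolve(source, rir[:, 0])
right = fftconvolve(source, rir[:, 1])
binaural = np.stack([left, right], axis=1)
sf.write("rendered_binaural.wav", binaural / np.max(np.abs(binaural)), sr)
```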

Demo Videos

We also provide self-recorded real-world videos under Dataset/DemoVideos/RawVideos. The videos were recorded with an iPhone 14 Pro, and the binaural audio was recorded with a Sennheiser AMBEO Smart Headset. The demo videos are for research purposes only.
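
As a small illustration of why binaural recordings carry direction information, the interaural time difference (ITD) can be estimated from the cross-correlation of the two channels (a sketch with a hypothetical file name, not part of the released tooling):

```python
# Estimate the interaural time difference (ITD) from a binaural recording.
# "demo_binaural.wav" is a hypothetical file name.
import numpy as np
import soundfile as sf

audio, sr = sf.read("demo_binaural.wav")     # shape (T, 2): left/right channels
left, right = audio[:sr, 0], audio[:sr, 1]   # use a one-second excerpt

corr = np.correlate(left, right, mode="full")
lag = np.argmax(corr) - (len(right) - 1)     # positive lag: left lags right,
                                             # i.e., the sound reaches the right ear first
print(f"Estimated ITD: {lag / sr * 1e3:.3f} ms")
```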

Pretrained Models

We release several models pre-trained with our proposed method, which we hope will benefit the research community. To download all the checkpoints, run:

```bash
cd slfm
sh ./scripts/download_models.sh
```
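
Once downloaded, a checkpoint can be inspected along these lines (a hedged sketch: the file name and state-dict layout are assumptions; check the actual checkpoint keys and model classes in this repo):

```python
# Inspect a downloaded checkpoint. The path and key names are assumptions.
import torch

ckpt = torch.load("checkpoints/slfm_pretrained.pth", map_location="cpu")
print(ckpt.keys())  # see what the checkpoint stores
# model.load_state_dict(ckpt["state_dict"])  # typical pattern, if that key exists
```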

Train & Evaluation

We provide training and evaluation scripts under scripts; please check each bash file before running.
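
Sound-direction and rotation predictions are angles, so evaluations of this kind typically report an angular error with wrap-around. A generic, illustrative metric (not the repo's exact evaluation code) looks like:

```python
# Mean absolute angular error with wrap-around, so that e.g. a prediction of
# 359 degrees against a ground truth of 1 degree counts as a 2-degree error.
import numpy as np

def angular_error_deg(pred_deg: np.ndarray, gt_deg: np.ndarray) -> np.ndarray:
    diff = (pred_deg - gt_deg + 180.0) % 360.0 - 180.0
    return np.abs(diff)

pred = np.array([359.0, 10.0, 180.0])
gt = np.array([1.0, 350.0, 175.0])
print(angular_error_deg(pred, gt).mean())  # mean of [2, 20, 5] -> 9.0
```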

Citation

If you find this code useful, please consider citing:

```bibtex
@inproceedings{chen2023sound,
    title     = {Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation},
    author    = {Chen, Ziyang and Qian, Shengyi and Owens, Andrew},
    booktitle = {ICCV},
    year      = {2023}
}
```

Acknowledgment

This work was funded in part by DARPA SemaFor and Sony. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.