
ICSD Dataset

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection


This is the official repository for the ICSD dataset.

Please note that our paper is currently under review. Once it is accepted, you will be able to download the audio files and metadata from the source URL list we provide on Hugging Face.

About ⭐️

🎀 ICSD is a comprehensive audio event dataset for infant cry and snoring detection.

The figure below shows the organized structure of the ICSD dataset where audio files are stored in the audio folder and event time-stamp annotations in the metadata folder, each further categorized into train, validation, and test subfolders. Moreover, source materials for generating synthetic strongly labeled data are also provided. You can use Scaper to generate your own synthetic data.

(Figure: folder structure of the ICSD dataset)
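For reference, the strong-label annotations in the metadata folder can be loaded with a few lines of Python. This is a minimal stdlib sketch that assumes a DCASE-style tab-separated layout (filename, onset, offset, event_label); the actual column layout of the ICSD metadata files may differ, so check the downloaded files first.

```python
import csv
import io

# Hypothetical annotation rows, modeled on DCASE-style strong labels:
# filename \t onset \t offset \t event_label
SAMPLE_TSV = """filename\tonset\toffset\tevent_label
clip_001.wav\t0.50\t3.20\tinfant_cry
clip_001.wav\t5.10\t9.80\tsnoring
"""

def load_events(tsv_text):
    """Parse strong-label annotations into a list of event dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    events = []
    for row in reader:
        events.append({
            "filename": row["filename"],
            "onset": float(row["onset"]),
            "offset": float(row["offset"]),
            "label": row["event_label"],
        })
    return events

events = load_events(SAMPLE_TSV)
print(len(events))         # 2
print(events[0]["label"])  # infant_cry
```

The same loader applies to each of the train, validation, and test metadata splits, as long as they share one column layout.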

A detailed description of the dataset can be found in our [paper]().

To use the ICSD dataset, you can download the audio files and metadata from our provided source URL list on Hugging Face.

Please note that ICSD does not own the copyright of the audio; the copyright remains with the original owners of the source video or audio.

Data Preview πŸ”

The demo folder provides four audio samples that you can download and listen to.

Baseline system πŸ–₯️

We designed our baseline system based on DCASE 2023 Challenge Task 4.

Requirements πŸ“

The script conda_create_environment.sh is available to create an environment which runs the baseline system.

Data Download ⬇️

You can download the ICSD dataset using the script: download_ICSD.py

Usage:

  1. Visit our Hugging Face repository to request access permission. Once the request is reviewed, you will be authorized to download the ICSD dataset.
  2. After obtaining permission, navigate to your Hugging Face settings to generate a personal access token. Under 'Repositories permissions', enter datasets/QingyuLiu1/ICSD. For detailed information about tokens, please refer to the official documentation.

(Figure: 'Repositories permissions' settings page)

  3. Run the command python download_ICSD.py --token=your_token, where your_token is the token generated from your Hugging Face settings. The ICSD dataset will then be downloaded into the data folder and automatically unzipped.

    If the script download_ICSD.py cannot run due to network issues, you can manually download Dataset.zip from Hugging Face and unzip it into the data folder.

  4. In addition to the well-organized dataset, we also provide source materials for generating synthetic strongly labeled data. You can download these by running the command python download_ICSD.py --token=your_token --file_name=Materials.zip --local_dir=your_folder.
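If you take the manual-download route, the unzip step is plain stdlib work. A minimal sketch of what the fallback looks like (the function name and the exact archive layout are assumptions; download_ICSD.py may organize things differently):

```python
import zipfile
from pathlib import Path

def unzip_dataset(zip_path, data_dir="data"):
    """Extract a manually downloaded Dataset.zip into the data folder,
    mirroring the automatic unzip that the download script performs."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    # Return the top-level entries so callers can sanity-check the layout.
    return sorted(p.name for p in out.iterdir())
```

Usage: `unzip_dataset("Dataset.zip")` should leave the audio and metadata folders under data/.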

Training πŸ‘¨β€πŸ’»

Three baselines are provided:

1. Baseline with only synthetic data

The baseline using the synthetic strongly labeled data can be run from scratch using the following command:

python train_sed.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/synth_only.ckpt

2. Baseline with real data and synthetic data

The baseline using the real strongly labeled data and synthetic data can be run from scratch using the following command:

python train_sed.py --strong_real

This command will automatically include the strongly labeled real data in the training process.

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/hybrid.ckpt

3. Baseline using pre-trained embedding

We added a baseline which exploits the pre-trained model BEATs. It's an iterative self-supervised learning model designed to extract high-level non-speech audio semantics. The BEATs feature representations are integrated with the CNN output through a linear transformation and layer normalization, providing additional complementary information that can enhance sound event detection performance.
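To illustrate that fusion step, here is a minimal pure-Python sketch of projecting an embedding to the CNN feature size, layer-normalizing it, and adding it to the CNN features. The dimensions, the additive combination, and all function names are illustrative assumptions; the actual baseline uses learned PyTorch layers.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean / unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Plain matrix-vector product: project x through a weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def fuse(cnn_feat, beats_emb, weight, bias):
    """Project the BEATs embedding to the CNN feature size, normalize it,
    and add it to the CNN features as complementary information."""
    projected = layer_norm(linear(beats_emb, weight, bias))
    return [c + p for c, p in zip(cnn_feat, projected)]

# Toy example: a 3-dim embedding fused into a 2-dim CNN feature.
fused = fuse([1.0, 2.0], [0.5, -0.5, 1.0],
             [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
```

In the real system the projection weights are trained jointly with the CNN, so the network learns how much of the BEATs representation to mix in.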

To run this system, first pre-compute the embeddings using the following command:

python extract_embeddings.py --output_dir ./embeddings --pretrained_model "beats"

Then run the system with:

python train_pretrained.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_pretrained.py --test_from_checkpoint /path/to/BEATS.ckpt

Acknowledgement πŸ””

We acknowledge the wonderful work by these excellent developers!

Reference πŸ“–

If you use the ICSD dataset, please cite the following paper:

@article{ICSD,
      title={ICSD: An Open-source Dataset for Infant Cry and Snoring Detection},
      author={Qingyu Liu and Longfei Song and Dongxing Xu and Yanhua Long},
      journal={arXiv},
      volume={abs/2408.10561},
      year={2024}
}