
ICSD Dataset

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection


This is the official repository for the ICSD dataset.

Please note that our paper is currently under review. Once it is accepted, you will be able to download the audio files and metadata from the source URL list we provide on Hugging Face.

About ⭐️

🎀 ICSD is a comprehensive audio event dataset for infant cry and snoring detection.

The figure below shows the organized structure of the ICSD dataset where audio files are stored in the audio folder and event time-stamp annotations in the metadata folder, each further categorized into train, validation, and test subfolders. Moreover, source materials for generating synthetic strongly labeled data are also provided. You can use Scaper to generate your own synthetic data.

(Figure: folder structure of the ICSD dataset)
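For reference, the strong-label annotations in the metadata folder can be loaded with a few lines of Python. This is a minimal stdlib sketch that assumes a DCASE-style tab-separated layout (filename, onset, offset, event_label); the actual column layout of the ICSD metadata files may differ, so check the downloaded files first.

```python
import csv
import io

# Hypothetical annotation rows, modeled on DCASE-style strong labels:
# filename \t onset \t offset \t event_label
SAMPLE_TSV = """filename\tonset\toffset\tevent_label
clip_001.wav\t0.50\t3.20\tinfant_cry
clip_001.wav\t5.10\t9.80\tsnoring
"""

def load_events(tsv_text):
    """Parse strong-label annotations into a list of event dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    events = []
    for row in reader:
        events.append({
            "filename": row["filename"],
            "onset": float(row["onset"]),
            "offset": float(row["offset"]),
            "label": row["event_label"],
        })
    return events

events = load_events(SAMPLE_TSV)
print(len(events))         # 2
print(events[0]["label"])  # infant_cry
```

The same loader applies to each of the train, validation, and test metadata splits, as long as they share one column layout.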

A detailed description of the dataset can be found in our [paper]().

To use the ICSD dataset, you can download the audio files and metadata from our provided source URL list on Hugging Face.

Please note that ICSD does not own the copyright of the audio; the copyright remains with the original owners of the source video or audio.

Data Preview πŸ”

The demo folder provides four audio samples that you can download and listen to.

Baseline system πŸ–₯️

We designed our baseline system based on DCASE 2023 Challenge Task 4.

Requirements πŸ“

The script conda_create_environment.sh is available to create an environment which runs the baseline system.

Data Download ⬇️

You can download the ICSD dataset using the script: download_ICSD.py

Usage:

  1. Visit our Hugging Face repository to request access permission. Once the request is reviewed, you will be authorized to download the ICSD dataset.
  2. After obtaining permission, navigate to your Hugging Face settings to generate a personal access token. Under 'Repositories permissions', enter datasets/QingyuLiu1/ICSD. For detailed information about tokens, please refer to the official documentation.

(Figure: 'Repositories permissions' settings page)

  3. Run the command python download_ICSD.py --token=your_token, where your_token is the token generated from your Hugging Face settings. The ICSD dataset will then be downloaded into the data folder and automatically unzipped.

    If the script download_ICSD.py cannot run due to network issues, you can manually download Dataset.zip from Hugging Face and unzip it into the data folder.

  4. In addition to the well-organized dataset, we also provide source materials for generating synthetic strongly labeled data. You can download these by running the command python download_ICSD.py --token=your_token --file_name=Materials.zip --local_dir=your_folder.
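If you take the manual-download route, the unzip step is plain stdlib work. A minimal sketch of what the fallback looks like (the function name and the exact archive layout are assumptions; download_ICSD.py may organize things differently):

```python
import zipfile
from pathlib import Path

def unzip_dataset(zip_path, data_dir="data"):
    """Extract a manually downloaded Dataset.zip into the data folder,
    mirroring the automatic unzip that the download script performs."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    # Return the top-level entries so callers can sanity-check the layout.
    return sorted(p.name for p in out.iterdir())
```

Usage: `unzip_dataset("Dataset.zip")` should leave the audio and metadata folders under data/.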

Training πŸ‘¨β€πŸ’»

Three baselines are provided:

1. Baseline with only synthetic data

The baseline using the synthetic strongly labeled data can be run from scratch using the following command:

python train_sed.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/synth_only.ckpt

2. Baseline with real data and synthetic data

The baseline using the real strongly labeled data and synthetic data can be run from scratch using the following command:

python train_sed.py --strong_real

This command will automatically include the strongly labeled real data in the training process.

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/hybrid.ckpt

3. Baseline using pre-trained embedding

We added a baseline which exploits the pre-trained model BEATs. It's an iterative self-supervised learning model designed to extract high-level non-speech audio semantics. The BEATs feature representations are integrated with the CNN output through a linear transformation and layer normalization, providing additional complementary information that can enhance sound event detection performance.
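To illustrate that fusion step, here is a minimal pure-Python sketch of projecting an embedding to the CNN feature size, layer-normalizing it, and adding it to the CNN features. The dimensions, the additive combination, and all function names are illustrative assumptions; the actual baseline uses learned PyTorch layers.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean / unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Plain matrix-vector product: project x through a weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def fuse(cnn_feat, beats_emb, weight, bias):
    """Project the BEATs embedding to the CNN feature size, normalize it,
    and add it to the CNN features as complementary information."""
    projected = layer_norm(linear(beats_emb, weight, bias))
    return [c + p for c, p in zip(cnn_feat, projected)]

# Toy example: a 3-dim embedding fused into a 2-dim CNN feature.
fused = fuse([1.0, 2.0], [0.5, -0.5, 1.0],
             [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
```

In the real system the projection weights are trained jointly with the CNN, so the network learns how much of the BEATs representation to mix in.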

To run this system, first pre-compute the embeddings using the following command:

python extract_embeddings.py --output_dir ./embeddings --pretrained_model "beats"

Then run the system with:

python train_pretrained.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_pretrained.py --test_from_checkpoint /path/to/BEATS.ckpt

Acknowledgement πŸ””

We acknowledge the wonderful work by these excellent developers!

Reference πŸ“–

If you use the ICSD dataset, please cite the following paper:

@article{ICSD,
      title={ICSD: An Open-source Dataset for Infant Cry and Snoring Detection},
      author={Qingyu Liu and Longfei Song and Dongxing Xu and Yanhua Long},
      journal={arXiv},
      volume={abs/2408.10561},
      year={2024}
}