This is the official repository for the ICSD dataset.
Please note that our paper is currently under review. After the paper is accepted, you can download the audio files and metadata from our provided source URL list on Hugging Face.
ICSD is a comprehensive audio event dataset for infant cry and snoring detection with the following features:
The figure below shows the organized structure of the ICSD dataset where audio files are stored in the audio folder and event time-stamp annotations in the metadata folder, each further categorized into train, validation, and test subfolders. Moreover, source materials for generating synthetic strongly labeled data are also provided. You can use Scaper to generate your own synthetic data.
A detailed description of the dataset can be found in our [paper]().
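As a rough illustration of the layout described above, the following sketch counts the files found in each split. It assumes the folders are literally named audio and metadata, each with train, validation, and test subfolders, and that the dataset was unpacked under a root such as data. The helper name summarize_icsd is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names assumed from the description above; adjust if the archive differs.
KINDS = ("audio", "metadata")
SPLITS = ("train", "validation", "test")


def summarize_icsd(root):
    """Return {kind: {split: file_count}} for an unpacked dataset root.

    Missing folders are reported as 0 rather than raising an error.
    """
    root = Path(root)
    summary = {}
    for kind in KINDS:
        summary[kind] = {}
        for split in SPLITS:
            folder = root / kind / split
            if folder.is_dir():
                count = sum(1 for p in folder.rglob("*") if p.is_file())
            else:
                count = 0
            summary[kind][split] = count
    return summary


if __name__ == "__main__":
    for kind, counts in summarize_icsd("data").items():
        print(kind, counts)
```

A split that reports 0 files usually means the archive was unpacked into a different root or uses different folder names.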
To use the ICSD dataset, you can download the audio files and metadata from our provided source URL list on Hugging Face.
Please note that ICSD doesn't own the copyright of the audio; the copyright remains with the original owners of the video or audio.
The demo folder provides four audio samples that you can download and listen to.
We designed our baseline system based on the DCASE 2023 Challenge Task 4 baseline.
Requirements
The script conda_create_environment.sh is available to create an environment that runs the baseline system.
You can download the ICSD dataset from the Hugging Face repository datasets/QingyuLiu1/ICSD using the script download_ICSD.py. The script takes a Hugging Face access token; for detailed information about tokens, please refer to the official documentation. Run the command python download_ICSD.py --token=your_token. Here, your_token is the token you've generated from your Hugging Face settings. The ICSD dataset will then be downloaded into the data folder and automatically unzipped.
If the script download_ICSD.py cannot run due to network issues, you can manually download Dataset.zip from Hugging Face and unzip it into the data folder.
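If you prefer to script the manual route, a minimal sketch follows. It is not the content of download_ICSD.py. It assumes the huggingface_hub package is installed and that the archive is named Dataset.zip in the dataset repository QingyuLiu1/ICSD. The helper names fetch_archive and unzip_into are hypothetical.

```python
import zipfile
from pathlib import Path


def fetch_archive(token, filename="Dataset.zip", local_dir="data"):
    """Download one file from the ICSD dataset repo and return its local path."""
    # Imported lazily so the unzip helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="QingyuLiu1/ICSD",
        filename=filename,
        repo_type="dataset",
        token=token,
        local_dir=local_dir,
    )


def unzip_into(archive_path, target_dir="data"):
    """Extract a zip archive into target_dir and return the archived names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
        return zf.namelist()
```

A typical call would be unzip_into(fetch_archive(token="your_token")). If the archive was downloaded through the browser, pass its path to unzip_into directly.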
In addition to the well-organized dataset, we also provide source materials for generating synthetic strongly labeled data. You can download these by running the command python download_ICSD.py --token=your_token --file_name=Materials.zip --local_dir=your_folder.
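Once the materials are unpacked, a synthetic clip could be produced with Scaper roughly as sketched here. This is an illustration, not the procedure used to build ICSD. It assumes Scaper is installed and that the materials are arranged the way Scaper expects, with one subfolder per label under a foreground directory and a background directory. The directory arguments, the 10-second duration, the SNR range, and the reference level are placeholder assumptions, and the function names are hypothetical.

```python
from pathlib import Path


def list_labels(folder):
    """Return sorted label names, i.e. the immediate subfolders of `folder`.

    Scaper expects one subfolder per label under both the foreground and
    background directories.
    """
    return sorted(p.name for p in Path(folder).iterdir() if p.is_dir())


def generate_soundscape(fg_dir, bg_dir, out_stem, duration=10.0, seed=0):
    """Mix one background and one foreground event into a labeled soundscape."""
    import scaper  # imported lazily; requires `pip install scaper`

    sc = scaper.Scaper(duration, str(fg_dir), str(bg_dir), random_state=seed)
    sc.ref_db = -50  # reference loudness for the background (assumed value)

    # An empty list after "choose" lets Scaper pick among all available items.
    sc.add_background(
        label=("choose", []),
        source_file=("choose", []),
        source_time=("const", 0),
    )
    sc.add_event(
        label=("choose", []),
        source_file=("choose", []),
        source_time=("const", 0),
        event_time=("uniform", 0, duration / 2),
        event_duration=("uniform", 1, duration / 2),
        snr=("uniform", 6, 30),
        pitch_shift=None,
        time_stretch=None,
    )
    # Writes the audio, a JAMS annotation, and a plain-text event list.
    sc.generate(
        f"{out_stem}.wav",
        f"{out_stem}.jams",
        txt_path=f"{out_stem}.txt",
    )
```

Calling list_labels on each directory first is a quick way to confirm that the label subfolders are where Scaper will look for them.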
Three baselines are provided:
The baseline using the synthetic strongly labeled data can be run from scratch using the following command:
python train_sed.py
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_sed.py --test_from_checkpoint /path/to/synth_only.ckpt
The baseline using the real strongly labeled data and synthetic data can be run from scratch using the following command:
python train_sed.py --strong_real
The command will automatically include the real strongly labeled data in the training process.
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_sed.py --test_from_checkpoint /path/to/hybrid.ckpt
We added a baseline which exploits the pre-trained model BEATs. It's an iterative self-supervised learning model designed to extract high-level non-speech audio semantics. The BEATs feature representations are integrated with the CNN output through a linear transformation and layer normalization, providing additional complementary information that can enhance sound event detection performance.
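The fusion step can be pictured with a small, framework-free sketch. It assumes the BEATs embeddings have already been aligned to the CNN frame rate and are concatenated with the CNN features before the linear layer; how the baseline actually combines them is not stated here, so treat the concatenation as an assumption. The weights are random placeholders, the learnable scale and shift of layer normalization are omitted, and all names are hypothetical. A real system would implement this in a deep learning framework.

```python
import math
import random


def linear(vec, weight, bias):
    """Apply y = W x + b, with `weight` given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def layer_norm(vec, eps=1e-5):
    """Normalize one frame to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def fuse(cnn_frames, beats_frames, weight, bias):
    """Fuse per-frame CNN features with time-aligned BEATs embeddings."""
    if len(cnn_frames) != len(beats_frames):
        raise ValueError("CNN and BEATs sequences must have the same number of frames")
    fused = []
    for cnn, emb in zip(cnn_frames, beats_frames):
        joint = list(cnn) + list(emb)  # concatenate along the feature axis
        fused.append(layer_norm(linear(joint, weight, bias)))
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    n_frames, cnn_dim, emb_dim = 4, 3, 5  # toy sizes, not the real dimensions
    cnn = [[rng.gauss(0, 1) for _ in range(cnn_dim)] for _ in range(n_frames)]
    emb = [[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(n_frames)]
    # Project the joint vector back to the CNN feature size.
    w = [[rng.gauss(0, 0.1) for _ in range(cnn_dim + emb_dim)] for _ in range(cnn_dim)]
    b = [0.0] * cnn_dim
    out = fuse(cnn, emb, w, b)
    print(len(out), len(out[0]))
```

Projecting back to the CNN feature size keeps the downstream layers unchanged, so the embeddings act as extra input rather than requiring a new architecture.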
To run this system, you should first pre-compute the embeddings using the following command: python extract_embeddings.py --output_dir ./embeddings --pretrained_model "beats"
Then, use the following command to run the system:
python train_pretrained.py
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_pretrained.py --test_from_checkpoint /path/to/BEATS.ckpt
We acknowledge the wonderful work by these excellent developers!
If you use the ICSD dataset, please cite the following paper:
@article{ICSD,
title={ICSD: An Open-source Dataset for Infant Cry and Snoring Detection},
author={Qingyu Liu and Longfei Song and Dongxing Xu and Yanhua Long},
journal={arXiv},
volume={abs/2408.10561},
year={2024}
}