This repository accompanies the paper DeepFry: Identifying Vocal Fry Using Deep Neural Networks by Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, and Joseph Keshet.
It contains code for predicting creaky voice, along with two pre-trained models.
The repository lets you identify creaky frames in a given audio recording; the details are below.
To install the dependencies, either create the conda environment or install the requirements with pip:
conda env create -f environment.yml
pip install -r requirements.txt
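If you plan to pass the --cuda flag, you can first check that a GPU is visible. This small sketch assumes PyTorch is among the installed dependencies (which the --cuda option suggests, but verify against requirements.txt):

```python
# Sanity check before using --cuda: assumes PyTorch is one of the
# dependencies installed above (verify against requirements.txt).
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```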
There are two options for running this repository: testing it on the data we provide in the 'allstar' folder, or running it on your own data; both are described below. Two optional arguments apply in either case:
--cuda: run the model on a GPU
--workers num_workers: number of worker processes used for data loading
To run the model on your own data, arrange it in the following directory structure:
|-- CustomDataDIR
| |-- test
| | |-- file1.wav
| | |-- file1.TextGrid
| | |-- file2.wav
| | |-- file2.TextGrid
Then pass the argument --data_dir CustomDataDIR when running run.py.
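Before running, you may want to confirm the layout. The following is a minimal sketch, not part of run.py; check_layout is a hypothetical helper that lists each WAV file under CustomDataDIR/test and whether a matching TextGrid sits next to it:

```python
# Hypothetical helper (not part of the repository): sanity-check a custom
# data directory against the layout shown above before running the model.
from pathlib import Path

def check_layout(data_dir: str) -> None:
    test_dir = Path(data_dir) / "test"
    wavs = sorted(test_dir.glob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"no .wav files found in {test_dir}")
    for wav in wavs:
        tg = wav.with_suffix(".TextGrid")
        status = "has TextGrid" if tg.exists() else "WAV only"
        print(f"{wav.name}: {status}")

check_layout("CustomDataDIR")
```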
The first option lets you test the repository. The folder 'allstar' contains WAV files with their corresponding TextGrids, which we used to evaluate our model, as described in the paper.
Note that the results in the paper were reported over 20 ms frames to allow a proper comparison between methods, while our model was trained on 5 ms frames, so the measures obtained here might differ slightly.
To run the model on the provided data:
python run.py --data_dir allstar --model_name model_path --cuda
To also save the model's predictions as TextGrids, add --out_dir:
python run.py --data_dir allstar --model_name model_path --out_dir out_path --cuda
where model_path is the absolute path to the pre-trained model, and out_path is the path to the directory in which the TextGrids with the model's predictions will be saved.
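If you want to script several runs (for example, over multiple model or data directories), here is a minimal sketch that wraps the same command with subprocess; the paths below are placeholders you would replace with your own:

```python
# Minimal scripting sketch (not part of the repository): invoke run.py with
# placeholder paths, exactly as in the command above.
import subprocess

cmd = [
    "python", "run.py",
    "--data_dir", "allstar",
    "--model_name", "/abs/path/to/pretrained_model",  # model_path
    "--out_dir", "/abs/path/to/predictions",          # out_path
    "--cuda",
]
subprocess.run(cmd, check=True)
```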
To identify creak in your own WAV files, run:
python run.py --data_dir data_path --model_name model_path --out_dir out_path --custom --cuda
where model_path and out_path are the same as above, and data_path is the absolute path to a directory of WAV files in which creak should be identified.
If your WAV files come with corresponding TextGrids, run:
python run.py --data_dir data_path --model_name model_path --out_dir out_path --cuda
where model_path and out_path are the same as above, and data_path is the absolute path to a directory containing the WAV files, alongside their corresponding TextGrids, in which creak should be identified.
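Once predictions have been written to out_path, you can inspect them programmatically. The sketch below is only an illustration: it assumes the third-party textgrid package (pip install textgrid), and since the name of the prediction tier is not specified here, it simply prints every labelled interval in every interval tier:

```python
# Illustration only: read the predicted TextGrids from out_path and print all
# labelled intervals. Assumes the `textgrid` package; the tier that holds the
# creak predictions is not named here, so every interval tier is shown.
from pathlib import Path
import textgrid

for tg_path in sorted(Path("/abs/path/to/predictions").glob("*.TextGrid")):
    tg = textgrid.TextGrid.fromFile(str(tg_path))
    print(tg_path.name, "tiers:", [tier.name for tier in tg.tiers])
    for tier in tg.tiers:
        if not isinstance(tier, textgrid.IntervalTier):
            continue
        for interval in tier:
            if interval.mark:  # non-empty label, e.g. a predicted creak interval
                print(f"  {tier.name}: {interval.minTime:.3f}-{interval.maxTime:.3f} {interval.mark}")
```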