CoEDL / elpis

🙊 software for creating speech recognition models.
https://elpis.readthedocs.io/en/latest/
Apache License 2.0

Remove duplicate audio files after resampling #152

Open benfoley opened 3 years ago

benfoley commented 3 years ago

To save disk space, delete the original audio files in the state directory once they have been resampled.

Note the duplication of WAV files in this tree snippet:

```
├── datasets
│   └── 84405542abf527f524227e59030f998d
│       ├── annotations.json
│       ├── cleaned
│       ├── dataset.json
│       ├── original
│       │   ├── crdo-NRU_NUMPLUSCL_MH2_PEOPLE_1TO100_F4_3OCT2011_AUDIOPLUSEGG.eaf
│       │   ├── crdo-NRU_NUMPLUSCL_MH2_PEOPLE_1TO100_F4_3OCT2011_AUDIOPLUSEGG.wav
│       │   ├── crdo-NRU_NUMPLUSCL_MH2_PEOPLE_1TO100_F4_3OCT2011_AUDIOPLUSEGG.xml
│       │   ├── crdo-NRU_NUMPLUSCL_VERIFICATIONS_F4_13MARCH2009.eaf
│       │   ├── crdo-NRU_NUMPLUSCL_VERIFICATIONS_F4_13MARCH2009.wav
│       │   ├── crdo-NRU_NUMPLUSCL_VERIFICATIONS_F4_13MARCH2009.xml
│       │   └── text_corpora
│       ├── resampled
│       │   ├── crdo-NRU_NUMPLUSCL_MH2_PEOPLE_1TO100_F4_3OCT2011_AUDIOPLUSEGG.wav
│       │   └── crdo-NRU_NUMPLUSCL_VERIFICATIONS_F4_13MARCH2009.wav
│       ├── word_count.json
│       └── word_list.txt
├── interface.json
├── loggers
├── models
│   └── 896cc2f4f3f3927591c1a5018d1fa6ec
```
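A minimal sketch of the cleanup step, assuming the dataset layout in the tree (`original/*.wav` mirrored by same-named files in `resampled/`) and assuming nothing else in Elpis reads the original WAVs after resampling. The helper name `remove_resampled_originals` and its `dry_run` flag are hypothetical, not part of the Elpis codebase. Only WAV files are touched; the `.eaf` and `.xml` transcriptions stay in `original/`.

```python
from pathlib import Path
from typing import List


def remove_resampled_originals(dataset_dir: Path, dry_run: bool = False) -> List[Path]:
    """Delete original WAV files that already have a resampled copy.

    Hypothetical helper: assumes <dataset_dir>/original and
    <dataset_dir>/resampled hold WAV files with matching names.
    Returns the original files that were (or, with dry_run, would be) removed.
    """
    original_dir = dataset_dir / "original"
    resampled_dir = dataset_dir / "resampled"
    removed: List[Path] = []
    # Nothing to do if either directory is missing (e.g. resampling never ran).
    if not original_dir.is_dir() or not resampled_dir.is_dir():
        return removed
    for original in sorted(original_dir.iterdir()):
        # Only consider audio; transcription files (.eaf, .xml) are kept.
        if not original.is_file() or original.suffix.lower() != ".wav":
            continue
        resampled = resampled_dir / original.name
        # Delete only when a non-empty resampled copy exists, so a failed or
        # partial resampling step never destroys the only copy of the audio.
        if resampled.is_file() and resampled.stat().st_size > 0:
            if not dry_run:
                original.unlink()
            removed.append(original)
    return removed
```

Running it once with `dry_run=True` lists what would be deleted without touching the disk, which is a cheap safety check before reclaiming space on a real dataset.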