NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Issue with text normalization HUI German Dataset #7539

Closed by ken2190 1 year ago

ken2190 commented 1 year ago

Describe the bug

I'm trying to run text normalization on the HUI German dataset, but I get the error shown below. Does anyone have an idea how to resolve this issue?

Steps/Code to reproduce bug

(nemo) ubuntu@HP:/mnt/e/tools/nemo$ python /mnt/e/tools/NeMo/get_data.py --data-root /mnt/f/hui_acg --manifests-root /mnt/f/hui_acg/ --normalize-text
[NeMo I 2023-09-27 13:31:45 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/datasetStatisticClean.zip
[NeMo I 2023-09-27 13:31:45 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/datasetStatisticClean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:32:21 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/datasetStatisticClean.zip.
[NeMo I 2023-09-27 13:32:21 get_data:100] Downloading data: https://opendata.iisys.de/opendata/Datasets/HUI-Audio-Corpus-German/dataset_clean/Friedrich_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Friedrich_Clean.zip
[NeMo I 2023-09-27 13:32:21 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Bernd_Ungerer_Clean.zip
[NeMo I 2023-09-27 13:32:21 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Karlsson_Clean.zip
[NeMo I 2023-09-27 13:32:21 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/others_Clean.zip
[NeMo I 2023-09-27 13:32:21 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Eva_K_Clean.zip
[NeMo I 2023-09-27 13:32:21 get_data:105] Skipped downloading data because it exists: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Hokuspokus_Clean.zip
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Bernd_Ungerer_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Eva_K_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Friedrich_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Hokuspokus_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Karlsson_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:48:43 get_data:109] Unzipping data: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/others_Clean.zip --> /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean
[NeMo I 2023-09-27 13:55:58 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Eva_K_Clean.zip.
[NeMo I 2023-09-27 13:56:16 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Friedrich_Clean.zip.
[NeMo I 2023-09-27 13:57:41 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Hokuspokus_Clean.zip.
[NeMo I 2023-09-27 14:24:01 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Karlsson_Clean.zip.
[NeMo I 2023-09-27 14:44:50 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/others_Clean.zip.
[NeMo I 2023-09-27 14:55:32 get_data:111] Unzipping data is complete: /mnt/f/hui_acg/HUI-Audio-Corpus-German-clean/Bernd_Ungerer_Clean.zip.
[NeMo I 2023-09-27 14:55:32 get_data:263] Processing Speaker: Alexandra_Bogensperger
[NeMo I 2023-09-27 14:55:32 get_data:124] Preparing JSON split for speaker 1.
545it [00:00, 33618.58it/s]
[NeMo I 2023-09-27 14:55:32 get_data:161] Preparing JSON split for speaker 1 is complete.
[NeMo I 2023-09-27 14:55:32 get_data:263] Processing Speaker: Algy_Pug
[NeMo I 2023-09-27 14:55:32 get_data:124] Preparing JSON split for speaker 2.
141it [00:00, 33753.60it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 2 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: AliceDe
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 3.
4it [00:00, 6700.17it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 3 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anastasiia_Solokha
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 4.
4it [00:00, 9714.66it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 4 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anka
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 5.
471it [00:00, 30448.79it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 5 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anna_Samrowski
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 6.
5it [00:00, 8859.96it/s]
[NeMo W 2023-09-27 14:55:33 get_data:158] Skipped speaker 6. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anna_Simon
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 7.
19it [00:00, 20159.82it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 7 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anne
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 8.
72it [00:00, 28945.64it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 8 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Antoinette_Huting
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 9.
55it [00:00, 25470.54it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 9 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Anton
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 10.
21it [00:00, 23227.95it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 10 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Apneia
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 11.
10it [00:00, 12136.30it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 11 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Availle
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 12.
532it [00:00, 29288.45it/s]
[NeMo I 2023-09-27 14:55:33 get_data:161] Preparing JSON split for speaker 12 is complete.
[NeMo I 2023-09-27 14:55:33 get_data:263] Processing Speaker: Bernd_Ungerer
[NeMo I 2023-09-27 14:55:33 get_data:124] Preparing JSON split for speaker 13.
32880it [00:01, 28967.69it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 13 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Boris
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 14.
257it [00:00, 26216.31it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 14 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Capybara
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 15.
383it [00:00, 35107.60it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 15 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: caromopfen
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 16.
856it [00:00, 31935.28it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 16 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Cate_Mackenzie
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 17.
12it [00:00, 18497.48it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 17 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Christian_Al-Kadi
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 18.
408it [00:00, 30715.92it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 18 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Christina_Lindgruen
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 19.
16it [00:00, 23522.21it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 19 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: ClaudiaSterngucker
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 20.
323it [00:00, 25071.44it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 20 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: ColOhr
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 21.
93it [00:00, 25849.59it/s]
[NeMo I 2023-09-27 14:55:34 get_data:161] Preparing JSON split for speaker 21 is complete.
[NeMo I 2023-09-27 14:55:34 get_data:263] Processing Speaker: Crln_Yldz_Ksr
[NeMo I 2023-09-27 14:55:34 get_data:124] Preparing JSON split for speaker 22.
536it [00:00, 31930.28it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 22 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: DanielGrams
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 23.
29it [00:00, 25425.34it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 23 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: danio
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 24.
3it [00:00, 4822.89it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 24 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Desirée_Löffler
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 25.
18it [00:00, 21832.70it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 25 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Dini_Steyn
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 26.
2it [00:00, 3563.55it/s]
[NeMo W 2023-09-27 14:55:35 get_data:158] Skipped speaker 26. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Dirk_Weber
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 27.
137it [00:00, 26192.89it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 27 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: DomBombadil
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 28.
183it [00:00, 28871.83it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 28 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Eki_Teebi
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 29.
139it [00:00, 28998.17it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 29 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: ekyale
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 30.
224it [00:00, 28817.11it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 30 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Elli
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 31.
286it [00:00, 25192.60it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 31 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Eva_K
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 32.
8539it [00:00, 22756.44it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 32 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Fabian_Grant
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 33.
32it [00:00, 31018.66it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 33 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: fantaeiner
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 34.
200it [00:00, 31004.61it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 34 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Franziska_Paul
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 35.
60it [00:00, 28807.03it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 35 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: fremdschaemen
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 36.
28it [00:00, 30432.89it/s]
[NeMo I 2023-09-27 14:55:35 get_data:161] Preparing JSON split for speaker 36 is complete.
[NeMo I 2023-09-27 14:55:35 get_data:263] Processing Speaker: Friedrich
[NeMo I 2023-09-27 14:55:35 get_data:124] Preparing JSON split for speaker 37.
9590it [00:00, 31844.18it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 37 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Frown
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 38.
209it [00:00, 28455.81it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 38 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Gaby
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 39.
7it [00:00, 12671.61it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 39 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Gesine
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 40.
1it [00:00, 2270.87it/s]
[NeMo W 2023-09-27 14:55:36 get_data:158] Skipped speaker 40. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: heeheekitty
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 41.
81it [00:00, 25623.25it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 41 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Herman_Roskams
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 42.
70it [00:00, 26438.66it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 42 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Herr_Klugbeisser
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 43.
43it [00:00, 18635.57it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 43 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Hokuspokus
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 44.
10584it [00:00, 31652.75it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 44 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Igor_Teaforay
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 45.
215it [00:00, 29591.63it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 45 is complete.
[NeMo I 2023-09-27 14:55:36 get_data:263] Processing Speaker: Imke_Grassl
[NeMo I 2023-09-27 14:55:36 get_data:124] Preparing JSON split for speaker 46.
136it [00:00, 21508.44it/s]
[NeMo I 2023-09-27 14:55:36 get_data:161] Preparing JSON split for speaker 46 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Ingo_Breuer
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 47.
114it [00:00, 23795.69it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 47 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: IvanDean
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 48.
32it [00:00, 26854.29it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 48 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Jessi
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 49.
305it [00:00, 28271.00it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 49 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Joe_Kay
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 50.
35it [00:00, 18842.34it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 50 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: josimosi98
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 51.
6it [00:00, 12912.17it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 51 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Julia_Niedermaier
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 52.
2012it [00:00, 29746.73it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 52 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Kaktus
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 53.
4it [00:00, 5797.24it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 53 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Kalynda
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 54.
394it [00:00, 22042.89it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 54 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Kanta
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 55.
3it [00:00, 5003.15it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 55 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Kara_Shallenberg
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 56.
52it [00:00, 26223.86it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 56 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: KarinM
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 57.
307it [00:00, 20873.95it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 57 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Karlsson
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 58.
10736it [00:00, 30896.67it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 58 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: keltoi
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 59.
860it [00:00, 26357.30it/s]
[NeMo I 2023-09-27 14:55:37 get_data:161] Preparing JSON split for speaker 59 is complete.
[NeMo I 2023-09-27 14:55:37 get_data:263] Processing Speaker: Klaus_Beutelspacher
[NeMo I 2023-09-27 14:55:37 get_data:124] Preparing JSON split for speaker 60.
12it [00:00, 16496.77it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 60 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Klaus_Neubauer
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 61.
758it [00:00, 28217.40it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 61 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Knubbel
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 62.
3it [00:00, 5182.42it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 62 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Laila_Katinka
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 63.
1it [00:00, 1679.06it/s]
[NeMo W 2023-09-27 14:55:38 get_data:158] Skipped speaker 63. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Larry_Greene
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 64.
16it [00:00, 15705.33it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 64 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Lars_Rolander_(1942-2016)
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 65.
745it [00:00, 28759.04it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 65 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Lektor
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 66.
3it [00:00, 6801.57it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 66 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: leserchen
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 67.
65it [00:00, 27876.25it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 67 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: LillY
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 68.
7it [00:00, 11829.22it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 68 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: lorda
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 69.
195it [00:00, 28319.29it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 69 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: LordOider
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 70.
21it [00:00, 9642.08it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 70 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: LyricalWB
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 71.
479it [00:00, 25347.54it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 71 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: manuwolf
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 72.
9it [00:00, 14485.32it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 72 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: marham63
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 73.
1877it [00:00, 32591.92it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 73 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Markus_Wachenheim
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 74.
91it [00:00, 23905.90it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 74 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Martin_Harbecke
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 75.
102it [00:00, 22139.26it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 75 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Mat
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 76.
1it [00:00, 2383.13it/s]
[NeMo W 2023-09-27 14:55:38 get_data:158] Skipped speaker 76. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Matze
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 77.
65it [00:00, 17870.33it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 77 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: melaniesandra
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 78.
24it [00:00, 23481.06it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 78 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: merendo07
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 79.
10it [00:00, 15851.49it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 79 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: mindfulheart
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 80.
3it [00:00, 6253.93it/s]
[NeMo W 2023-09-27 14:55:38 get_data:158] Skipped speaker 80. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: Monika_M._C
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 81.
326it [00:00, 31313.68it/s]
[NeMo I 2023-09-27 14:55:38 get_data:161] Preparing JSON split for speaker 81 is complete.
[NeMo I 2023-09-27 14:55:38 get_data:263] Processing Speaker: njall
[NeMo I 2023-09-27 14:55:38 get_data:124] Preparing JSON split for speaker 82.
5it [00:00, 7492.50it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 82 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: noonday
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 83.
1it [00:00, 1560.96it/s]
[NeMo W 2023-09-27 14:55:39 get_data:158] Skipped speaker 83. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Ohrbuch
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 84.
663it [00:00, 27358.73it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 84 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: OldZach
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 85.
125it [00:00, 19049.09it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 85 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Orsina
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 86.
6it [00:00, 8425.12it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 86 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: PeWaOt
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 87.
231it [00:00, 19847.27it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 87 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Ragnar
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 88.
330it [00:00, 16557.06it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 88 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Rainer
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 89.
94it [00:00, 23163.42it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 89 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Ralf
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 90.
1it [00:00, 2559.06it/s]
[NeMo W 2023-09-27 14:55:39 get_data:158] Skipped speaker 90. Not enough data for train, val and test.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Ramona_Deininger-Schnabel
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 91.
1482it [00:00, 28377.02it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 91 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Rebecca_Braunert-Plunkett
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 92.
667it [00:00, 27243.96it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 92 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: RenateIngrid
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 93.
1274it [00:00, 27668.83it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 93 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Rhigma
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 94.
108it [00:00, 21351.10it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 94 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Robert_Steiner
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 95.
212it [00:00, 24120.23it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 95 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Rogthey
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 96.
960it [00:00, 25748.05it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 96 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Sandra_Schmit
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 97.
95it [00:00, 26779.95it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 97 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Sascha
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 98.
5it [00:00, 9035.55it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 98 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: schrm
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 99.
385it [00:00, 28601.41it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 99 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Sebastian
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 100.
125it [00:00, 29896.11it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 100 is complete.
[NeMo I 2023-09-27 14:55:39 get_data:263] Processing Speaker: Sellafield
[NeMo I 2023-09-27 14:55:39 get_data:124] Preparing JSON split for speaker 101.
10it [00:00, 15905.59it/s]
[NeMo I 2023-09-27 14:55:39 get_data:161] Preparing JSON split for speaker 101 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Shanty
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 102.
28it [00:00, 27242.06it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 102 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Silke_Britz
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 103.
45it [00:00, 21924.00it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 103 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Silmaryllis
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 104.
336it [00:00, 14878.76it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 104 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Sonia
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 105.
473it [00:00, 30093.38it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 105 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Sonja
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 106.
130it [00:00, 23313.64it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 106 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: storylines
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 107.
21it [00:00, 26804.74it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 107 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Tabea
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 108.
13it [00:00, 18958.95it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 108 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Tanja_Ben_Jeroud
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 109.
105it [00:00, 24909.61it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 109 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: thinkofelephants
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 110.
6it [00:00, 9935.19it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 110 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Traxxo
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 111.
96it [00:00, 26051.58it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 111 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Ute2013
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 112.
64it [00:00, 21387.58it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 112 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Verena
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 113.
58it [00:00, 21884.64it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 113 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Victoria_Asztaller
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 114.
165it [00:00, 29841.76it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 114 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Wolfgang
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 115.
60it [00:00, 26302.07it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 115 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Zach_K
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 116.
18it [00:00, 24020.83it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 116 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Zieraffe
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 117.
33it [00:00, 18107.28it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 117 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:263] Processing Speaker: Zue_Von_Zob
[NeMo I 2023-09-27 14:55:40 get_data:124] Preparing JSON split for speaker 118.
11it [00:00, 13261.67it/s]
[NeMo I 2023-09-27 14:55:40 get_data:161] Preparing JSON split for speaker 118 is complete.
[NeMo I 2023-09-27 14:55:40 get_data:298] Saving Speaker to ID mapping to /mnt/f/hui_acg/spk2id.csv.
[NeMo I 2023-09-27 14:55:40 get_data:115] Saving JSON split to /mnt/f/hui_acg/train_manifest.json.
[NeMo I 2023-09-27 14:55:42 get_data:115] Saving JSON split to /mnt/f/hui_acg/val_manifest.json.
[NeMo I 2023-09-27 14:55:42 get_data:115] Saving JSON split to /mnt/f/hui_acg/test_manifest.json.
[NeMo I 2023-09-27 14:56:17 get_data:196] Normalizing text for /mnt/f/hui_acg/train_manifest.json.
 10%|███████▌                                                                    | 8692/86811 [28:47<7:53:04,  2.75it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 273, in _wrap_func_call
    return func()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 588, in __call__
    return [func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 588, in <listcomp>
    return [func(*args, **kwargs)
  File "/mnt/e/tools/NeMo/get_data.py", line 192, in add_normalized_text
    normalized_text = normalizer_call(line_dict["text"])
  File "/mnt/e/tools/NeMo/get_data.py", line 189, in normalizer_call
    return text_normalizer.normalize(x, **text_normalizer_call_kwargs)
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/normalize.py", line 328, in normalize
    tokens = self.parser.parse()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 53, in parse
    token = self.parse_token()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 76, in parse_token
    value = self.parse_token_value()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 98, in parse_token_value
    list_token_dicts = self.parse()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 53, in parse
    token = self.parse_token()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 67, in parse_token
    key = self.parse_string_key()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo_text_processing/text_normalization/token_parser.py", line 141, in parse_string_key
    assert self.char not in string.whitespace and self.char != EOS
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/e/tools/NeMo/get_data.py", line 316, in <module>
    main()
  File "/mnt/e/tools/NeMo/get_data.py", line 310, in main
    __text_normalization(train_json, args.num_workers)
  File "/mnt/e/tools/NeMo/get_data.py", line 201, in __text_normalization
    dict_list = Parallel(n_jobs=num_workers, backend="threading")(
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 1944, in __call__
    return output if self.return_generator else list(output)
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 1587, in _get_outputs
    yield from self._retrieve()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 1691, in _retrieve
    self._raise_error_fast()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 1726, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 735, in get_result
    return self._return_or_raise()
  File "/home/ubuntu/miniconda3/envs/nemo/lib/python3.10/site-packages/joblib/parallel.py", line 753, in _return_or_raise
    raise self._result
AssertionError
 10%|███████▋                                                                    | 8735/86811 [28:50<4:17:48,  5.05it/s]

Environment overview

Environment details

Additional context

GPU model: RTX3090

ken2190 commented 1 year ago

I added the argument --num-workers 1 to the command and it worked.
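
One possible explanation, which is an assumption and not something confirmed in this thread: the traceback shows joblib running with backend="threading" and every worker calling the same text_normalizer object, whose normalize() goes through self.parser. If that parser keeps per-call state, concurrent calls could interfere with each other, which would fit the error disappearing with a single worker. The sketch below illustrates the general idea of serializing calls to a shared, non-thread-safe object with a lock. StandInNormalizer and normalize_all are hypothetical names made up for illustration; they are not part of nemo_text_processing or get_data.py.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StandInNormalizer:
    """Hypothetical stand-in for a normalizer that keeps per-call state on self."""

    def __init__(self):
        self._buffer = None  # shared mutable state, unsafe across threads

    def normalize(self, text):
        # Store the input on the instance, then read it back, mimicking an
        # object that parses from internal state rather than local variables.
        self._buffer = text
        return self._buffer.strip().lower()


def normalize_all(texts, normalizer, num_workers=4):
    """Normalize texts in worker threads, letting one thread in at a time."""
    lock = threading.Lock()

    def safe_call(text):
        # Only one thread may use the shared normalizer at any moment.
        with lock:
            return normalizer.normalize(text)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(safe_call, texts))


if __name__ == "__main__":
    print(normalize_all([" Guten Tag ", "HALLO"], StandInNormalizer()))
```

Because the lock lets only one call run at a time, this gives roughly the same throughput as --num-workers 1. Its only advantage is that the rest of a threaded pipeline can stay unchanged.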