EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca

Selecting the wrong filelist format causes an uncaught KeyError #8

Closed: roedoejet closed this 1 year ago

roedoejet commented 1 year ago

If you are reading a psv (pipe-separated) file but tell the wizard it is a tsv (tab-separated) file, you can get KeyErrors. These should be caught and handled in the Step's validation.
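A minimal sketch of the kind of check the Step's validation could do, assuming the filelist is parsed with Python's `csv.DictReader` (the issue does not say how EveryVoice actually reads it). `validate_filelist` and `REQUIRED_COLUMNS` are hypothetical names, and the sample data is made up. Reading a psv file with a tab delimiter collapses the header into a single field such as `basename|text`, so there is no `text` key and `row["text"]` would raise a KeyError; checking the header up front turns that into a readable error the wizard could show before re-prompting.

```python
import csv
import io

# Assumption: "text" is the column later wizard steps read (see x["text"]
# in the traceback); other required columns could be added here.
REQUIRED_COLUMNS = ("text",)


def validate_filelist(raw: str, delimiter: str) -> list[dict[str, str]]:
    """Parse a filelist with the user's chosen delimiter (hypothetical helper).

    Raises ValueError instead of letting a later step hit a KeyError.
    """
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # With the wrong delimiter the whole header ends up in one field,
        # e.g. "basename|text", so "text" is never a key.
        raise ValueError(
            f"Missing column(s) {missing}, found {header}. "
            "Is the file format (psv vs tsv) correct?"
        )
    return list(reader)


# Sample data (made up): a pipe-separated file read as if it were tab-separated.
psv_data = "basename|text\nutt1|hello\nutt2|world\n"
try:
    validate_filelist(psv_data, delimiter="\t")
except ValueError as err:
    print(err)
```

With the correct delimiter (`"|"`), the same input parses into one dict per row, each with a `text` key.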

roedoejet commented 1 year ago

Actually, per @marctessier, we get both TypeErrors and AttributeErrors:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Which of the following text transformations would like to apply before determining the symbol set?                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭─────────────────────────────── Traceback (most recent call last) ─────────────────────────────────
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/cli.py:93 in new_dataset                  │
│                                                                                                  │
│    90 def new_dataset():                                                                         │
│    91 │   from everyvoice.wizard.main_tour import WIZARD_TOUR                                    │
│    92 │                                                                                          │
│ ❱  93 │   WIZARD_TOUR.run()                                                                      │
│    94                                                                                            │
│    95                                                                                            │
│    96 # Add preprocess to root                                                                   │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:143 in run             │
│                                                                                                  │
│   140 │                                                                                          │
│   141 │   def run(self):                                                                         │
│   142 │   │   for _, _, node in RenderTree(self.root):                                           │
│ ❱ 143 │   │   │   node.run()                                                                     │
│   144 │                                                                                          │
│   145 │   def visualize(self):                                                                   │
│   146 │   │   for pre, _, node in RenderTree(self.root):                                         │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:99 in run              │
│                                                                                                  │
│    96 │   │   """Prompt the user and save the response to the response attribute.                │
│    97 │   │   If this method returns something truthy, continue, otherwise ask the prompt agai   │
│    98 │   │   """                                                                                │
│ ❱  99 │   │   self.response = self.prompt()                                                      │
│   100 │   │   if self.validate(self.response):                                                   │
│   101 │   │   │   self.completed = True                                                          │
│   102 │   │   │   try:                                                                           │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:412 in prompt           │
│                                                                                                  │
│   409 │   │   )                                                                                  │
│   410 │   │   symbols_from_language = return_symbols(selected_language)                          │
│   411 │   │   all_tokens = None                                                                  │
│ ❱ 412 │   │   found_symbols = set(" ".join([x["text"] for x in self.state["filelist_data"]]))    │
│   413 │   │   if all_tokens is None:                                                             │
│   414 │   │   │   logger.info(                                                                   │
│   415 │   │   │   │   "We will now present all the symbols found in your data. You will have t   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: sequence item 1: expected str instance, NoneType found
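The TypeError means the `text` key existed but at least one of its values was `None`. One way to reproduce exactly this message, again assuming `csv.DictReader` is the parser (not confirmed by the issue): when a row has fewer fields than the header, such as a pipe-separated line in a tab-separated file, `DictReader` fills the missing keys with its `restval`, which defaults to `None`. `find_rows_with_missing_text` is a hypothetical helper a validation step could use to locate such rows before the `" ".join(...)` call is reached; the sample data is made up.

```python
import csv
import io

# Sample data (made up): a tab-separated header with one pipe-separated row.
raw = "basename\ttext\nutt1\thello\nutt2|world\n"
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
# The second row has only one field, so DictReader sets rows[1]["text"] to
# its restval (None by default) instead of raising.


def find_rows_with_missing_text(rows):
    """Return indices of rows whose "text" value is not a str (hypothetical helper)."""
    return [i for i, row in enumerate(rows) if not isinstance(row.get("text"), str)]


bad = find_rows_with_missing_text(rows)
if bad:
    print(f"{len(bad)} row(s) without text, first at index {bad[0]}")
else:
    # Only safe once every value is a str; with rows[1]["text"] set to None this
    # join raises "TypeError: sequence item 1: expected str instance, NoneType found".
    found_symbols = set(" ".join(x["text"] for x in rows))
```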

and:

────────────────────────────────────────────
2023-09-15 15:04:38.331 | INFO     | everyvoice.wizard.dataset:prompt:280 - Note: if your dataset has more than one language in it, you will have to add this information to your filelist, because the new dataset wizard can't guess!
Which of the following supported languages are in your dataset?
  [und]: my language isn't here
  [alq]: Algonquin
  [atj]: Atikamekw
  [ckt]: Chukchi
  [clm]: Klallam
  [crg-dv]: Michif
  [crg-tmd]: Michif
  [crj]: Southern East Cree
  [crk]: Plains Cree
  [crl]: Northern East Cree
  [crm]: Moose Cree
  [csw]: Swampy Cree
  [ctp]: Western Highland Chatino
  [dan]: Danish
> [eng]: English
  [fin]: Finnish
  [fra]: French
  [git]: Gitksan
  [gla]: Scottish Gaelic
  [gwi]: Gwich'in
  [haa]: Hän
  [ikt]: Western Inuktut
  [iku]: Inuktitut Syllabics
  [iku-sro]: Inuktitut Romanized
  [kkz]: Kaska
  [kwk-boas]: Kwak'wala (Boas orthography)
  [kwk-napa]: Kwak'wala (NAPA orthography)
  [kwk-umista]: Kwak'wala (U'mista orthography)
  [lml]: Raga
  [mic]: Mi'kmaq
  [moe]: Innu-aimun
  [moh]: Kanien'kéha
  [oji]: Ojibwe
  [oji-syl]: Ojibwe Syllabics
  [oka]: nsyilxcən
  [see]: Seneca
  [srs]: Tsuut'ina
  [str]: SENĆOŦEN
  [tau]: Upper Tanana
  [tce]: Southern Tutchone
  [tgx]: Tagish
  [tli]: Tlingit
  [ttm]: Northern Tutchone
  [und]: Undetermined
(Press "/" to search)
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Which of the following text transformations would like to apply before determining the symbol set?                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Applying lowercase to data:   0%|                                                                                                                 | 1/13100 [00:00<00:00, 13573.80it/s]
╭─────────────────────────────── Traceback (most recent call last) ─────────────────────────────────
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/cli.py:93 in new_dataset                  │
│                                                                                                  │
│    90 def new_dataset():                                                                         │
│    91 │   from everyvoice.wizard.main_tour import WIZARD_TOUR                                    │
│    92 │                                                                                          │
│ ❱  93 │   WIZARD_TOUR.run()                                                                      │
│    94                                                                                            │
│    95                                                                                            │
│    96 # Add preprocess to root                                                                   │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:143 in run             │
│                                                                                                  │
│   140 │                                                                                          │
│   141 │   def run(self):                                                                         │
│   142 │   │   for _, _, node in RenderTree(self.root):                                           │
│ ❱ 143 │   │   │   node.run()                                                                     │
│   144 │                                                                                          │
│   145 │   def visualize(self):                                                                   │
│   146 │   │   for pre, _, node in RenderTree(self.root):                                         │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:107 in run             │
│                                                                                                  │
│   104 │   │   │   │   │   self.state[self.name] = self.response                                  │
│   105 │   │   │   │   return self.response                                                       │
│   106 │   │   │   finally:                                                                       │
│ ❱ 107 │   │   │   │   self.effect()                                                              │
│   108 │   │   else:                                                                              │
│   109 │   │   │   self.run()                                                                     │
│   110                                                                                            │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:369 in effect           │
│                                                                                                  │
│   366 │   │   │   │   │   range(len(self.state["filelist_data"])),                               │
│   367 │   │   │   │   │   desc=f"Applying {process_lookup[process]['desc']} to data",            │
│   368 │   │   │   │   ):                                                                         │
│ ❱ 369 │   │   │   │   │   self.state["filelist_data"][i]["text"] = process_lookup[process][      │
│   370 │   │   │   │   │   │   "fn"                                                               │
│   371 │   │   │   │   │   ](self.state["filelist_data"][i]["text"])                              │
│   372                                                                                            │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:360 in <lambda>         │
│                                                                                                  │
│   357 │                                                                                          │
│   358 │   def effect(self):                                                                      │
│   359 │   │   process_lookup = {                                                                 │
│ ❱ 360 │   │   │   0: {"fn": lambda x: x.lower(), "desc": "lowercase"},                           │
│   361 │   │   │   1: {"fn": lambda x: normalize("NFC", x), "desc": ""},                          │
│   362 │   │   }                                                                                  │
│   363 │   │   if self.response is not None and len(self.response):                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'lower'
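The AttributeError is the same `None` value reaching the lowercase transform in `effect`. Below is a sketch of a more defensive version, under the assumption that rows are dicts with a `text` key, as in the traceback. It checks every row before transforming any of them, so a bad filelist fails with a clear message instead of leaving the dataset partly modified (the progress bar above stopped at 1/13100). `PROCESSES` mirrors `process_lookup` from the traceback. `apply_processes` is a hypothetical name, and the `"NFC normalization"` description is filled in here, since it is an empty string in the traceback.

```python
from unicodedata import normalize

# Mirrors process_lookup from dataset.py in the traceback. The "desc" for
# NFC is filled in here; it is an empty string in the traceback.
PROCESSES = {
    0: {"fn": lambda x: x.lower(), "desc": "lowercase"},
    1: {"fn": lambda x: normalize("NFC", x), "desc": "NFC normalization"},
}


def apply_processes(rows, selected):
    """Apply the selected transforms to each row's "text" (hypothetical helper).

    Every row is checked first, so a malformed filelist raises a readable
    ValueError before any text is modified.
    """
    for i, row in enumerate(rows):
        text = row.get("text")
        if not isinstance(text, str):
            raise ValueError(
                f"Row {i} has no text ({text!r}); the filelist format may be wrong"
            )
    for process in selected:
        fn = PROCESSES[process]["fn"]
        for row in rows:
            row["text"] = fn(row["text"])
    return rows


data = [{"text": "Hello"}, {"text": "World"}]
apply_processes(data, [0, 1])  # data is now [{'text': 'hello'}, {'text': 'world'}]
```

Ideally the same check runs in the earlier format-selection Step's validation, as suggested at the top of the issue, so the user is re-prompted instead of seeing a traceback.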

joanise commented 1 year ago

Starting to work on this.