Selecting wrong filelist format causes uncaught KeyError

Actually, per @marctessier, we get both TypeErrors and AttributeErrors:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Which of the following text transformations would like to apply before determining the symbol set?                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭─────────────────────────────── Traceback (most recent call last) ─────────────────────────────────
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/cli.py:93 in new_dataset                  │
│                                                                                                  │
│    90 def new_dataset():                                                                         │
│    91 │   from everyvoice.wizard.main_tour import WIZARD_TOUR                                    │
│    92 │                                                                                          │
│ ❱  93 │   WIZARD_TOUR.run()                                                                      │
│    94                                                                                            │
│    95                                                                                            │
│    96 # Add preprocess to root                                                                   │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:143 in run             │
│                                                                                                  │
│   140 │                                                                                          │
│   141 │   def run(self):                                                                         │
│   142 │   │   for _, _, node in RenderTree(self.root):                                           │
│ ❱ 143 │   │   │   node.run()                                                                     │
│   144 │                                                                                          │
│   145 │   def visualize(self):                                                                   │
│   146 │   │   for pre, _, node in RenderTree(self.root):                                         │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:99 in run              │
│                                                                                                  │
│    96 │   │   """Prompt the user and save the response to the response attribute.                │
│    97 │   │   If this method returns something truthy, continue, otherwise ask the prompt agai   │
│    98 │   │   """                                                                                │
│ ❱  99 │   │   self.response = self.prompt()                                                      │
│   100 │   │   if self.validate(self.response):                                                   │
│   101 │   │   │   self.completed = True                                                          │
│   102 │   │   │   try:                                                                           │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:412 in prompt           │
│                                                                                                  │
│   409 │   │   )                                                                                  │
│   410 │   │   symbols_from_language = return_symbols(selected_language)                          │
│   411 │   │   all_tokens = None                                                                  │
│ ❱ 412 │   │   found_symbols = set(" ".join([x["text"] for x in self.state["filelist_data"]]))    │
│   413 │   │   if all_tokens is None:                                                             │
│   414 │   │   │   logger.info(                                                                   │
│   415 │   │   │   │   "We will now present all the symbols found in your data. You will have t   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: sequence item 1: expected str instance, NoneType found

and:

────────────────────────────────────────────
2023-09-15 15:04:38.331 | INFO     | everyvoice.wizard.dataset:prompt:280 - Note: if your dataset has more than one language in it, you will have to add this information to your filelist, because the new dataset wizard can't guess!
Which of the following supported languages are in your dataset?
  [und]: my language isn't here
  [alq]: Algonquin
  [atj]: Atikamekw
  [ckt]: Chukchi
  [clm]: Klallam
  [crg-dv]: Michif
  [crg-tmd]: Michif
  [crj]: Southern East Cree
  [crk]: Plains Cree
  [crl]: Northern East Cree
  [crm]: Moose Cree
  [csw]: Swampy Cree
  [ctp]: Western Highland Chatino
  [dan]: Danish
> [eng]: English
  [fin]: Finnish
  [fra]: French
  [git]: Gitksan
  [gla]: Scottish Gaelic
  [gwi]: Gwich'in
  [haa]: Hän
  [ikt]: Western Inuktut
  [iku]: Inuktitut Syllabics
  [iku-sro]: Inuktitut Romanized
  [kkz]: Kaska
  [kwk-boas]: Kwak'wala (Boas orthography)
  [kwk-napa]: Kwak'wala (NAPA orthography)
  [kwk-umista]: Kwak'wala (U'mista orthography)
  [lml]: Raga
  [mic]: Mi'kmaq
  [moe]: Innu-aimun
  [moh]: Kanien'kéha
  [oji]: Ojibwe
  [oji-syl]: Ojibwe Syllabics
  [oka]: nsyilxcən
  [see]: Seneca
  [srs]: Tsuut'ina
  [str]: SENĆOŦEN
  [tau]: Upper Tanana
  [tce]: Southern Tutchone
  [tgx]: Tagish
  [tli]: Tlingit
  [ttm]: Northern Tutchone
  [und]: Undetermined
(Press "/" to search)
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Which of the following text transformations would like to apply before determining the symbol set?                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Applying lowercase to data:   0%|                                                                                                                 | 1/13100 [00:00<00:00, 13573.80it/s]
╭─────────────────────────────── Traceback (most recent call last) ─────────────────────────────────
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/cli.py:93 in new_dataset                  │
│                                                                                                  │
│    90 def new_dataset():                                                                         │
│    91 │   from everyvoice.wizard.main_tour import WIZARD_TOUR                                    │
│    92 │                                                                                          │
│ ❱  93 │   WIZARD_TOUR.run()                                                                      │
│    94                                                                                            │
│    95                                                                                            │
│    96 # Add preprocess to root                                                                   │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:143 in run             │
│                                                                                                  │
│   140 │                                                                                          │
│   141 │   def run(self):                                                                         │
│   142 │   │   for _, _, node in RenderTree(self.root):                                           │
│ ❱ 143 │   │   │   node.run()                                                                     │
│   144 │                                                                                          │
│   145 │   def visualize(self):                                                                   │
│   146 │   │   for pre, _, node in RenderTree(self.root):                                         │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/__init__.py:107 in run             │
│                                                                                                  │
│   104 │   │   │   │   │   self.state[self.name] = self.response                                  │
│   105 │   │   │   │   return self.response                                                       │
│   106 │   │   │   finally:                                                                       │
│ ❱ 107 │   │   │   │   self.effect()                                                              │
│   108 │   │   else:                                                                              │
│   109 │   │   │   self.run()                                                                     │
│   110                                                                                            │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:369 in effect           │
│                                                                                                  │
│   366 │   │   │   │   │   range(len(self.state["filelist_data"])),                               │
│   367 │   │   │   │   │   desc=f"Applying {process_lookup[process]['desc']} to data",            │
│   368 │   │   │   │   ):                                                                         │
│ ❱ 369 │   │   │   │   │   self.state["filelist_data"][i]["text"] = process_lookup[process][      │
│   370 │   │   │   │   │   │   "fn"                                                               │
│   371 │   │   │   │   │   ](self.state["filelist_data"][i]["text"])                              │
│   372                                                                                            │
│                                                                                                  │
│ /gpfs/fs3c/nrc/uat/nrc_dt/tes001/EveryVoice/everyvoice/wizard/dataset.py:360 in <lambda>         │
│                                                                                                  │
│   357 │                                                                                          │
│   358 │   def effect(self):                                                                      │
│   359 │   │   process_lookup = {                                                                 │
│ ❱ 360 │   │   │   0: {"fn": lambda x: x.lower(), "desc": "lowercase"},                           │
│   361 │   │   │   1: {"fn": lambda x: normalize("NFC", x), "desc": ""},                          │
│   362 │   │   }                                                                                  │
│   363 │   │   if self.response is not None and len(self.response):                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'lower'
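Both tracebacks seem to share one root cause. When the wrong filelist format is selected, some entries in `self.state["filelist_data"]` apparently end up with `"text"` set to `None`. The first step that touches the text then crashes: the `" ".join(...)` in `prompt` raises the TypeError, and the `lowercase` lambda in `effect` raises the AttributeError. If the key is missing entirely, the same lookup would presumably produce the KeyError from the title. The sketch below adds an up-front check. It assumes `filelist_data` is a list of dicts, as the tracebacks suggest. `find_bad_text_rows` and `check_filelist_text` are hypothetical names, not existing EveryVoice functions.

```python
from typing import Any


def find_bad_text_rows(filelist_data: list[dict[str, Any]]) -> list[int]:
    """Return indices of rows whose "text" field is missing or not a string.

    Hypothetical helper (not part of EveryVoice): a mis-parsed filelist can
    leave "text" unset (KeyError) or None (TypeError / AttributeError later).
    """
    return [
        i
        for i, row in enumerate(filelist_data)
        if not isinstance(row.get("text"), str)
    ]


def check_filelist_text(filelist_data: list[dict[str, Any]]) -> None:
    """Raise one readable ValueError instead of letting a later step crash."""
    bad = find_bad_text_rows(filelist_data)
    if bad:
        preview = ", ".join(str(i) for i in bad[:5])
        raise ValueError(
            f"{len(bad)} of {len(filelist_data)} rows have no usable 'text' field "
            f"(first bad rows: {preview}); was the right filelist format selected?"
        )


if __name__ == "__main__":
    # Hypothetical rows mimicking a mis-parsed filelist: row 1 has text=None.
    rows = [{"basename": "a", "text": "Hello"}, {"basename": "b", "text": None}]

    # Both failures from the tracebacks above, reproduced in isolation.
    try:
        " ".join([x["text"] for x in rows])
    except TypeError as e:
        print("TypeError:", e)
    try:
        rows[1]["text"].lower()
    except AttributeError as e:
        print("AttributeError:", e)

    # The up-front check reports the problem before any transformation runs.
    try:
        check_filelist_text(rows)
    except ValueError as e:
        print("ValueError:", e)
```

One possible place for such a check is a `validate` method on the step that reads the filelist. According to the `run` code in the traceback, the wizard asks the prompt again when `validate` returns something falsy. That would turn all three exceptions into a re-prompt for the format. The exact hook point in the wizard is left open here.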


Selecting wrong filelist format causes uncaught KeyError #8