ErikEkstedt / TTD

TurnTaking Datasets

Issues with vad #1

Open atyshka opened 2 years ago

atyshka commented 2 years ago

Hi Erik, nice work on this paper and thanks for open-sourcing! I'm trying to train TurnGPT and running into the following error:

Traceback (most recent call last):
  File "turngpt/main.py", line 188, in <module>
    main(args)
  File "turngpt/main.py", line 74, in main
    dm.prepare_data()
  File "/home/alex/.local/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 90, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/alex/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/alex/turngpt/TurnGPT/turngpt/turngpt_dm.py", line 84, in prepare_data
    tok_path = builder.prepare_explicit_turn_level_tokens(
  File "/home/alex/turngpt/TTD/ttd/basebuilder.py", line 434, in prepare_explicit_turn_level_tokens
    self.prepare_turn_level_tokens(tokenizer)  # check the necessary data exists
  File "/home/alex/turngpt/TTD/ttd/basebuilder.py", line 316, in prepare_turn_level_tokens
    self.prepare_turn_level()
  File "/home/alex/turngpt/TTD/ttd/basebuilder.py", line 302, in prepare_turn_level
    self._process_turn_level()
  File "/home/alex/turngpt/TTD/ttd/datasets/maptask.py", line 117, in _process_turn_level
    self.prepare_vad()  # processed vad values required
  File "/home/alex/turngpt/TTD/ttd/basebuilder.py", line 269, in prepare_vad
    vad = vad_from_word_level(word_level_dialog, audio_path)
  File "/home/alex/turngpt/TTD/ttd/vad_helpers.py", line 13, in vad_from_word_level
    start = dw["start"] / duration
TypeError: unsupported operand type(s) for /: 'float' and 'str'

Looking into the code, it seems vad_from_word_level expects the dialog words and a duration as arguments, but is receiving audio_path instead of a duration. I'm guessing this is a simple mistake where the code was changed in one file but not the other, but it's difficult to infer how it should work correctly. Happy to open a PR if you can point me to the fix.
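For anyone hitting the same traceback, here is a minimal sketch of the mismatch and one possible workaround. It is not the actual TTD code: `vad_from_word_level_sketch`, `wav_duration_seconds`, `write_silent_wav`, and the `"end"` key are hypothetical stand-ins, and it assumes the duration is meant to be the audio length in seconds and that the audio is a PCM WAV file.

```python
import os
import tempfile
import wave


def vad_from_word_level_sketch(dialog_words, duration):
    """Hypothetical stand-in for vad_from_word_level.

    Normalizes word times by the dialog duration (assumed to be seconds).
    The "end" key is an assumption; only "start" appears in the traceback.
    """
    return [(dw["start"] / duration, dw["end"] / duration) for dw in dialog_words]


def wav_duration_seconds(audio_path):
    """Read the duration of a PCM WAV file from its header."""
    with wave.open(audio_path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_silent_wav(path, seconds, sample_rate=8000):
    """Write a mono, 16-bit silent WAV file so the sketch is self-contained."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))


words = [{"start": 0.5, "end": 1.0}, {"start": 1.5, "end": 2.0}]

# Passing a path string where a number is expected reproduces the error.
try:
    vad_from_word_level_sketch(words, "dialog.wav")
except TypeError as err:
    error_message = str(err)

# Possible workaround: derive the duration from the audio file first.
with tempfile.TemporaryDirectory() as tmp:
    audio_path = os.path.join(tmp, "dialog.wav")
    write_silent_wav(audio_path, seconds=4)
    duration = wav_duration_seconds(audio_path)
    normalized = vad_from_word_level_sketch(words, duration)
```

If this reading is right, the change in prepare_vad would be to compute the duration from audio_path before the call. Whether that matches the intended behavior is something only the maintainer can confirm.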

msobrevillac commented 2 years ago

Hi Erik, I have the same problem as @atyshka. I realised that it happens when I run the training on Switchboard or Maptask. @atyshka, did you manage to fix it?

atyshka commented 2 years ago

I just trained without those datasets.

ErikEkstedt commented 2 years ago

Hello! Very sorry for not answering any of these issues before!

This codebase is outdated (and, surprise surprise, not maintained). If you want to train a TurnGPT model, please check out the "simplified" branch in the TurnGPT repo for an updated version.

Switchboard and Maptask require a manual download (and a license) and only work if downloaded correctly. The "simplified" TurnGPT branch does not include these datasets for that reason. However, the training is vastly simplified.