MahmoudAshraf97 / ctc-forced-aligner

Text to speech alignment using CTC forced alignment

TypeError: '>=' not supported between instances of 'NoneType' and 'int' using custom wav2vec2 #18

Closed: C00reNUT closed this issue 1 month ago

C00reNUT commented 2 months ago

Hello,

Thank you for making this public!

Everything works very nicely with the default model, but I wanted to compare the results with a wav2vec2 model fine-tuned on my own language, Czech, using

ctc-forced-aligner --audio_path "Moudry_zlatnik.wav" --text_path "Moudry_zlatnik.txt" --language "ces" --romanize --alignment_model badrex/xlsr-czech

The model is from the repo https://huggingface.co/badrex/xlsr-czech

When I run the CLI command, I get:

/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Traceback (most recent call last):
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/bin/ctc-forced-aligner", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/align.py", line 151, in cli
    segments, scores, blank_id = get_alignments(
                                 ^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/alignment_utils.py", line 231, in get_alignments
    path, scores = forced_align(
                   ^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/alignment_utils.py", line 199, in forced_align
    if blank >= log_probs.shape[-1] or blank < 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

Any ideas how I could fix this?
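The traceback shows that `forced_align` received `blank=None`: the range check `blank >= log_probs.shape[-1]` compares `None` with an `int`, which Python 3 rejects with exactly this `TypeError`. A likely, but unconfirmed, cause is that no blank token id could be resolved for the custom model. The sketch below uses a hypothetical helper, `resolve_blank_id` (not part of ctc-forced-aligner), that looks the blank up in a token-to-id vocabulary and fails with a descriptive message instead; the candidate token names are an assumption based on common wav2vec2 CTC tokenizers.

```python
from typing import Mapping, Optional, Sequence

# Assumption: wav2vec2 CTC tokenizers on the Hugging Face Hub usually reuse the
# padding token ("<pad>" or "[PAD]") as the CTC blank. Not taken from the tool.
BLANK_CANDIDATES = ("<blank>", "<pad>", "[PAD]")


def resolve_blank_id(
    vocab: Mapping[str, int],
    num_classes: int,
    candidates: Sequence[str] = BLANK_CANDIDATES,
) -> int:
    """Hypothetical helper: return a valid blank index or raise ValueError."""
    blank_id: Optional[int] = None
    for token in candidates:
        if token in vocab:
            blank_id = vocab[token]
            break
    if blank_id is None:
        # This is the situation that surfaces as the TypeError in forced_align.
        raise ValueError(
            f"none of {list(candidates)} found in the model vocabulary; "
            "cannot determine the CTC blank token"
        )
    # Same range check as in the traceback, now safe because blank_id is an int.
    if blank_id >= num_classes or blank_id < 0:
        raise ValueError(f"blank id {blank_id} is outside [0, {num_classes})")
    return blank_id
```

For example, `resolve_blank_id({"<pad>": 0, "<s>": 1, "a": 2}, 3)` returns `0`, while a vocabulary with none of the candidate tokens raises a `ValueError` that names the missing tokens rather than an opaque comparison error.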

MahmoudAshraf97 commented 2 months ago

Should be fixed in the latest commit.

C00reNUT commented 2 months ago

Sadly, even after the latest pull, the issue still persists. This is the text I am using:

Moudrý zlatník
Na motivy bohátky Boženy Němcové napsala Pavla Plocková.
Dramaturg Eva Košlerová, hudbu složil Tomáš Vránek, řídí Jiří Váchal.
Radoš Pavel Kříž, Liběna Zlata Adamovská, stařec Eduard Cupák, dívka Hanna Maciuchová, král, zasloužilý umělec Luděk Monzar, sluha Antonín Hart, starý zlatník, zasloužilý umělec Ota Sklenčka, hlas Zdeněk Martínek.
Asistentka režie Květa Straková, záznam a střih Ana Suchánková, zvuk Mirka Rychlá, nastudoval režisér Karel Vajnlich.
přestaň s tím povypováním.
Zpívám si jen tak potichu.
Rušíš můj odpočínek.
Můžeš si najít místo pod jiným stromem.
A mě se líbí tady, pod javorem.
Jenže já to byla dřív.
Stejně ti nezbude, než pustit mě na své místo.
Tohle říkáš celé věky, co se potkáváme a přece jsem ti ještě nikdy neuhnula z cesty.
Tak ty jednou vstoupíš.
Nevím proč.
Protože rozum, děvče, rozum nakonec musí mít vrt.
Domýšlivost ti zakrývá oči?
Jinak bys viděl, že mne mají lidé raději než tebe.
Snad ne za ty tvé popěvky.
Za radost a za útichu.
Někdy za naději.
Nebuď pošetivá.

Could you please test it with this text, a WAV audio file, and the badrex/xlsr-czech alignment model?

MahmoudAshraf97 commented 2 months ago

I just tested it again and couldn't reproduce the error. Please note that the usage instructions have been slightly changed, so refer to the README.

C00reNUT commented 2 months ago

Yes, but only the code changed, not the CLI instructions; I still run the same CLI command:

ctc-forced-aligner --audio_path audio.wav --text_path text.txt --language "ces" --alignment_model badrex/xlsr-czech

and I still get:

warnings.warn(
Traceback (most recent call last):
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/bin/ctc-forced-aligner", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/align.py", line 151, in cli
    segments, scores, blank_id = get_alignments(
                                 ^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/alignment_utils.py", line 231, in get_alignments
    path, scores = forced_align(
                   ^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/alignment_utils.py", line 199, in forced_align
    if blank >= log_probs.shape[-1] or blank < 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

I have tried multiple Czech wav2vec2 models, but I get the same error with all of them: https://huggingface.co/badrex/xlsr-czech https://huggingface.co/arampacha/wav2vec2-large-xlsr-czech https://huggingface.co/comodoro/wav2vec2-xls-r-300m-cs-250

C00reNUT commented 2 months ago

Strange, I am probably doing something wrong, because when I run your default example command

ctc-forced-aligner --audio_path "audio.wav" --text_path "text.txt" --language "ara" --alignment_model "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"

with the same text and audio (which run fine with the default model but give the error from my previous comment with the Czech models), I get:

File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/bin/ctc-forced-aligner", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/align.py", line 157, in cli
    spans = get_spans(tokens_starred, segments, tokenizer.decode(blank_id))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/ctc-forced-aligner/lib/python3.11/site-packages/ctc_forced_aligner/alignment_utils.py", line 63, in get_spans
    assert seg.label == ltr, f"{seg.label} != {ltr}"
           ^^^^^^^^^^^^^^^^
AssertionError: <star> != u

MahmoudAshraf97 commented 1 month ago

Should be fixed now. By the way, if the custom model supports the whole vocabulary of the language, you should not use --romanize. Only romanize if you are using the default model, or an English model for a language other than English.
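Whether a custom model "supports the whole vocabulary of the language" can be checked roughly before deciding on `--romanize`. The sketch below is a hypothetical helper, `uncovered_characters`, not part of ctc-forced-aligner. It assumes you have the model's vocabulary as token strings (for a Hugging Face tokenizer, the keys of `tokenizer.get_vocab()`), and it only checks letters, on the assumption that punctuation and whitespace are handled separately by the aligner.

```python
import unicodedata
from typing import Iterable, Set


def uncovered_characters(text: str, vocab: Iterable[str]) -> Set[str]:
    """Hypothetical helper: letters in `text` that the model vocabulary lacks.

    Both sides are NFC-normalized so that precomposed and decomposed forms of
    letters such as "č" compare equal. Text is lowercased (assumption: the
    model vocabulary is lowercase). Multi-character tokens such as "<pad>"
    are ignored because they cannot spell a single letter.
    """
    vocab_chars = set()
    for token in vocab:
        normalized_token = unicodedata.normalize("NFC", token)
        if len(normalized_token) == 1:
            vocab_chars.add(normalized_token)
    normalized_text = unicodedata.normalize("NFC", text.lower())
    # Only alphabetic characters are checked; punctuation and spaces are skipped.
    return {ch for ch in normalized_text if ch.isalpha() and ch not in vocab_chars}
```

An empty result suggests the model can spell the text directly, so romanization is presumably unnecessary; a non-empty set (for example Czech letters against an Arabic model's vocabulary) indicates a mismatch between the text and the model.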