MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.32k stars 244 forks

File "aligner\corpus.py", line 500, in speaker_utterance_info ZeroDivisionError: division by zero #101

Open RealNicolasBourbaki opened 5 years ago

RealNicolasBourbaki commented 5 years ago

Hi everyone,

My corpus is a bunch of .textgrid and .wav files. The sampling rates are correct (44k), but when I run this on Windows 10:

F:\montreal-forced-aligner>bin\mfa_align F:\hiwi\12november\test F:\montreal-forced-aligner\dictionaries\german.txt german_prosodylab F:\hiwi\12november\result

I get an exception: ZeroDivisionError

How can this happen?

forwiat commented 5 years ago

Did you solve this problem?

forwiat commented 5 years ago

I had the same problem, but I have solved it now. Please check your file names; for example, rename 'xxx.wav xxx' to 'xxx xxx'.

MalcolmSlaney commented 5 years ago

I think this comes about because MFA didn't find any files to process (thus zero speakers). In my case it was caused by having .wav files with floating point data, but I believe anything that causes MFA to not find the audio and text data will cause this.
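A minimal sketch of how an empty corpus turns into this error, assuming the failing statistic is an average of utterances per speaker. The helper name `speaker_utterance_summary` and its mapping argument are hypothetical and only illustrate the failure mode; this is not MFA's code.

```python
def speaker_utterance_summary(speak_utt_mapping):
    """Return (num_speakers, average utterances per speaker).

    Hypothetical helper for illustration only; not part of MFA.
    `speak_utt_mapping` maps a speaker name to a list of utterance ids.
    """
    num_speakers = len(speak_utt_mapping)
    if num_speakers == 0:
        # No audio/transcript pairs were loaded, so the mapping is empty.
        # Dividing by num_speakers here would raise ZeroDivisionError.
        raise ValueError(
            "No speakers found: check that the corpus directory contains "
            "readable audio files and matching transcripts."
        )
    total = sum(len(utts) for utts in speak_utt_mapping.values())
    return num_speakers, total / num_speakers
```

With an explicit check like this, an empty corpus would surface as a message about missing files rather than as a bare division error.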

mmcauliffe commented 5 years ago

Yes, at the moment there's some audio preprocessing/inspection that's done in Python, which doesn't support as many formats as I would like, so I'd like to move over to sox for this kind of thing. In the meantime you might be able to resample/resave using sox or Praat into a WAV format known to be supported (i.e., 16kHz, 16-bit).
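One way to spot problem files before running the aligner is to read each WAV header. This is a rough sketch that uses Python's standard `wave` module as a stand-in (an assumption; which library MFA used internally is not stated here). The `wave` module only reads PCM data, so floating-point files fail to open. The helper names `describe_wav` and `is_known_good` are made up for this example.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate), or None if unreadable."""
    try:
        with wave.open(path, "rb") as wav_file:
            return (
                wav_file.getnchannels(),
                wav_file.getsampwidth(),
                wav_file.getframerate(),
            )
    except (wave.Error, EOFError):
        # wave.Error covers non-PCM data (e.g. floating point) and bad headers;
        # EOFError covers empty or truncated files.
        return None


def is_known_good(info, rate=16000, width=2):
    """True if the header matches the suggested format (16 kHz, 16-bit)."""
    return info is not None and info[1] == width and info[2] == rate
```

Files for which `describe_wav` returns `None`, or for which `is_known_good` is false, are candidates for resaving with sox or Praat.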

RealNicolasBourbaki commented 5 years ago

Thank you all, that was very helpful! I resampled the files with sox, as Michael (@mmcauliffe) suggested, and the problem is gone.

MckinstryJ commented 3 years ago

Hey all, I'm stuck! Here is the code I used to resample the audio to 1 channel and 16 kHz.

import os
import wave

import audioop  # note: deprecated since Python 3.11 and removed in 3.13


def downsampleWav(src, dst, outrate=16000, outchannels=1):
    """Resample a PCM WAV file to outrate Hz; reduce stereo to one channel if requested."""
    if not os.path.exists(src):
        print('Source not found!')
        return False

    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    # Read the real channel count, sample width and rate from the header
    # instead of assuming 16-bit stereo.
    try:
        with wave.open(src, 'rb') as s_read:
            channels = s_read.getnchannels()
            width = s_read.getsampwidth()
            inrate = s_read.getframerate()
            data = s_read.readframes(s_read.getnframes())
    except (wave.Error, EOFError, OSError) as e:
        print('Failed to open source wav:', e)
        return False

    try:
        # ratecv returns (converted_fragment, state); only the fragment is needed
        converted, _ = audioop.ratecv(data, width, channels, inrate, outrate, None)
        if outchannels == 1 and channels == 2:
            # Keep only the left channel (factors 1, 0)
            converted = audioop.tomono(converted, width, 1, 0)
            channels = 1
    except audioop.error as e:
        print('Failed to downsample wav:', e)
        return False

    try:
        with wave.open(dst, 'wb') as s_write:
            s_write.setparams((channels, width, outrate, 0, 'NONE', 'Uncompressed'))
            s_write.writeframes(converted)
    except (wave.Error, OSError) as e:
        print('Failed to write wav:', e)
        return False

    return True


in_path = 'C:/Users/McKinstryJohn/Desktop/.../Resampled/wavs/'
out_path = 'C:/Users/McKinstryJohn/Desktop/.../Resampled/resampled/'

for file in os.listdir(in_path):
    if file.endswith('.wav'):
        # os.path.join takes the path parts as separate arguments
        downsampleWav(os.path.join(in_path, file), os.path.join(out_path, file),
                      outrate=16000)

However, when I run (on Windows 10)...

mfa align ./resampled ./librispeech-lexicon.txt english ./output

...I get the ZeroDivisionError. This is on new audio files, so what could be the issue here?

mmcauliffe commented 3 years ago

Hard to say without seeing some logs. Maybe try doing mfa validate ./resampled ./librispeech-lexicon.txt and see what the output says about the corpus (also see: https://montreal-forced-aligner.readthedocs.io/en/latest/data_validation.html). If you think it's due to resampling, you could try using sox instead to generate the correct format, that's usually my go-to for these kinds of issues.

MckinstryJ commented 3 years ago

Thanks for the quick reply. However, I'm still getting the same error after processing the audio with sox. To give you more info, this is the MFA command:

mfa align ./resampled ./librispeech-lexicon.txt english ./output

This is the error log:

Setting up corpus information...
WARNING: Some issues parsing the corpus were detected. Please run the validator to get more information.
Traceback (most recent call last):
  File "c:\...\aligner\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\...\aligner\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\...\aligner\Scripts\mfa.exe\__main__.py", line 7, in <module>
  File "c:\...\aligner\lib\site-packages\montreal_forced_aligner\command_line\mfa.py", line 290, in main
    run_align_corpus(args, acoustic_languages)
  File "c:\...\aligner\lib\site-packages\montreal_forced_aligner\command_line\align.py", line 147, in run_align_corpus
    align_corpus(args)
  File "c:\...\aligner\lib\site-packages\montreal_forced_aligner\command_line\align.py", line 71, in align_corpus
    print(corpus.speaker_utterance_info())
  File "c:\...\aligner\lib\site-packages\montreal_forced_aligner\corpus\base.py", line 307, in speaker_utterance_info
    average_utterances = sum(len(x) for x in self.speak_utt_mapping.values()) / num_speakers
ZeroDivisionError: division by zero 

Lastly, the audio files were preprocessed like so:

import glob
import os
import subprocess

in_path = 'C:/.../wavs/'
out_path = 'C:/.../resampled/'

sr = 16000

for file in glob.glob(os.path.join(in_path, '*.wav')):
    output_file = os.path.join(out_path, os.path.basename(file))
    # Pass the arguments as a list so file names with spaces need no shell quoting
    subprocess.run(['sox', file, '-c', '1', '-r', str(sr), output_file], check=True)

Sox did give a warning with almost every file, which looked like this:

sox WARN rate: rate clipped 8 samples; decrease volume?
sox WARN dither: dither clipped 7 samples; decrease volume?

I'm currently looking into that part, but any advice would be greatly appreciated! I did try the validate command, but its message isn't very informative. On a deeper look, mfa isn't populating the 'self.utt_wav_files' variable, which gives the following error:

CorpusError('There were no wav files found for transcribing this corpus. Please validate the corpus.')
montreal_forced_aligner.exceptions.CorpusError: There were no wav files found for transcribing this corpus. Please validate the corpus.

mmcauliffe commented 3 years ago

Does the resampled directory contain the transcript files as well? If there's no .lab/.txt/.TextGrid files in the same directory with same name .wav files, it can't perform alignment on them.
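A quick check for this can be sketched in Python. The function name `wavs_missing_transcripts` is made up for the example, extensions are compared case-insensitively (an assumption), and only the top level of the directory is scanned, not speaker subdirectories.

```python
from pathlib import Path

# Transcript extensions named above, lower-cased for comparison
TRANSCRIPT_EXTENSIONS = {".lab", ".txt", ".textgrid"}


def wavs_missing_transcripts(corpus_dir):
    """Return sorted names of .wav files with no same-named transcript beside them."""
    files = [p for p in Path(corpus_dir).iterdir() if p.is_file()]
    transcript_stems = {
        p.stem for p in files if p.suffix.lower() in TRANSCRIPT_EXTENSIONS
    }
    return sorted(
        p.name
        for p in files
        if p.suffix.lower() == ".wav" and p.stem not in transcript_stems
    )
```

If this returns every .wav file in the directory, the corpus has no usable audio/transcript pairs, which matches the "no wav files found" error above.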

MckinstryJ commented 3 years ago

Thanks for your help! What eventually worked for me was resampling the audio to 1 channel and 16 kHz using SoX, then creating .lab files using prosodylab.alignertools. Once I had the .lab files and there were no spaces in the file names, 'mfa train' worked.

mmcauliffe commented 3 years ago

Oh, some of the file names had spaces? That might have caused issues, since Kaldi uses spaces as a delimiter in its internal files. I'll do some testing with that and see if there's an easy fix to support those.
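Until spaces are supported, one workaround is to rename the files before aligning. The sketch below is only an illustration: `rename_without_spaces` is a hypothetical helper, the underscore replacement is an arbitrary choice, and it handles just the top level of the directory. Audio and transcript files that share a name get the same new name, so the pairs stay matched.

```python
from pathlib import Path


def rename_without_spaces(corpus_dir, replacement="_"):
    """Rename files whose names contain spaces; return [(old_name, new_name), ...]."""
    renamed = []
    # sorted() builds the full list first, so renaming while looping is safe
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file() and " " in path.name:
            target = path.with_name(path.name.replace(" ", replacement))
            if target.exists():
                # Never overwrite an existing file; leave it for manual review
                continue
            path.rename(target)
            renamed.append((path.name, target.name))
    return renamed
```

It is safest to run this on a copy of the corpus, since the renames are done in place.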

asarsembayev commented 7 months ago

I've just wasted half an hour on this problem. The mfa train command kept throwing CorpusError('There were no wav files found for transcribing this corpus. Please validate the corpus.'). I couldn't figure out the real cause, but I probably gave the train command the wrong corpus directory the first time. After that, it kept throwing this exception until I deleted the acoustic model directory, which in my case is ~/Documents/MFA/${model_name}