[BUG] TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'

MarcosValleMinon commented 7 months ago

https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/assets/109286045/c05abcaf-d4fa-437b-afaa-2c6f6942f6e6

Debugging checklist

[ ] Have you updated to the latest MFA version? -> I have tried versions 3.0.0a8 and 2.7, and also the default Docker version.
[X] Have you tried rerunning the command with the --clean flag?

Describe the issue

A clear and concise description of what the bug is. -> Alignment cannot be completed because of an unexpected error. I'm following use case 1 of the MFA first steps guide. I have tried several audio files: sometimes the aligner works fine, but often it cannot finish because of this TypeError. I have also tried the "mfa validate" command, which sometimes succeeds, but its outcome appears unrelated to the TypeError I get when running the align command. I am working with a cappella (voice-only) song files. I have attached one audio file that produces this error.
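The error in the title is Python's generic message for an augmented assignment that adds `None` to an `int`. The thread does not show which MFA code path raised it, so the sketch below is a hypothetical, self-contained illustration and not MFA's code: `accumulate` and `reproduce_error` are invented helper names. It reproduces the exact message and shows a guard that records which input was `None` instead of crashing, which makes this kind of failure easier to trace back to a specific file or utterance.

```python
def accumulate(values):
    """Sum numeric values, recording the positions of any None entries.

    Hypothetical helper for illustration only; it is not part of MFA.
    """
    total = 0
    missing = []
    for index, value in enumerate(values):
        if value is None:
            # A plain `total += value` here would raise the TypeError from the title
            missing.append(index)
            continue
        total += value
    return total, missing


def reproduce_error():
    """Return the message Python produces for `int += None`."""
    total = 0
    try:
        total += None
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce_error())
# unsupported operand type(s) for +=: 'int' and 'NoneType'
print(accumulate([3, None, 4]))
# (7, [1])
```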

For reproducing your issue

Please fill out the following:

Corpus structure
- What language is the corpus in? -> Spanish
- How many files/speakers? -> 1
- Are you using lab files or TextGrid files for input? -> Lab files

Dictionary
- Are you using a dictionary from MFA? If so, which one? -> spanish_mfa
- If it's a custom dictionary, what is the phoneset?

Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one? -> spanish_mfa
- If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA). -> Below is the log from MFA 3.0.0a8 for the bacha.wav file.

Command: mfa align ~/mfa_data/my_corpus spanish_mfa spanish_mfa ~/mfa_data/my_corpus_aligned

2024-02-26 17:37:28,382 - mfa - DEBUG - Beginning run for bacha
2024-02-26 17:37:28,382 - mfa - DEBUG - Using "global" profile
2024-02-26 17:37:28,382 - mfa - DEBUG - Using multiprocessing with 3
2024-02-26 17:37:28,382 - mfa - DEBUG - Set up logger for MFA version: 3.0.0a8
2024-02-26 17:37:28,864 - mfa - DEBUG - There were some differences in the current run compared to the last one. This may cause issues, run with --clean, if you hit an error.
2024-02-26 17:37:28,927 - mfa - DEBUG - Using IPA
2024-02-26 17:37:37,187 - mfa - DEBUG - Loaded dictionary in 8.260 seconds
2024-02-26 17:37:37,202 - mfa - INFO - Setting up corpus information...
2024-02-26 17:37:37,202 - mfa - DEBUG - Could not load from temp
2024-02-26 17:37:37,202 - mfa - INFO - Loading corpus from source files...
2024-02-26 17:37:38,250 - mfa - DEBUG - Processing queue: 0.03125
2024-02-26 17:37:38,303 - mfa - DEBUG - Parsed corpus directory with 3 jobs in 0.046875 seconds
2024-02-26 17:37:38,303 - mfa - INFO - Found 1 speaker across 1 file, average number of utterances per speaker: 1.0
2024-02-26 17:37:38,303 - mfa - DEBUG - Loaded corpus in 1.116 seconds
2024-02-26 17:37:38,319 - mfa - INFO - Initializing multiprocessing jobs...
2024-02-26 17:37:38,319 - mfa - WARNING - Number of jobs was specified as 3, but due to only having 1 speakers, MFA will only use 1 jobs. Use the --single_speaker flag if you would like to split utterances across jobs regardless of their speaker.
2024-02-26 17:37:38,365 - mfa - DEBUG - Initialized jobs in 0.062 seconds
2024-02-26 17:37:38,618 - mfa - INFO - Normalizing text...
2024-02-26 17:37:47,458 - mfa - DEBUG - Wrote lexicon information in 0.047 seconds
2024-02-26 17:37:47,458 - mfa - INFO - Generating MFCCs...
2024-02-26 17:37:50,674 - mfa - DEBUG - Generating MFCCs took 3.215 seconds
2024-02-26 17:37:50,676 - mfa - INFO - Calculating CMVN...
2024-02-26 17:37:50,710 - mfa - INFO - Generating final features...
2024-02-26 17:37:52,782 - mfa - DEBUG - Generating final features took 2.065 seconds
2024-02-26 17:37:52,782 - mfa - INFO - Creating corpus split...
2024-02-26 17:37:54,847 - mfa - DEBUG - Generated features in 7.388 seconds
2024-02-26 17:37:54,847 - mfa - DEBUG - Setting up corpus took 25.920 seconds
2024-02-26 17:37:54,847 - mfa - DEBUG -
2024-02-26 17:37:54,847 - mfa - DEBUG - ====ACOUSTIC MODEL INFO====
2024-02-26 17:37:54,847 - mfa - DEBUG - Acoustic model root directory: C:\Users\MarcosValleMinon\Documents\MFA\extracted_models\acoustic
2024-02-26 17:37:54,847 - mfa - DEBUG - Acoustic model dirname: C:\Users\MarcosValleMinon\Documents\MFA\extracted_models\acoustic\spanish_mfa_acoustic
2024-02-26 17:37:54,847 - mfa - DEBUG - Acoustic model meta path: C:\Users\MarcosValleMinon\Documents\MFA\extracted_models\acoustic\spanish_mfa_acoustic\meta.json
2024-02-26 17:37:54,847 - mfa - DEBUG - Acoustic model meta information:
2024-02-26 17:37:54,847 - mfa - DEBUG - architecture: gmm-hmm
dictionaries:
  bracketed_word: '[bracketed]'
  clitic_marker: ''''
  default: spanish_mfa
  laughter_word: '[laughter]'
  names:
  - spanish_latin_america_mfa
  - spanish_mfa
  - spanish_spain_mfa
  oov_word:
  position_dependent_phones: true
  silence_word:
  use_g2p: false
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_delta_pitch: true
  use_energy: false
  use_pitch: true
  use_voicing: true
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.043333333333333335
final_silence_correction: 2.1133333333333333
initial_silence_probability: 0.20666666666666667
language: unknown
oov_phone: spn
optional_silence_phone: sil
phone_set_type: IPA
phone_type: triphone
phones: !!set
  a: null
  b: null
  c: null
  "d\u032A": null
  e: null
  f: null
  i: null
  j: null
  k: null
  l: null
  m: null
  n: null
  o: null
  p: null
  r: null
  s: null
  "t\u0283": null
  "t\u032A": null
  u: null
  w: null
  x: null
  "\xE7": null
  "\xF0": null
  "\u014B": null
  "\u025F": null
  "\u025F\u029D": null
  "\u0261": null
  "\u0263": null
  "\u0272": null
  "\u027E": null
  "\u0283": null
  "\u028E": null
  "\u029D": null
  "\u03B2": null
  "\u03B8": null
silence_probability: 0.20983452672314942
train_date: '2022-05-28 03:55:04.285212'
training:
  audio_duration: 6298548.656292674
  average_log_likelihood: -0.008354899461340902
  num_oovs: 0
  num_speakers: 22629
  num_utterances: 784095
version: 2.0.0rc4.dev19+ged818cb.d20220404

2024-02-26 17:37:54,847 - mfa - DEBUG -
2024-02-26 17:37:54,847 - mfa - DEBUG - Setup for alignment in 25.983 seconds
2024-02-26 17:37:54,924 - mfa - INFO - Compiling training graphs...
2024-02-26 17:37:56,947 - mfa - DEBUG - Compiling training graphs took 2.023 seconds
2024-02-26 17:37:56,947 - mfa - INFO - Performing first-pass alignment...
2024-02-26 17:37:56,947 - mfa - INFO - Generating alignments...
2024-02-26 17:37:57,248 - mfa - ERROR - There was an error in the run, please see the log.

Desktop (please complete the following information):

OS: [e.g. Windows, OSX, Linux] -> Windows
Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] -> Windows 10 Enterprise
Any other details about the setup (Cloud, Docker, etc) -> Conda environment with Python 3.10.13

Additional context

I have tried another version of MFA (2.7), installing MFA with Docker, and running it on another computer. The error appeared every time.

Thanks a lot for your help!

mmcauliffe commented 7 months ago

You can try rerunning with --beam 100 added to the command; it might produce an alignment. The pretrained models were not trained on singing data or on audio with heavy background music, so the alignments are not likely to be very good, though I really don't know. See https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html#errors-aligning-single-files for more information.
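The suggested rerun can also be scripted. This is a minimal sketch assuming `mfa` is installed in the active conda environment; `build_align_command` and `run_align` are hypothetical helper names, not part of MFA. It rebuilds the `mfa align` command from the log file section above and appends `--beam 100` (a larger beam means less pruning during the alignment search, at some cost in speed), with `--clean` available as an option. Paths are expanded explicitly because `subprocess.run` with an argument list does not go through a shell, so `~` would otherwise be passed to MFA literally.

```python
import os
import shlex
import shutil
import subprocess


def build_align_command(corpus_dir, dictionary, acoustic_model, output_dir,
                        beam=None, clean=False):
    """Assemble the argument list for `mfa align` as used in this thread."""
    cmd = ["mfa", "align", str(corpus_dir), dictionary, acoustic_model, str(output_dir)]
    if beam is not None:
        # Wider beam, as suggested in the reply above
        cmd += ["--beam", str(beam)]
    if clean:
        # Remove temporary files left over from previous runs
        cmd.append("--clean")
    return cmd


def run_align(cmd):
    """Run the command if its executable is on PATH and return the exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH; activate the MFA conda environment first")
    return subprocess.run(cmd).returncode


corpus = os.path.expanduser("~/mfa_data/my_corpus")
output = os.path.expanduser("~/mfa_data/my_corpus_aligned")
cmd = build_align_command(corpus, "spanish_mfa", "spanish_mfa", output, beam=100)
print(shlex.join(cmd))
# run_align(cmd)  # uncomment to actually start the alignment
```

Printing the command first makes it easy to paste into a terminal; calling `run_align(cmd)` runs the same arguments directly from Python.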

MarcosValleMinon commented 7 months ago

It works!! Thank you

MontrealCorpusTools / Montreal-Forced-Aligner

[BUG] TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType' #760