hypo.word file missing during MMS ASR inference

❓ Questions and Help

What is your question?

I'm facing the following issue while running the MMS ASR inference script examples/mms/asr/infer/mms_infer.py:

  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tmpsjatjyxt/hypo.word'

Code

python examples/mms/asr/infer/mms_infer.py --model "/workspace/fairseq/mms1b_fl102.pt" --lang "urd-script_arabic" --audio "/workspace/audio.wav"

What have you tried?

Tried running the ASR on different audios and languages

What's your environment?

fairseq Version (e.g., 1.0 or main): main
PyTorch Version (e.g., 1.0): 2.0.0
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): pip
Build command you used (if compiling from source): N/A
Python version: 3.10.10
CUDA/cuDNN version: 11.6
GPU models and configuration: NVIDIA A6000
Any other relevant information: N/A

I get this error，It is very strange why the folder should be created in the root directory, because I am a mandarin user and not an administrator. Mandarin users cannot create files in the root directory.

Traceback (most recent call last):
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1284, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ybZhang/INFER/None'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1284, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ybZhang/INFER'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1284, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ybZhang'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 140, in sweep
    sweep_dir.mkdir(parents=True, exist_ok=True)
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1288, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1288, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1288, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/ybZhang/miniconda3/envs/fairseq/lib/python3.8/pathlib.py", line 1284, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/checkpoint'
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 62, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 52, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpiud3trh1/hypo.word'

@v-yunbin do this first

https://github.com/facebookresearch/fairseq/issues/5117#issuecomment-1563486024

The hypo.word error is vague and it's in no way the source of your issues.

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, ) print(out)

system: google colab
sample url: https://colab.research.google.com/github/facebookresearch/fairseq/blob/main/examples/mms/asr/tutorial/MMS_ASR_Inference_Colab.ipynb
language: mandarin
code block:
```
import os
```

os.environ["TMPDIR"] = '/content/temp_dir' os.environ["PYTHONPATH"] = "." os.environ["PREFIX"] = "INFER" os.environ["HYDRA_FULL_ERROR"] = "1" os.environ["USER"] = "micro"

!python examples/mms/asr/infer/mms_infer.py --model "/content/fairseq/models_new/mms1b_fl102.pt" --lang "cmn" --audio "/content/fairseq/audio_samples/demo2.wav"


### error message

2023-05-26 06:48:12.609284: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-05-26 06:48:14.321529: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT Traceback (most recent call last): File "/content/fairseq/examples/speech_recognition/new/infer.py", line 499, in cli_main() File "/content/fairseq/examples/speech_recognition/new/infer.py", line 495, in cli_main hydra_main() # pylint: disable=no-value-for-parameter File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 32, in decorated_main _run_hydra( File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 354, in _run_hydra run_and_report( File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report raise ex File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report return func() File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 355, in lambda: hydra.multirun( File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 136, in multirun return sweeper.sweep(arguments=task_overrides) File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep results = self.launcher.launch(batch, initial_job_idx=initial_job_idx) File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch ret = run_job( File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 129, in run_job ret.return_value = task_function(task_cfg) File "/content/fairseq/examples/speech_recognition/new/infer.py", line 460, in hydra_main distributed_utils.call_main(cfg, main) File "/content/fairseq/fairseq/distributed/utils.py", line 404, in call_main main(cfg, **kwargs) File "/content/fairseq/examples/speech_recognition/new/infer.py", line 407, in main with InferenceProcessor(cfg) as processor: File "/content/fairseq/examples/speech_recognition/new/infer.py", line 110, in init assert target_lang in ckpt_obj["adapter"] AssertionError Traceback (most recent call last): File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 65, in process(args) File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in process out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, ) File "/usr/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command ' PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path='/content/fairseq/models_new/mms1b_fl102.pt'" task.data=/content/temp_dir/tmp_qc3tcvm dataset.gen_subset="cmn:dev" common_eval.post_process=letter decoding.results_path=/content/temp_dir/tmp_qc3tcvm ' returned non-zero exit status 1.

@v-yunbin do this first

https://github.com/facebookresearch/fairseq/issues/5117#issuecomment-1563486024

The hypo.word error is vague and it's in no way the source of your issues.

@ajahstudio it works，thank you

@v-yunbin you are welcome.

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, ) print(out)

system: google colab
sample url: https://colab.research.google.com/github/facebookresearch/fairseq/blob/main/examples/mms/asr/tutorial/MMS_ASR_Inference_Colab.ipynb
language: mandarin
code block:

import os

os.environ["TMPDIR"] = '/content/temp_dir'
os.environ["PYTHONPATH"] = "."
os.environ["PREFIX"] = "INFER"
os.environ["HYDRA_FULL_ERROR"] = "1"
os.environ["USER"] = "micro"

!python examples/mms/asr/infer/mms_infer.py --model "/content/fairseq/models_new/mms1b_fl102.pt" --lang "cmn" --audio "/content/fairseq/audio_samples/demo2.wav"

error message

2023-05-26 06:48:12.609284: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-26 06:48:14.321529: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch
    ret = run_job(
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 460, in hydra_main
    distributed_utils.call_main(cfg, main)
  File "/content/fairseq/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 110, in __init__
    assert target_lang in ckpt_obj["adapter"]
AssertionError
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 65, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in process
    out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, )
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '
        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path='/content/fairseq/models_new/mms1b_fl102.pt'" task.data=/content/temp_dir/tmp_qc3tcvm dataset.gen_subset="cmn:dev" common_eval.post_process=letter decoding.results_path=/content/temp_dir/tmp_qc3tcvm 
        ' returned non-zero exit status 1.

Be sure to run pip install --editable ./ and !pip install tensorboardX

Also make sure your audio sample is in appropriate if you are not sure, run this on your audio file ffmpeg -i .\your_current_audio_file.mp3 -ar 16000 .\directory\to_output\your_converted_file.wav

i will try to recreate a working colab notebook with step by steps on error you might get.

I test chinese mandarin audio data, But the recognition result is relatively poor, I don't know why，my commad is： PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/mms/asr/infer/mms_infer.py --model /data/disk1/ybZhang/fairseq/examples/mms/asr/models/MMS-1B-all/mms1b_all.pt --lang "cmn-script_simplified" --audio "/data/disk1/ybZhang/fairseq/test_part3.wav"

@v-yunbin try out the different models and see what works for you, Its an experimental process.

reach out to me on ajahstudio@gmail.com am looking to collaborate.

Be sure to run pip install --editable ./ and !pip install tensorboardX

Also make sure your audio sample is in appropriate if you are not sure, run this on your audio file ffmpeg -i .\your_current_audio_file.mp3 -ar 16000 .\directory\to_output\your_converted_file.wav

i will try to recreate a working colab notebook with step by steps on error you might get.

Hello, thank you for your response. I have indeed executed the code you mentioned.

!pip install --editable ./
!pip install tensorboardX
!ffmpeg -i .\your_current_audio_file.mp3 -ar 16000 .\directory\to_output\your_converted_file.wav

In the original sample code provided in Colab, the English MP3 files correctly produce the text content.
However, when I uploaded my own Mandarin WAV file, which was already in 16 kHz format, it still resulted in the error mentioned above.
As a precautionary measure, I tried using the ffmpeg command to convert the file, but the error mentioned earlier still persists.

reach out to me on ajahstudio@gmail.com am looking to collaborate.

I test chinese mandarin audio data with mms1b_all model , mms1b_all may be the best model?

Be sure to run pip install --editable ./ and !pip install tensorboardX Also make sure your audio sample is in appropriate if you are not sure, run this on your audio file ffmpeg -i .\your_current_audio_file.mp3 -ar 16000 .\directory\to_output\your_converted_file.wav i will try to recreate a working colab notebook with step by steps on error you might get.

Hello, thank you for your response. I have indeed executed the code you mentioned.
!pip install --editable ./
!pip install tensorboardX
!ffmpeg -i .\your_current_audio_file.mp3 -ar 16000 .\directory\to_output\your_converted_file.wav
In the original sample code provided in Colab, the English MP3 files correctly produce the text content.

However, when I uploaded my own Mandarin WAV file, which was already in 16 kHz format, it still resulted in the error mentioned above.

As a precautionary measure, I tried using the ffmpeg command to convert the file, but the error mentioned earlier still persists.

Try to use a short audio length to see if the error still occurs ( like saying a couple of words ).

reach out to me on ajahstudio@gmail.com am looking to collaborate.

I test chinese mandarin audio data with mms1b_all model , mms1b_all may be the best model?

Yes

Try to use a short audio length to see if the error still occurs ( like saying a couple of words ).

I tried using a 7-second WAV file, but the same error still occurs. 🥲

Try to use a short audio length to see if the error still occurs ( like saying a couple of words ).

I tried using a 7-second WAV file, but the same error still occurs. 🥲

I will update if i have a fix

Same error. Has anyone solved this problem? “cannot unpack non-iterable NoneType object”

$ python examples/mms/asr/infer/mms_infer.py --model /idiap/temp/esarkar/cache/fairseq/mms1b_all.pt --lang shp --audio /idiap/temp/esarkar/Data/shipibo/downsampled_single_folder/short/shp-ROS-2022-03-14-2.1.wav

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/idiap/temp/esarkar/miniconda/envs/fairseq/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/idiap/temp/esarkar/tmp/tmpnhi5rrui/hypo.word'

Guys if you are free google collab user, memory kills process. So use mms1b_fl102 model instead of mms1b_all

Ok for anyone who still have the FileNotFoundError: [Errno 2] No such file or directory for the hypo.word error and just want to test the inference:

Its really what the error says. :D While inference the programming accesses the tmp folder and need to write some file. Including the hypo.word. As the error says: in line 44 of the mms_infer.py the it trys to open and write the hypo.word. with open(tmpdir/"hypo.word") as fr: as you can see no righs are defined for the open method are defined. so just give python the right to write and read the file. with open(tmpdir/"hypo.word", "w+") as fr: this should be all.

you can see in the code

def process(args):    
    with tempfile.TemporaryDirectory() as tmpdir:
        print(">>> preparing tmp manifest dir ...", file=sys.stderr)
        tmpdir = Path("/home/divisio/projects/tmp/")
        with open(tmpdir / "dev.tsv", "w") as fw:
            fw.write("/\n")
            for audio in args.audio:
                nsample = sf.SoundFile(audio).frames
                fw.write(f"{audio}\t{nsample}\n")
        with open(tmpdir / "dev.uid", "w") as fw:
            fw.write(f"{audio}\n"*len(args.audio))
        with open(tmpdir / "dev.ltr", "w") as fw:
            fw.write("d u m m y | d u m m y\n"*len(args.audio))
        with open(tmpdir / "dev.wrd", "w") as fw:
            fw.write("dummy dummy\n"*len(args.audio))
        cmd = f"""
        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}:dev" common_eval.post_process={args.format} decoding.results_path={tmpdir}
        """
        print(">>> loading model & running inference ...", file=sys.stderr)
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,)
        with open(tmpdir/"hypo.word", "w+") as fr:
            for ii, hypo in enumerate(fr):
                hypo = re.sub("\(\S+\)$", "", hypo).strip()
                print(f'===============\nInput: {args.audio[ii]}\nOutput: {hypo}')

python should already created tthe files dev.tsv, dev.uid, dev.ltr and dev.wrd in the same tmp folder. If you want to check this, simply change the

tmpdir = Path(tmpdir) in to a static folder for instance in your user directory like tmpdir = Path("/home/myuser/path/to/my/project/test")

and you will see that those file will bne created. including the hypo.word if you did the changes like I discribed before.

now the the examples/speech_recognition/new/infer.py will be triggerd in line 40. and it might fail writing down the inference log file. like @v-yunbin discribed `FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/.../INFER/None'

and its again just a problem with permissions to write some files. next to the mms_infer.py file is a config folder including a infer_common.yaml and there is the property

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

so it trys to write in to the checkpoint folder on root level. If you can not do that. simply change tthis folder my some folder in your user folder: for instance

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /home/myuser/my/project/folder/tmp/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

so now the script will have access to those folders and whrite down the inferece log (infer.log) in to that folder which includes the result of the ASR.

I would say it isn't a catch all error, but rather that error handling from the subprocess call is not done, so if the call to run the inference fails for any reason, the hypo.word file will not have been created, and thus the open() call will fail and throw that error. So you have to dig backwards at the subprocess command to find out what happens. This just got open sourced so it makes sense there are some rough edges, contribute back to the repo!

Yeah, that's what I mean, if anything happens within the subprocess for any reason, folks are going to get the above mentioned error. Then they will likely google their way into this issue, which covers many of the possible ways it can fail. I was trying to be extra verbose for other folks to potentially help.

edit: @altryne my bad I thought by your message you were printing the command out itself, not the output of running the command. Your error does look like its failing because of the lack of :. Good news is its open source so you could change : to another character, or run it on windows subsytem linux, or run it in docker.

Thanks! You helped a lot, I eventually had to rewrite that whole block like so:
        import os
        os.environ["TMPDIR"] = str(tmpdir)
        os.environ["PYTHONPATH"] = "."
        os.environ["PREFIX"] = "INFER"
        os.environ["HYDRA_FULL_ERROR"] = "1"
        os.environ["USER"] = "micro"

        cmd = f"""python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}" common_eval.post_process={args.format} decoding.results_path={tmpdir}
"""
To even have the command execute and do something and not fail outright.

I'm pretty sure I made the same changes and I still get the unpack error. I change the ENV vars before the cmd string + copied your entire cmd string. Maybe I'm missing something in infer_common.yaml or with running with args? (Windows paths do be scuffed)

There is a problem with the mms1b_fl102.pt model, and the replacement model is mms1b_all.pt. I solved this problem.

Not sure what I missed but running this I ran into an error this error. Maybe its a quick permission issue? Apologies I don't work with Docker regularly.


>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
 File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
   self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER/None'**

During handling of the above exception, another exception occurred:

I edited the script and it's now working for me, with an Ubuntu 22.04 image, tested with both CUDA 11.8 and 12.1. Note that I added permissions for /checkpoint/${USERNAME}.

Dockerfile.mms:

# Also works with CUDA 12.1:
#FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app

RUN apt-get update \
    && apt-get install -y python-is-python3 git python3-pip sudo wget curl

RUN git clone https://github.com/facebookresearch/fairseq.git \
    && cd fairseq \
    && pip install pip -U \
    && pip install --no-cache-dir . \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir torch \
    && pip install --no-cache-dir hydra-core \
    && pip install --no-cache-dir editdistance \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir omegaconf \
    && pip install --no-cache-dir scikit-learn \
    && pip install --no-cache-dir tensorboardX \
    && python setup.py build_ext --inplace

ENV USERNAME=user
RUN echo "root:root" | chpasswd \
    && adduser --disabled-password --gecos "" "${USERNAME}" \
    && echo "${USERNAME}:${USERNAME}" | chpasswd \
    && echo "%${USERNAME}    ALL=(ALL)   NOPASSWD:    ALL" >> /etc/sudoers.d/${USERNAME} \
    && chmod 0440 /etc/sudoers.d/${USERNAME}

RUN mkdir -p /checkpoint/${USERNAME}/INFER \
    && chown -R ${USERNAME}:${USERNAME} /checkpoint/${USERNAME}

USER ${USERNAME}
WORKDIR /usr/src/app/fairseq
CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

Building with:

docker build -t fairseq:dev -f Dockerfile.mms .

Running with:

docker run --rm -it --gpus all -e USER=user -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/examples/mms/mms1b_l1107.pt --lang fra --audio /mms/examples/mms/test16k.wav

Not sure what I missed but running this I ran into an error this error. Maybe its a quick permission issue? Apologies I don't work with Docker regularly.

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
 File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
   self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER/None'**

During handling of the above exception, another exception occurred:

I edited the script and it's now working for me, with an Ubuntu 22.04 image, tested with both CUDA 11.8 and 12.1. Note that I added permissions for /checkpoint/${USERNAME}.

Dockerfile.mms:

# Also works with CUDA 12.1:
#FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app

RUN apt-get update \
    && apt-get install -y python-is-python3 git python3-pip sudo wget curl

RUN git clone https://github.com/facebookresearch/fairseq.git \
    && cd fairseq \
    && pip install pip -U \
    && pip install --no-cache-dir . \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir torch \
    && pip install --no-cache-dir hydra-core \
    && pip install --no-cache-dir editdistance \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir omegaconf \
    && pip install --no-cache-dir scikit-learn \
    && pip install --no-cache-dir tensorboardX \
    && python setup.py build_ext --inplace

ENV USERNAME=user
RUN echo "root:root" | chpasswd \
    && adduser --disabled-password --gecos "" "${USERNAME}" \
    && echo "${USERNAME}:${USERNAME}" | chpasswd \
    && echo "%${USERNAME}    ALL=(ALL)   NOPASSWD:    ALL" >> /etc/sudoers.d/${USERNAME} \
    && chmod 0440 /etc/sudoers.d/${USERNAME}

RUN mkdir -p /checkpoint/${USERNAME}/INFER \
    && chown -R ${USERNAME}:${USERNAME} /checkpoint/${USERNAME}

USER ${USERNAME}
WORKDIR /usr/src/app/fairseq
CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

Building with:

docker build -t fairseq:dev -f Dockerfile.mms .

Running with:

docker run --rm -it --gpus all -e USER=user -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/examples/mms/mms1b_l1107.pt --lang fra --audio /mms/examples/mms/test16k.wav

Worked for me thanks! For the ones not proficient with docker, just make sure to create on a directory where your docker file is located a directory /examples/mms and place your model and audio files in that directory. What this line $(pwd):/mms:ro does is it mounts the current directory (the present working directory) as a read-only volume inside the container at the path /mms.

Hi all, If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files. `Hope it will be useful! :)

Hi all, If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files. `Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

Hi all, If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files. `Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

I just noticed that the MMS project is not included yet in the released version of fairseq, so you will need to install it from source until then:

pip uninstall fairseq && pip install git+https://github.com/facebookresearch/fairseq

The installation steps are updated accordingly. Let me know @bekarys0504 if that solved the issue ?

Hi all, If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files. `Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

I just noticed that the MMS project is not included yet in the released version of fairseq, so you will need to install it from source until then:

pip uninstall fairseq && pip install git+https://github.com/facebookresearch/fairseq

The installation steps are updated accordingly. Let me know @bekarys0504 if that solved the issue ?

I have the following error now :( @abdeladim-s

Traceback (most recent call last):
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_2570058/32768016.py", line 6, in <module>
    transcriptions = asr.transcribe(files, lang='kaz', align=False)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py", line 170, in transcribe
    self.wer = hydra_main(cfg)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/hydra/main.py", line 27, in decorated_main
    return task_function(cfg_passthrough)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 436, in hydra_main
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    if cfg.distributed_training.distributed_init_method is None:
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 383, in main
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 103, in __init__
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 205, in load_model_ensemble
    out_file.write(line)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 367, in load_model_ensemble
    arg_overrides (Dict[str,Any], optional): override model args that
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 482, in load_model_ensemble_and_task
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2056, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Wav2VecCtc:
    Unexpected key(s) in state_dict: "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.ln_b". 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2102, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1310, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1199, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1052, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 978, in format_exception_as_a_whole
    frames.append(self.format_record(record))
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 878, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 712, in lines
    return self._sd.lines
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 681, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 660, in executing_piece
    return only(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/executing/executing.py", line 190, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

@bekarys0504, what model are you using ? I think you are using a wrong model!

@bekarys0504, what model are you using ? I think you are using a wrong model!

this one mms1b_fl102.pt downloaded through this link https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt

Should be the right one it is for ASR @abdeladim-s

@bekarys0504, what model are you using ? I think you are using a wrong model!

this one mms1b_fl102.pt downloaded through this link https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt

Should be the right one it is for ASR @abdeladim-s

@bekarys0504 , yes it is a right model it seems. Could you please submit an issue on the project repo so we can debug this issue further together ?

Ok for anyone who still have the FileNotFoundError: [Errno 2] No such file or directory for the hypo.word error and just want to test the inference:

Its really what the error says. :D While inference the programming accesses the tmp folder and need to write some file. Including the hypo.word. As the error says: in line 44 of the mms_infer.py the it trys to open and write the hypo.word. with open(tmpdir/"hypo.word") as fr: as you can see no righs are defined for the open method are defined. so just give python the right to write and read the file. with open(tmpdir/"hypo.word", "w+") as fr: this should be all.

you can see in the code
def process(args):    
    with tempfile.TemporaryDirectory() as tmpdir:
        print(">>> preparing tmp manifest dir ...", file=sys.stderr)
        tmpdir = Path("/home/divisio/projects/tmp/")
        with open(tmpdir / "dev.tsv", "w") as fw:
            fw.write("/\n")
            for audio in args.audio:
                nsample = sf.SoundFile(audio).frames
                fw.write(f"{audio}\t{nsample}\n")
        with open(tmpdir / "dev.uid", "w") as fw:
            fw.write(f"{audio}\n"*len(args.audio))
        with open(tmpdir / "dev.ltr", "w") as fw:
            fw.write("d u m m y | d u m m y\n"*len(args.audio))
        with open(tmpdir / "dev.wrd", "w") as fw:
            fw.write("dummy dummy\n"*len(args.audio))
        cmd = f"""
        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}:dev" common_eval.post_process={args.format} decoding.results_path={tmpdir}
        """
        print(">>> loading model & running inference ...", file=sys.stderr)
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,)
        with open(tmpdir/"hypo.word", "w+") as fr:
            for ii, hypo in enumerate(fr):
                hypo = re.sub("$\S+$$", "", hypo).strip()
                print(f'===============\nInput: {args.audio[ii]}\nOutput: {hypo}')
python should already created tthe files dev.tsv, dev.uid, dev.ltr and dev.wrd in the same tmp folder. If you want to check this, simply change the

tmpdir = Path(tmpdir) in to a static folder for instance in your user directory like tmpdir = Path("/home/myuser/path/to/my/project/test")

and you will see that those file will bne created. including the hypo.word if you did the changes like I discribed before.

now the the examples/speech_recognition/new/infer.py will be triggerd in line 40. and it might fail writing down the inference log file. like @v-yunbin discribed `FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/.../INFER/None'

and its again just a problem with permissions to write some files. next to the mms_infer.py file is a config folder including a infer_common.yaml and there is the property
hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}
so it trys to write in to the checkpoint folder on root level. If you can not do that. simply change tthis folder my some folder in your user folder: for instance
hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /home/myuser/my/project/folder/tmp/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}
so now the script will have access to those folders and whrite down the inferece log (infer.log) in to that folder which includes the result of the ASR.

I did what you described and while it ran for 6 minutes, I get a "Killed" in the output with no other information. The RAM was basically maxed out throughout and there was no hypo.word not found error. The model is probably just too big to run on free Colab.

How much resources does it really take to run the l1107 model anyways? Because running it on colab maxed out 12GB of system RAM. Feels really overkill for a 10 second audio input.

It takes less than 8GB with the code snippet of https://huggingface.co/facebook/mms-1b-all and can easily be run on CPU - give it a try ;-)

It takes less than 8GB with the code snippet of https://huggingface.co/facebook/mms-1b-all and can easily be run on CPU - give it a try ;-)

That's good to know B), because even after tweaking the line where asr.py was supposed to do stuff to hypo.word it ran on Colab but after 6 minutes of maxing out the 12GB of RAM it was killed. The audio file wasn't even long, it was less than 10 seconds long.

By the way I have yet to try it using 🤗 transformers, I'm referring to the colab notebook demoing ASR that's having trouble running.

I also found this error "hypo.word" on another machine (Ubuntu 20.04) while there is no problem in the other (Ubuntu 22.04). Actually there is an error before No such file or directory: /tmp/hypo.word:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

After updating Numpy (from 1.21.5 to 1.24.3) the error was gone and the the output of ASR is shown in the bottom.

missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

Yes you are right. Smaller model is working 😓

To fix this issue, add open(tmpdir/"hypo.word", 'w').close() before the line 48 in "fairseq\examples\mms\asr\infer\mms_infer.py"

what files need to be changed in windows

BTW, it should now be very simple to use MMS with transformers:

See:

https://huggingface.co/docs/transformers/main/en/model_doc/mms

[MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 huggingface/transformers#23813

https://huggingface.co/facebook/mms-1b-all

your project is perfect,but i want to know how to use my own voice to translate

Please y'all read the error messages and try to debug yourself.

@dakouan18

ModuleNotFoundError: No module named 'omegaconf'

you need to install the missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

@shsagnik your hydra install has some issues, and you need to specify a checkpoint directory, it was setup to run on linux where you can make directories off the root (probably in a container) so change infer_common.yaml

I have the same error as @shsagnik
What should I do ? I ran on ubuntu

Sorry to bother you here. I am unable to run mms asr transcribe . I am using python3.11 and facing a range of issues from hypo.word not found, AttributeError: 'PosixPath' object has no attribute 'find' and what not.

Going through the issues, there is no landing solutions except lot of comments.. https://github.com/facebookresearch/fairseq/issues/5284 (already tried that solution and lead to my error posted below) https://github.com/facebookresearch/fairseq/issues/5117 (has no solution)

There are just way too many threads relating to mms asr transcribe issues but no working solutions posted, if there is one set of installation instructions that actually work and are documented somewhere that be great.

Here is my error:

os.environ["TMPDIR"] ='/Users/spanta/Downloads/fairseq-main/temp_dir'

os.environ["PYTHONPATH"] = "."

os.environ["PREFIX"] = "INFER"

os.environ["HYDRA_FULL_ERROR"] = "1"

os.environ["USER"] = "micro"

os.system('python3.11 examples/mms/asr/infer/mms_infer.py --model "/Users/spanta/Downloads/fairseq/models_new/mms1b_fl102.pt" --lang "tel" --audio "/Users/spanta/Documents/test_wav/1.wav"')

preparing tmp manifest dir ...

loading model & running inference ...

/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py:440: UserWarning:

The version_base parameter is not specified.

Please specify a compatability version level, or None.

Will assume defaults for version 1.1

@hydra.main(config_path=config_path, config_name="infer")

Traceback (most recent call last):

File "/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py", line 499, in

cli_main()

File "/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py", line 495, in cli_main

hydra_main()  # pylint: disable=no-value-for-parameter

^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main

_run_hydra(

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 355, in _run_hydra

hydra = run_and_report(

        ^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report

raise ex

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report

return func()

       ^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 356, in

lambda: Hydra.create_main_hydra2(

        ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 61, in create_main_hydra2

config_loader: ConfigLoader = ConfigLoaderImpl(

                              ^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_loader_impl.py", line 48, in init

self.repository = ConfigRepository(config_search_path=config_search_path)

                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 65, in init

self.initialize_sources(config_search_path)

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 72, in initialize_sources

scheme = self._get_scheme(search_path.path)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 143, in _get_scheme

idx = path.find("://")

      ^^^^^^^^^

AttributeError: 'PosixPath' object has no attribute 'find'

Thanks @audiolion It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a command, as it obscures the errors that pop up there. Here's the full output I'm getting (added a print out of the cmd command as well)

$ python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --audio output_audio.mp3 --lang tur
>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name i
                                                                                                                                           infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='mms1b_l1107.pt'" task.data=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\Users\micro\AppData\Local\Temmp\tmpxzum3zve

>>> loading model & running inference ...
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 53, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'

you need to do what I said in my first comment and output the process error message. the hyp.word file is not found because the actual ASR never ran and produced an output

Hello, I output the error message according to your comment, and it printed the following error “CompletedProcess(args='\n PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path=\'./models_new/mms1b_all.pt\'" task.data=/tmp/tmpepozridd dataset.gen_subset="adx:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpepozridd \n ', returncode=1)” The complete log is as follows：微信图片_20240305211018

After series of tries, i was able to get it to infer on Linux but it could probably work on Windows also. The hypo.word file missing error is due to exceptions thrown during subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,) so first i suggest you replace that line with the following:
out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, )
print(out)
This will enable you see whats causing the error. Also provide the full paths of your model and audio files like this python examples/mms/asr/infer/mms_infer.py --model "/home/hunter/Downloads/mms1b_all.pt" --lang eng --audio "/home/hunter/Downloads/audio.wav"

After I replaced the code, the error message output is as follows：

CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path=\'models_new/mms1b_all.pt\'" task.data=/tmp/tmpuarv2nsi dataset.gen_subset="adx:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpuarv2nsi \n        ', returncode=1)

The complete log is as follows：微信图片_20240310220025

Hi, I have tried many methods mentioned above, like use another model or check RAM occupation, they are all not working for me. Even if I modify the code in mms_infer.py like following, it still did not output any error. I am not sure if I missed anything. I got the error message without any reason:

CompletedProcess(args='\n PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path=\'./models_new/mms1b_fl102.pt\'" task.data=C:\Users\Yalla\AppData\Local\Temp\tmpv5nntp35 dataset.gen_subset="eng:dev" common_eval.post_process=letter decoding.results_path=C:\Users\Yalla\AppData\Local\Temp\tmpv5nntp35 \n ', returncode=0) >>>preparing tmp manifest dir ... >>>loading model & running inference ... Traceback (most recent call last): File "fairseq/examples/mms/asr/infer/mms_infer.py", line 64, in process(args) File "fairseq/examples/mms/asr/infer/mms_infer.py", line 54, in process with open(tmpdir/"hypo.word") as fr: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Yalla\AppData\Local\Temp\tmpv5nntp35\hypo.word'

facebookresearch / fairseq