facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

hypo.word file missing during MMS ASR inference #5117

Open ahazeemi opened 1 year ago

ahazeemi commented 1 year ago

❓ Questions and Help

What is your question?

I'm facing the following issue while running the MMS ASR inference script examples/mms/asr/infer/mms_infer.py:

  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tmpsjatjyxt/hypo.word'

Code

python examples/mms/asr/infer/mms_infer.py --model "/workspace/fairseq/mms1b_fl102.pt" --lang "urd-script_arabic" --audio "/workspace/audio.wav"

What have you tried?

I tried running ASR on different audio files and languages.

What's your environment?

shsagnik commented 1 year ago

Facing the exact same issue

vineelpratap commented 1 year ago

Hi, can you share the entire log? I just tested the code again and it works fine from my end.

audiolion commented 1 year ago

You need to check what the error is. Change your mms_infer.py to:

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL,)
print(out)

to see the error. For me, it was that I needed to pass cpu=True because I don't have CUDA installed. I did this by modifying my infer_common.yaml file to add a new top-level key common with the cpu: true key/val in it:

common:
  cpu: true
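
If the printed CompletedProcess still isn't informative, here is a small sketch of capturing the subprocess output instead of discarding it (assuming the same cmd string that mms_infer.py already builds):

import subprocess

# Capture stdout/stderr so the underlying inference error is visible,
# instead of being swallowed by stdout=subprocess.DEVNULL.
out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(out.stdout)
print(out.stderr)  # the real failure usually shows up here
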
audiolion commented 1 year ago

I am hitting this too, though, and I am not sure what I am doing wrong. I'm not sure whether I am using the right lang_code; it doesn't say what the lang codes are or what standard is being referenced. I have tried en and en-US so far.

shsagnik commented 1 year ago

Sure, here is my full log:

(base) hello_automate_ai@machinelearningnotebook:~/fairseqmmstest/fairseq$ python "examples/mms/asr/infer/mms_infer.py" --model "/home/hello_automate_ai/fairseqmmstest/mms1b_all.pt" --lang hin --audio "/home/hello_automate_ai/fairseqmmstest/audio.wav"
>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 18, in <module>
    import editdistance
ModuleNotFoundError: No module named 'editdistance'
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp6u8grbxl/hypo.word'

shsagnik commented 1 year ago

This is after the fix suggested by audiolion

vineelpratap commented 1 year ago

@audiolion We expect a 3-letter language code. See the 'Supported languages' section in the README file for each model. For example, use 'eng' for English.
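
For instance, a usage sketch reusing the paths from the original report:

python examples/mms/asr/infer/mms_infer.py --model "/workspace/fairseq/mms1b_fl102.pt" --lang "eng" --audio "/workspace/audio.wav"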

vineelpratap commented 1 year ago

@shsagnik No module named 'editdistance' - You should install the missing module.

audiolion commented 1 year ago

@shsagnik

ModuleNotFoundError: No module named 'editdistance'

you need to install the modules that are used

shsagnik commented 1 year ago

Got these errors this time

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/core/plugins.py:202: UserWarning: Error importing 'hydra_plugins.hydra_colorlog'. Plugin is incompatible with this Hydra version or buggy. Recommended to uninstall or upgrade plugin.
        ImportError: cannot import name 'SearchPathPlugin' from 'hydra.plugins' (/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/plugins/__init__.py)
  warnings.warn(
Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai/INFER/None'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai/INFER'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 140, in sweep
    sweep_dir.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/checkpoint'
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp0mcwde4n/hypo.word'

altryne commented 1 year ago

Getting pretty much the same. I used the right 3-letter language code (while waiting on #5119 to be answered) and it doesn't seem to have an effect; the hypo.word error is still showing up.

dakouan18 commented 1 year ago

I got this error when trying ASR on Google Colab:

/content/fairseq
>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/content/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/content/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/content/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/content/fairseq/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "/content/fairseq/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "/content/fairseq/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "/content/fairseq/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "/content/fairseq/fairseq/dataclass/configs.py", line 12, in <module>
    from omegaconf import II, MISSING
ModuleNotFoundError: No module named 'omegaconf'
CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'/content/mms1b_fl102.pt\'" task.data=/tmp/tmp79w8mawp dataset.gen_subset="eng:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmp79w8mawp\n        ', returncode=1)
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp79w8mawp/hypo.word'
audiolion commented 1 year ago

Please y'all read the error messages and try to debug yourself.

@dakouan18

ModuleNotFoundError: No module named 'omegaconf'

you need to install the missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

@shsagnik your hydra install has some issues, and you need to specify a checkpoint directory. It was set up to run on Linux, where you can make directories off the root (probably in a container), so change infer_common.yaml:

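For illustration, a sketch of the kind of change (the key names here are assumptions; match them to what your infer_common.yaml actually contains):

# Point Hydra's output directories somewhere writable instead of /checkpoint.
hydra:
  sweep:
    dir: /tmp/${env:USER}/${env:PREFIX}
  run:
    dir: /tmp/${env:USER}/${env:PREFIX}
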
altryne commented 1 year ago

Thanks @audiolion. It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a subprocess, which obscures the errors that pop up there.

Here's the full output I'm getting (I added a printout of the cmd command as well):

$ python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --audio output_audio.mp3 --lang tur
>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='mms1b_l1107.pt'" task.data=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve

>>> loading model & running inference ...
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 53, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'
dakouan18 commented 1 year ago

Hi @audiolion, after installing omegaconf & hydra a new error appeared:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
2023-05-22 22:22:29.307454: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-22 22:22:30.440434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/content/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/content/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/content/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/content/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/content/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'/content/mms1b_fl102.pt\'" task.data=/tmp/tmpk2ot70rk dataset.gen_subset="eng:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpk2ot70rk\n        ', returncode=1)
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpk2ot70rk/hypo.word'
audiolion commented 1 year ago

> Thanks @audiolion It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a command [...] FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'

you need to do what I said in my first comment and output the process error message. The hypo.word file is not found because the actual ASR never ran, so it never produced an output.

altryne commented 1 year ago

SIGH, I am. It prints the command and that's it.

>>> loading model & running inference ...
CompletedProcess(args='\nPYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'mms1b_l1107.pt\'" task.data=C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_ dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_\n', returncode=0)
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 55, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 47, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_\\hypo.word'

However, when I go back, recreate that temp dir, and run the command manually myself, I do get errors.

Just not, for some reason, via the way you mentioned.

I had to install many packages along the way; here's a partial list (in case it helps anyone):

pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install hydra-core
pip install editdistance
pip install soundfile
pip install omegaconf
pip install fairseq
pip install scikit-learn
pip install tensorboardX

Still getting nowhere. Running the subprocess command even with check=True and printing the output returns status code 0 with no errors.

altryne commented 1 year ago

Got the model to finally load and run. Apparently Windows doesn't allow : in directory names, and the above code adds :dev to the directory name.

So if you pass --lang tur like I did, it will try to create a directory named /tur:dev inside /checkpoint, which per @audiolion I also had to change, as /checkpoint doesn't work on Windows.

I think the full inference ran, as the process got stuck for a few minutes, the GPU went to 8GB (impressive), and after a while I had 2 errors again.

The hypo.word error seems to be a "catch all" error covering the many things that can go wrong; hopefully the authors will clean it up?

I'm currently staring at this error, and am pretty sure it's due to me removing the : from the dir name:

  File "C:\Users\micro\projects\mms\examples\speech_recognition\new\infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "C:\Users\micro\projects\mms\examples\speech_recognition\new\infer.py", line 132, in __init__
    self.task.load_dataset(
  File "C:\Users\micro\projects\mms\fairseq\tasks\audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "C:\Users\micro\projects\mms\fairseq\tasks\audio_pretraining.py", line 175, in load_dataset
    for key, file_name in data_keys:
ValueError: not enough values to unpack (expected 2, got 1)
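
For context, a minimal sketch of why the missing : blows up there (illustrative only, not fairseq's exact code; the gen_subset name is presumably split on ':'):

# fairseq builds (corpus key, split) pairs from names like "tur:dev".
# With the ':' stripped for Windows there is only one field to unpack.
split = "tur"  # was "tur:dev" before removing the ':'
data_keys = [tuple(s.split(":")) for s in [split]]  # -> [("tur",)]
for key, file_name in data_keys:  # ValueError: not enough values to unpack (expected 2, got 1)
    print(key, file_name)
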
bbz662 commented 1 year ago

I had the same error with Google Colab and investigated.

My error:

>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='/content/mms1b_fl102.pt'" task.data=/content/tmp dataset.gen_subset="jpn:dev" common_eval.post_process=letter decoding.results_path=/content/tmp

>>> loading model & running inference ...
2023-05-22 22:02:52.055738: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-05-22 22:02:58,730][HYDRA] Launching 1 jobs locally
[2023-05-22 22:02:58,730][HYDRA]    #0 : decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 common_eval.path='/content/mms1b_fl102.pt' task.data=/content/tmp dataset.gen_subset=jpn:dev common_eval.post_process=letter decoding.results_path=/content/tmp
[2023-05-22 22:02:59,254][__main__][INFO] - /content/mms1b_fl102.pt
Killed
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 54, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 46, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/content/tmp/hypo.word'

As it turns out, it was crashing at the following location.

https://github.com/facebookresearch/fairseq/blob/af12c9c6407bbcf2bca0b2f1923cf78f3db8857c/fairseq/models/wav2vec/wav2vec2.py#L1052

Looking at the RAM status, I believe the crash was caused by a lack of memory.

So I feel that perhaps increasing the memory will solve the problem.

I hope this helps you in your investigation.

betimd commented 1 year ago

Getting the same error. Also, the documentation for running the sample is horrible.

audiolion commented 1 year ago

I would say it isn't a catch-all error, but rather that error handling for the subprocess call is not done: if the call to run the inference fails for any reason, the hypo.word file will not have been created, and thus the open() call will fail and throw that error. So you have to dig backwards from the subprocess command to find out what happened. This just got open sourced, so it makes sense there are some rough edges; contribute back to the repo!

edit: @altryne my bad, I thought from your message that you were printing the command itself, not the output of running the command. Your error does look like it's failing because of the missing :. The good news is it's open source, so you could change : to another character, run it on Windows Subsystem for Linux, or run it in Docker.

altryne commented 1 year ago

> I would say it isn't a catch all error, but rather that error handling from the subprocess call is not done [...]

Yeah, that's what I mean: if anything goes wrong within the subprocess, for any reason, folks are going to get the above-mentioned error. Then they will likely google their way into this issue, which covers many of the possible ways it can fail. I was trying to be extra verbose to potentially help other folks.

> edit: @altryne my bad I thought by your message you were printing the command out itself, not the output of running the command. [...]

Thanks! You helped a lot. I eventually had to rewrite that whole block like so:

        import os
        os.environ["TMPDIR"] = str(tmpdir)
        os.environ["PYTHONPATH"] = "."
        os.environ["PREFIX"] = "INFER"
        os.environ["HYDRA_FULL_ERROR"] = "1"
        os.environ["USER"] = "micro"

        cmd = f"""python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}" common_eval.post_process={args.format} decoding.results_path={tmpdir}
"""

That was needed to even have the command execute and do something rather than fail outright.

audiolion commented 1 year ago

glad you got it working!

fcecagno commented 1 year ago

Hi, thanks for this discussion - I've learned a lot. This is the Dockerfile I created after a few hours trying to make it work:

FROM python:3.8

WORKDIR /usr/src/app

COPY . .

RUN pip install --no-cache-dir . \
 && pip install --no-cache-dir soundfile \
 && pip install --no-cache-dir torch \
 && pip install --no-cache-dir hydra-core \
 && pip install --no-cache-dir editdistance \
 && pip install --no-cache-dir omegaconf \
 && pip install --no-cache-dir scikit-learn \
 && pip install --no-cache-dir tensorboardX \
 && python setup.py build_ext --inplace \
 && apt update \
 && apt -y install libsndfile-dev \
 && rm -rf /var/lib/apt/lists/* \
 && wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq \
 && chmod +x /usr/bin/yq \
 && yq -i '.common.cpu = true' examples/mms/asr/config/infer_common.yaml

CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

I built the image with:

docker build -t fairseq:dev .

And run it with:

docker run --rm -it -e USER=root -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/mms1b_fl102.pt --lang eng --audio /mms/audio.wav
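
(Side note: the -e USER=root appears to be necessary because the Hydra config builds its /checkpoint output path from the USER environment variable, which is why the /checkpoint/<user> paths show up in the PermissionError traces above.)
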
MohamedAliRashad commented 1 year ago

I kept tracing errors and solving them until I hit this one:


  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Traceback (most recent call last):

Does anyone know a solution?

didadida-r commented 1 year ago

> Hi, thanks for this discussion - I've learned a lot. This is the Dockerfile I created after a few hours trying to make it work: [...]

I ran the code based on the Docker image, but it fails again:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch
    ret = run_job(
  File "/usr/local/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "examples/speech_recognition/new/infer.py", line 460, in hydra_main
    distributed_utils.call_main(cfg, main)
  File "/usr/src/app/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "examples/speech_recognition/new/infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "examples/speech_recognition/new/infer.py", line 132, in __init__
    self.task.load_dataset(
  File "/usr/src/app/fairseq/tasks/audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "/usr/src/app/fairseq/tasks/audio_pretraining.py", line 150, in load_dataset
    if task_cfg.multi_corpus_keys is None:
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
    return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
    node = self._get_node(key=key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
    self._validate_get(key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 135, in _validate_get
    self._format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigAttributeError: Key 'multi_corpus_keys' is not in struct
        full_key: task.multi_corpus_keys
        reference_type=Any
        object_type=dict
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4o9kxdyr/hypo.word'
EklavyaFCB commented 1 year ago

Same error.

$ python examples/mms/asr/infer/mms_infer.py --model /idiap/temp/esarkar/cache/fairseq/mms1b_all.pt --lang shp --audio /idiap/temp/esarkar/Data/shipibo/downsampled_single_folder/short/shp-ROS-2022-03-14-2.1.wav

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/idiap/temp/esarkar/miniconda/envs/fairseq/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/idiap/temp/esarkar/tmp/tmpnhi5rrui/hypo.word'
hrishioa commented 1 year ago

Same issue.

python examples/mms/asr/infer/mms_infer.py --model "models/mms1b_fl102.pt" --lang eng --audio "../testscripts/audio.wav"
>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "~/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "~/fairseq/examples/__init__.py", line 7, in <module>
    from fairseq.version import __version__  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/fairseq/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "~/fairseq/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "~/fairseq/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "~/fairseq/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "~/fairseq/fairseq/dataclass/configs.py", line 1127, in <module>
    @dataclass
     ^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 1223, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 1213, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
Traceback (most recent call last):
  File "~/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "~/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/7r/6k64fzpn6sx5ml6pb2h67kbw0000gn/T/tmp9ubxk363/hypo.word'
athenasaurav commented 1 year ago

Hello Everyone,

I have installed all the remaining dependencies, and when I print the error output it shows "cannot unpack non-iterable NoneType object".

Here is the full log:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/root/VITS/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/root/VITS/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/root/VITS/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/root/VITS/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/root/VITS/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'mms1b_fl102.pt\'" task.data=/tmp/tmptsuf4ig2 dataset.gen_subset="hin:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmptsuf4ig2\n        ', returncode=1)
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 53, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 45, in process
    with open(tmpdir/"hypo. Word") as fr:

I'm running on Ubuntu 20.04.

didadida-r commented 1 year ago

> Hi, thanks for this discussion - I've learned a lot. This is the Dockerfile I created after a few hours trying to make it work: [...]

Could you please run the code again, starting from git clone? Many thanks.

- git clone https://github.com/facebookresearch/fairseq
- cd fairseq
- wget <model>
- docker build -t fairseq:dev .
- docker run --rm -it -e USER=root -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/mms1b_fl102.pt --lang eng --audio /mms/audio.wav
PINTO0309 commented 1 year ago

It worked without any problems. However, it appears that the sampling rate must be 16000.

MichaelMai2000 commented 1 year ago

@hrishioa

The dataclass error may be related to Python 3.11 (#5012). Switching back to Python 3.9.16 may resolve the problem.

I can run the inference after installing all the packages listed here and fixing the checkpoint path here.

Also, I find that --model should specify the absolute path to the model file. A relative path like ../model.pt may cause an extra error.
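
For background on the dataclass error above, a minimal standalone reproduction of what Python 3.11 rejects (illustrative class names, not fairseq's actual configs):

from dataclasses import dataclass, field

@dataclass
class CommonConfig:
    seed: int = 1

@dataclass
class FairseqConfig:
    # Python 3.11 raises "mutable default <class 'CommonConfig'> for field
    # common is not allowed: use default_factory" for this line:
    #   common: CommonConfig = CommonConfig()
    # The accepted spelling uses a factory instead:
    common: CommonConfig = field(default_factory=CommonConfig)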

hashefa commented 1 year ago

Can someone share a working colab notebook? I think it might help. Facing the same problems as above.

epk2112 commented 1 year ago

How to Transcribe Audio to Text (Google Colab Version) 👇

Step 1: Clone the Fairseq Git Repo

import os

!git clone https://github.com/pytorch/fairseq

# Get the current working directory
current_dir = os.getcwd()

# Create the directory paths
audio_samples_dir = os.path.join(current_dir, "audio_samples")
temp_dir = os.path.join(current_dir, "temp_dir")

# Create the directories if they don't exist
os.makedirs(audio_samples_dir, exist_ok=True)
os.makedirs(temp_dir, exist_ok=True)

# Change current working directory
os.chdir('fairseq')

!pwd

Step 2: Install requirements and build

Be patient; this takes a few minutes.

!pip install --editable ./

Step 3: Install tensorboardX

!pip install tensorboardX

Step 4: Download your preferred model

Un-comment to download any of them. If you're not using Google Colab Pro, use a smaller model to avoid running out of memory.

# # MMS-1B:FL102 model - 102 Languages - FLEURS Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt'

# # MMS-1B:L1107 - 1107 Languages - MMS-lab Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt'

# MMS-1B-all - 1162 Languages - MMS-lab + FLEURS + CV + VP + MLS
!wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt'

Step 5: Upload your audio(s)

Create a folder at '/content/audio_samples/' and upload the .wav audio files you need to transcribe, e.g. '/content/audio_samples/small_trim4.wav'. Note: you need to make sure the audio you are using has a sample rate of 16000. You can easily do this with FFmpeg, as in the example below, which converts an .mp3 file to .wav and fixes the sample rate:

ffmpeg -i .\small_trim4.mp3 -ar 16000 .\wav_formats\small_trim4.wav

Step 6: Run Inference and transcribe your audio(s)

This takes some time for long audio files.

import os

os.environ["TMPDIR"] = '/content/temp_dir'
os.environ["PYTHONPATH"] = "."
os.environ["PREFIX"] = "INFER"
os.environ["HYDRA_FULL_ERROR"] = "1"
os.environ["USER"] = "micro"

!python examples/mms/asr/infer/mms_infer.py --model "/content/fairseq/models_new/mms1b_all.pt" --lang "swh" --audio "/content/audio_samples/small_trim4.wav"

After this you'll get your transcription. I have this Colab example in my GitHub repo 👉 fairseq_meta_mms_Google_Colab_implementation

MohamedAliRashad commented 1 year ago

It worked without any problems. However, it appears that the sampling rate must be 16000.

  • Dockerfile.mms

    FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
    # FROM ubuntu:20.04
    
    ENV DEBIAN_FRONTEND=noninteractive
    
    WORKDIR /usr/src/app
    
    RUN apt-get update \
      && apt-get install -y python-is-python3 git python3-pip sudo wget curl \
      && git clone https://github.com/facebookresearch/fairseq.git \
      && cd fairseq \
      && git checkout af12c9c6407bbcf2bca0b2f1923cf78f3db8857c \
      && pip install pip -U \
      && pip install --no-cache-dir . \
      && pip install --no-cache-dir soundfile \
      && pip install --no-cache-dir torch \
      && pip install --no-cache-dir hydra-core \
      && pip install --no-cache-dir editdistance \
      && pip install --no-cache-dir soundfile \
      && pip install --no-cache-dir omegaconf \
      && pip install --no-cache-dir scikit-learn \
      && pip install --no-cache-dir tensorboardX \
      && python setup.py build_ext --inplace \
      && apt update \
      && apt -y install libsndfile-dev \
      && rm -rf /var/lib/apt/lists/* \
      && wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq \
      && chmod +x /usr/bin/yq \
      && yq -i '.common.cpu = true' examples/mms/asr/config/infer_common.yaml
    
    ENV USERNAME=user
    RUN echo "root:root" | chpasswd \
      && adduser --disabled-password --gecos "" "${USERNAME}" \
      && echo "${USERNAME}:${USERNAME}" | chpasswd \
      && echo "%${USERNAME}    ALL=(ALL)   NOPASSWD:    ALL" >> /etc/sudoers.d/${USERNAME} \
      && chmod 0440 /etc/sudoers.d/${USERNAME}
    USER ${USERNAME}
    
    WORKDIR /usr/src/app/fairseq
    
    CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]
  • .dockerignore
    *
  • build
    docker build -t fairseq:dev -f Dockerfile.mms .
  • run
    docker run --rm -it --gpus all \
    -e USER=user \
    -v $(pwd):/mms:ro fairseq:dev \
    python examples/mms/asr/infer/mms_infer.py \
    --model /mms/examples/mms/mms1b_fl102.pt \
    --lang eng \
    --audio /mms/examples/mms/English16000.wav
  • results
    Input: /mms/examples/mms/English16000.wav
    Output: i got a callcall from an insurance company yesterday

I used your Dockerfile but it doesn't read audio files:

soundfile.LibsndfileError: Error opening '/home/morashad/projects/fairseq/sample.wav': System error.
ahmedosman2001 commented 1 year ago

@epk2112 Thanks for the notebook. I followed everything you did, but the audio I'm using gives me this error: AssertionError: Sentences lengths should not exceed max_tokens=4000000. Does MMS ASR support long audio files? The audio I'm using is 31 minutes. Here is the full error:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
2023-05-23 19:55:14.274612: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-23 19:55:16.557788: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch
    ret = run_job(
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 460, in hydra_main
    distributed_utils.call_main(cfg, main)
  File "/content/fairseq/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 148, in __init__
    self.progress_bar = self.build_progress_bar()
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 264, in build_progress_bar
    iterator=self.get_dataset_itr(),
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 242, in get_dataset_itr
    return self.task.get_batch_iterator(
  File "/content/fairseq/fairseq/data/iterators.py", line 420, in next_epoch_itr
    self._cur_epoch_itr = self._get_iterator_for_epoch(
  File "/content/fairseq/fairseq/data/iterators.py", line 488, in _get_iterator_for_epoch
    self.epoch_batch_sampler = FrozenBatchSampler(
  File "/content/fairseq/fairseq/data/iterators.py", line 248, in __init__
    self.make_batches_for_epoch(epoch, initial_offset)
  File "/content/fairseq/fairseq/data/iterators.py", line 251, in make_batches_for_epoch
    self.batches = self.ordered_batches(
  File "/content/fairseq/fairseq/data/iterators.py", line 566, in ordered_batches
    batches = self.frozen_batches
  File "/content/fairseq/fairseq/data/iterators.py", line 358, in frozen_batches
    self._frozen_batches = tuple(self.batch_sampler(self.dataset, self.epoch))
  File "/content/fairseq/fairseq/tasks/fairseq_task.py", line 300, in make_batches
    batches = dataset.batch_by_size(
  File "/content/fairseq/fairseq/data/base_wrapper_dataset.py", line 61, in batch_by_size
    return self.dataset.batch_by_size(
  File "/content/fairseq/fairseq/data/fairseq_dataset.py", line 145, in batch_by_size
    return data_utils.batch_by_size(
  File "/content/fairseq/fairseq/data/data_utils.py", line 341, in batch_by_size
    b = batch_by_size_fn(
  File "fairseq/data/data_utils_fast.pyx", line 108, in fairseq.data.data_utils_fast.batch_by_size_fn
    cpdef list batch_by_size_fn(
  File "fairseq/data/data_utils_fast.pyx", line 123, in fairseq.data.data_utils_fast.batch_by_size_fn
    return batch_by_size_vec(indices, num_tokens_vec, max_tokens,
  File "fairseq/data/data_utils_fast.pyx", line 30, in fairseq.data.data_utils_fast.batch_by_size_vec
    assert max_tokens <= 0 or np.max(num_tokens_vec) <= max_tokens, (
AssertionError: Sentences lengths should not exceed max_tokens=4000000
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/content/temp_dir/tmp993hbyip/hypo.word'
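
(For what it's worth, at a 16 kHz sample rate max_tokens=4000000 samples works out to 4000000 / 16000 = 250 seconds, roughly 4 minutes, so a 31-minute file is far over that default limit.)
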
leonardltk commented 1 year ago

> i run the code based on the docker, but it fails again [...] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4o9kxdyr/hypo.word'

Same issue here:

omegaconf.errors.ConfigAttributeError: Key 'multi_corpus_keys' is not in struct
  full_key: task.multi_corpus_keys
  reference_type=Any
  object_type=dict

Do we have to edit the yaml file or something?

luisroque commented 1 year ago

I managed to run it on a Linux machine; thanks for the tips.

A few useful points:

You can find the Python script that I used to run ASR with MMS here, with some more details here. I hope it helps!

fcecagno commented 1 year ago

@luisroque did you manage to run it with long audio files?

luisroque commented 1 year ago

I was using small audio files just to test it out, but I can try a bigger one. What would be your definition of long? ~30min works?

fcecagno commented 1 year ago

> I was using small audio files just to test it out, but I can try a bigger one. What would be your definition of long? ~30min works?

30 min would be good enough, 1 hour would be even better.

ttv20 commented 1 year ago

i run the code based on the docker, but it fails again

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch
    ret = run_job(
  File "/usr/local/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "examples/speech_recognition/new/infer.py", line 460, in hydra_main
    distributed_utils.call_main(cfg, main)
  File "/usr/src/app/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "examples/speech_recognition/new/infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "examples/speech_recognition/new/infer.py", line 132, in __init__
    self.task.load_dataset(
  File "/usr/src/app/fairseq/tasks/audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "/usr/src/app/fairseq/tasks/audio_pretraining.py", line 150, in load_dataset
    if task_cfg.multi_corpus_keys is None:
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
    return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
    node = self._get_node(key=key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
    self._validate_get(key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 135, in _validate_get
    self._format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigAttributeError: Key 'multi_corpus_keys' is not in struct
        full_key: task.multi_corpus_keys
        reference_type=Any
        object_type=dict
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4o9kxdyr/hypo.word'

Same issue with:

omegaconf.errors.ConfigAttributeError: Key 'multi_corpus_keys' is not in struct
  full_key: task.multi_corpus_keys
  reference_type=Any
  object_type=dict

Do we have to edit the yaml file or something?

Found the problem: you need to use a model from the fine-tuned checkpoints, not from the pretrained ones.
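In other words, point --model at one of the ASR checkpoints (mms1b_all.pt or mms1b_fl102.pt, both used elsewhere in this thread), not at a pretrained-only wav2vec 2.0 checkpoint — the latter lacks the fine-tuning task config that infer.py reads, which appears to be what surfaces as the multi_corpus_keys error. For example (paths are placeholders):

python examples/mms/asr/infer/mms_infer.py --model "/path/to/mms1b_all.pt" --lang eng --audio "/path/to/audio.wav"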

androstj commented 1 year ago

I was using small audio files just to test it out, but I can try a bigger one. What would be your definition of long? ~30min works?

30 min would be good enough, 1 hour would be even better.

@luisroque @fcecagno To accommodate longer audio, you can change max_tokens. The calculation is 16,000 samples per second × 30 minutes × 60 seconds per minute = 28,800,000 for max_tokens. However, I would recommend splitting the long audio into shorter files, because a large max_tokens will cause GPU OOM or run much slower.
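For reference, a minimal sketch of that arithmetic (assuming 16 kHz input, which is what the models expect; max_tokens here counts audio samples):

# budget enough samples for the longest audio you want in one batch
sample_rate = 16_000              # samples per second at 16 kHz
max_minutes = 30
max_tokens = sample_rate * max_minutes * 60
print(max_tokens)                 # 28800000

which would then be passed as the hydra override dataset.max_tokens=28800000 (the same key that appears as dataset.max_tokens=-1 in the inference logs later in this thread).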

MohamedAliRashad commented 1 year ago

What is the latest way to run the model (I tried the docker solution and the colab notebook and both gave me errors)?

MinSukJoshyOh commented 1 year ago

What is the latest way to run the model (I tried the docker solution and the colab notebook and both gave me errors)?

Check out: https://github.com/facebookresearch/fairseq/issues/5122#issuecomment-1559990163

rsgrafx commented 1 year ago

@PINTO0309 Thank you for your comment earlier.

It worked without any problems. However, it appears that the sampling rate must be 16000.

docker build -t fairseq:dev -f Dockerfile.mms .

* run

docker run --rm -it --gpus all \
  -e USER=user \
  -v $(pwd):/mms:ro \
  fairseq:dev \
  python examples/mms/asr/infer/mms_infer.py \
    --model /mms/examples/mms/mms1b_fl102.pt \
    --lang eng \
    --audio /mms/examples/mms/English16000.wav

* results

Input: /mms/examples/mms/English16000.wav
Output: i got a callcall from an insurance company yesterday
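On the 16000 Hz note above: if your audio is at a different rate, resampling it first avoids the constraint. A minimal torchaudio sketch (file names are hypothetical):

import torchaudio

# Load at whatever rate the file has, then resample to the 16 kHz MMS expects.
wav, sr = torchaudio.load("input.wav")
if sr != 16_000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16_000)
torchaudio.save("input_16k.wav", wav, 16_000)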

Command I ran:
docker run --rm -it -e USER=user -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --lang 'fij' --audio /mms/fijian_audio/morning.wav

Not sure what I missed, but running this I ran into the error below. Maybe it's a quick permission issue? Apologies, I don't work with Docker regularly.

 ==========
== CUDA ==
==========

CUDA Version 11.8.0

...

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER/None'**

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER'**

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user'**

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/dist-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 140, in sweep
    sweep_dir.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/checkpoint'
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp2984k8eq/hypo.word'

Hopefully you spot something. Thank you in advance.
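Update (untested workaround sketch): the config apparently writes results under /checkpoint/$USER, which doesn't exist inside the container, so mounting a writable host directory at /checkpoint may help — and note that with -v $(pwd):/mms the model path likely needs the /mms prefix too:

mkdir -p checkpoint
docker run --rm -it -e USER=user \
  -v "$(pwd):/mms:ro" \
  -v "$(pwd)/checkpoint:/checkpoint" \
  fairseq:dev \
  python examples/mms/asr/infer/mms_infer.py --model /mms/mms1b_l1107.pt --lang 'fij' --audio /mms/fijian_audio/morning.wav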

wangsiw1 commented 1 year ago

I am able to run inference on one audio file at a time on CPU, but when I tried to load more audio files (3 for now), the job always got killed, seemingly due to OOM:

preparing tmp manifest dir ...
loading model & running inference ...
2023-05-25 05:44:08 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
[2023-05-25 05:44:17,222][HYDRA] Launching 1 jobs locally
[2023-05-25 05:44:17,222][HYDRA] #0 : decoding.type=viterbi dataset.max_tokens=-1 distributed_training.distributed_world_size=1 common_eval.path='models/meta_mms_20230523/mms1b_all.pt' task.data=/tmp/tmp1wrkle0r dataset.gen_subset=deu:dev common_eval.post_process=letter decoding.results_path=/tmp/tmp1wrkle0r
[2023-05-25 05:44:18,329][__main__][INFO] - models/meta_mms_20230523/mms1b_all.pt
[2023-05-25 05:47:28,789][__main__][INFO] - >>> LOADING ADAPTER: deu
[2023-05-25 05:47:28,813][fairseq.data.audio.raw_audio_dataset][INFO] - loaded 3, skipped 0 samples
[2023-05-25 05:47:34,898][fairseq.tasks.fairseq_task][INFO] - can_reuse_epoch_itr = True
[2023-05-25 05:47:34,903][fairseq.tasks.fairseq_task][INFO] - reuse_dataloader = True
[2023-05-25 05:47:34,903][fairseq.tasks.fairseq_task][INFO] - rebuild_batches = True
[2023-05-25 05:47:34,904][fairseq.tasks.fairseq_task][INFO] - batches will be rebuilt for each epoch
[2023-05-25 05:47:34,906][fairseq.tasks.fairseq_task][INFO] - creating new batches for epoch 1
0%| | 0/1 [00:00<?, ?it/s]
/bin/sh: line 1: 8958 Killed PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=-1 distributed_training.distributed_world_size=1 "common_eval.path='models/meta_mms_20230523/mms1b_all.pt'" task.data=/tmp/tmp1wrkle0r dataset.gen_subset="deu:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmp1wrkle0r

dmesg -T| grep -E -i -B100 'killed process'

[Thu May 25 05:49:44 2023] Task in /slurm/uid_21107/job_30606/step_0 killed as a result of limit of /slurm/uid_21107/job_30606/step_0
[Thu May 25 05:49:44 2023] memory: usage 62760276kB, limit 62914560kB, failcnt 0
[Thu May 25 05:49:44 2023] memory+swap: usage 62914560kB, limit 62914560kB, failcnt 4441
[Thu May 25 05:49:44 2023] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[Thu May 25 05:49:44 2023] Memory cgroup stats for /slurm/uid_21107/job_30606/step_0: cache:129352KB rss:62630924KB rss_huge:19105792KB mapped_file:129248KB swap:154284KB inactive_anon:1523844KB active_anon:61236432KB inactive_file:0KB active_file:0KB unevictable:0KB
[Thu May 25 05:49:44 2023] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu May 25 05:49:44 2023] [24807] 21107 24807 29248 457 14 61 0 bash
[Thu May 25 05:49:44 2023] [ 8953] 21107 8953 36191 1697 25 0 0 python
[Thu May 25 05:49:44 2023] [ 8957] 21107 8957 28323 126 11 0 0 sh
[Thu May 25 05:49:44 2023] [ 8958] 21107 8958 17193679 15686850 31014 31592 0 python
[Thu May 25 05:49:44 2023] Memory cgroup out of memory: Kill process 9628 (python) score 1001 or sacrifice child
[Thu May 25 05:49:44 2023] Killed process 8958 (python), UID 21107, total-vm:68774716kB, anon-rss:62617760kB, file-rss:396kB, shmem-rss:129244kB

I tried setting batch_size and data_buffer_size to 1, but it did not help. Watching htop, the job was using around 30 GB of memory at first, and then usage suddenly jumped to the limit. What is the best way to run inference on many audio files?

Update: It seems one of the audio files is too long (~10 min); running inference on it alone also kills the job. Does MMS ASR inference not support long audio?

luisroque commented 1 year ago

What is the latest way to run the model (I tried the docker solution and the colab notebook and both gave me errors) ?

If you are using Linux you can try this. It runs with no problem for me.

androstj commented 1 year ago

Update: It seems one of the audio files is too long (~10 min); running inference on it alone also kills the job. Does MMS ASR inference not support long audio?

It depends on how much GPU memory you have, but try to split your audio, if possible, to avoid the OOM (see the sketch below). Check this comment: https://github.com/facebookresearch/fairseq/issues/5117#issuecomment-1561787185
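A minimal splitting sketch (assuming 16 kHz mono wav input; the file name and the 30-second chunk length are placeholders — a VAD-based splitter that cuts on silence would give cleaner word boundaries):

import soundfile as sf

# Split a long recording into fixed 30-second chunks before running inference.
audio, sr = sf.read("long.wav")
chunk = 30 * sr                   # samples per chunk
for i in range(0, len(audio), chunk):
    sf.write(f"chunk_{i // chunk:04d}.wav", audio[i:i + chunk], sr)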

ajahstudio commented 1 year ago

After a series of tries, I was able to get it to run inference on Linux, but it could probably work on Windows also. The missing hypo.word error is due to exceptions thrown during subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,), so first I suggest you replace that line with the following:

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, )
print(out)

This will enable you to see what's causing the error. Also provide the full paths of your model and audio files, like this:

python examples/mms/asr/infer/mms_infer.py --model "/home/hunter/Downloads/mms1b_all.pt" --lang eng --audio "/home/hunter/Downloads/audio.wav"
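A slightly fuller variant of the same idea (a sketch reusing the cmd variable already built in mms_infer.py): capture stderr instead of discarding it, so the underlying failure prints before the hypo.word read fails.

import subprocess

out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(out.stdout)
print(out.stderr)        # the real error ends up here
out.check_returncode()   # still fail fast if the inference step died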