facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

MMS - unique_wer_file: bool = field( SyntaxError: invalid syntax #5128

Open RuslanSel opened 1 year ago

RuslanSel commented 1 year ago

When I run python examples/mms/asr/infer/mms_infer.py --model "/path/to/asr/model" --lang lang_code --audio "/path/to/audio_1.wav" "/path/to/audio_1.wav" I get this error:

preparing tmp manifest dir ...
loading model & running inference ...
  File "examples/speech_recognition/new/infer.py", line 53
    unique_wer_file: bool = field(
                   ^
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpkow8cv7n/hypo.word'

Environment: latest fairseq version (installed from source), Ubuntu 22.04, Python 3.8.10.

Thanks in advance.
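
A possible explanation, offered as an assumption rather than something confirmed in this thread: the SyntaxError is reported for infer.py before the traceback from mms_infer.py, which suggests infer.py runs as a separate process. Variable annotations such as `unique_wer_file: bool = ...` only parse on Python 3.6 or newer (PEP 526), so if that child process is started through a `python` command that resolves to an older interpreter, it would fail in exactly this way even though the outer script runs on 3.8.10. The following minimal sketch compares the interpreter you are running with whatever `python` resolves to on PATH; the helper names are hypothetical.

```python
import shutil
import subprocess
import sys

# One-liner that works on both Python 2 and Python 3.
VERSION_SNIPPET = "import sys; sys.stdout.write('%d %d' % sys.version_info[:2])"


def interpreter_version(executable):
    """Return (major, minor) as reported by the given Python executable."""
    result = subprocess.run(
        [executable, "-c", VERSION_SNIPPET],
        capture_output=True,
        text=True,
        check=True,
    )
    major, minor = result.stdout.split()
    return int(major), int(minor)


def supports_annotations(version):
    """Variable annotations (PEP 526) need Python 3.6 or newer."""
    return tuple(version) >= (3, 6)


current = interpreter_version(sys.executable)
print("this interpreter:", sys.executable, current)

on_path = shutil.which("python")
if on_path is None:
    print("no 'python' command found on PATH")
else:
    found = interpreter_version(on_path)
    verdict = "ok" if supports_annotations(found) else "too old to parse infer.py"
    print("'python' on PATH:", on_path, found, verdict)
```

If the second line reports a version below 3.6, pointing `python` at the Python 3 interpreter (for example through a virtual environment) would be the first thing to try.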

hrishioa commented 1 year ago

Same issue!

epk2112 commented 1 year ago

How to Transcribe Audio to text (Google Colab Version)👇

Step 1: Clone the Fairseq Git Repo

import os

!git clone https://github.com/pytorch/fairseq

# Get the current working directory
current_dir = os.getcwd()

# Create the directory paths
audio_samples_dir = os.path.join(current_dir, "audio_samples")
temp_dir = os.path.join(current_dir, "temp_dir")

# Create the directories if they don't exist
os.makedirs(audio_samples_dir, exist_ok=True)
os.makedirs(temp_dir, exist_ok=True)

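# Change current working directory to the cloned repo (later steps use paths relative to it)
os.chdir(os.path.join(current_dir, "fairseq"))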

Step 2: Install requirements and build

Be patient; this takes a few minutes.

!pip install --editable ./

Step 3: Install tensorboardX

!pip install tensorboardX

Step 4: Download your preferred model

Uncomment the lines for the model you want to download. If you're not using Google Colab Pro, use a smaller model to avoid running out of memory.

# # MMS-1B:FL102 model - 102 Languages - FLEURS Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt'

# # MMS-1B:L1107 - 1107 Languages - MMS-lab Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt'

# MMS-1B-all - 1162 Languages - MMS-lab + FLEURS + CV + VP + MLS
!wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt'

Step 5: Upload your audio(s)

Create a folder at '/content/audio_samples/' and upload the .wav audio files you want to transcribe, e.g. '/content/audio_samples/small_trim4.wav'.

Note: Make sure your audio has a sample rate of 16000 Hz. You can do this easily with FFmpeg; the example below converts an .mp3 file to .wav and sets the sample rate:

ffmpeg -i .\small_trim4.mp3 -ar 16000 .\wav_formats\small_trim4.wav
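
Before running inference it can help to confirm that a converted file really is at 16000 Hz. The sketch below uses only Python's standard `wave` module, which reads PCM WAV headers; the function names and the silent demo files are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile
import wave

TARGET_RATE = 16000  # sample rate the walkthrough asks for


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """True if the file's sample rate differs from target_rate."""
    return wav_sample_rate(path) != target_rate


def write_silent_wav(path, rate, seconds=1):
    """Demo helper: write mono 16-bit silence at the given sample rate."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)


# Create two throwaway files so the check can be run anywhere.
demo_dir = tempfile.mkdtemp()
good = os.path.join(demo_dir, "ok_16k.wav")
bad = os.path.join(demo_dir, "needs_fix_44k.wav")
write_silent_wav(good, 16000)
write_silent_wav(bad, 44100)

for path in (good, bad):
    status = "resample first" if needs_resampling(path) else "ok"
    print(os.path.basename(path), wav_sample_rate(path), status)
```

In practice you would call `needs_resampling` on each file in the audio_samples folder and rerun the ffmpeg command on any file it flags.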

Step 6: Run Inference and transcribe your audio(s)

This takes some time for long audio files.

import os

os.environ["TMPDIR"] = '/content/temp_dir'
os.environ["PYTHONPATH"] = "."
os.environ["PREFIX"] = "INFER"
os.environ["HYDRA_FULL_ERROR"] = "1"
os.environ["USER"] = "micro"

!python examples/mms/asr/infer/mms_infer.py --model "/content/fairseq/models_new/mms1b_all.pt" --lang "swh" --audio "/content/audio_samples/small_trim4.wav"

After this you'll get your transcription. I have this Colab example in my GitHub repo👉 fairseq_meta_mms_Google_Colab_implementation
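
Outside a notebook, where the `!` shell prefix is not available, the Step 6 command can be assembled in Python instead. This is a minimal sketch: `build_mms_command` is a hypothetical helper, and the only flags it uses are `--model`, `--lang`, and `--audio` as they appear in this thread (the original report passes several audio paths after `--audio`).

```python
import sys

SCRIPT = "examples/mms/asr/infer/mms_infer.py"


def build_mms_command(model_path, lang_code, audio_paths):
    """Build the argument list for mms_infer.py using the flags shown in this thread."""
    if not audio_paths:
        raise ValueError("at least one audio file is required")
    return [
        sys.executable,  # run the script with the same interpreter as this process
        SCRIPT,
        "--model", model_path,
        "--lang", lang_code,
        "--audio", *audio_paths,
    ]


cmd = build_mms_command(
    "/content/fairseq/models_new/mms1b_all.pt",
    "swh",
    ["/content/audio_samples/small_trim4.wav"],
)
print(" ".join(cmd))

# To actually run it, from inside the fairseq checkout and with the
# environment variables from Step 6 already set:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.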

patrickvonplaten commented 1 year ago

BTW, it should now be very simple to use MMS with transformers:
