facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Please update the Readme file with better instructions on how to use this project #5122

Open FurkanGozukara opened 1 year ago

FurkanGozukara commented 1 year ago

From a 100-billion-dollar company, this Readme file is simply unacceptable.

My questions are:

How do I install and use speech to text for transcribing English audio files with the best models and settings?

How do I export a transcription as a vtt or srt subtitle file?

What options do we have to improve transcription quality?

How do I install and use text to speech for English?

Which model is best? What are our options?

Can we train a voice? If so, how?

None of these are explained in the Readme file.

audiolion commented 1 year ago

They made it open source; they could have not done that, and they had no obligation to do it. There is no standard that says the readme should be better than it is. I'd take away the hostile language and ask for improvements nicely. These are people choosing to open source their work and provide documentation, not a faceless corporation.

FurkanGozukara commented 1 year ago

They made it open source; they could have not done that, and they had no obligation to do it. There is no standard that says the readme should be better than it is. I'd take away the hostile language and ask for improvements nicely. These are people choosing to open source their work and provide documentation, not a faceless corporation.

What is the purpose of making something open source if it is not usable by other people? Compare it with how much simpler whisper is to use: https://github.com/openai/whisper

audiolion commented 1 year ago

Open source does not mean "and we will provide support"; it simply means the source code is open for anyone to read. I am sure they would like to make it more usable, but you can drop the attitude and be a decent person.

epk2112 commented 1 year ago

From a 100-billion-dollar company, this Readme file is simply unacceptable.

My questions are:

How do I install and use speech to text for transcribing English audio files with the best models and settings?

How do I export a transcription as a vtt or srt subtitle file?

What options do we have to improve transcription quality?

How do I install and use text to speech for English?

Which model is best? What are our options?

Can we train a voice? If so, how?

None of these are explained in the Readme file.

How to Transcribe Audio to text (Google Colab Version)👇

Step 1: Clone the Fairseq Git Repo

import os

!git clone https://github.com/pytorch/fairseq

# Get the current working directory
current_dir = os.getcwd()

# Create the directory paths
audio_samples_dir = os.path.join(current_dir, "audio_samples")
temp_dir = os.path.join(current_dir, "temp_dir")

# Create the directories if they don't exist
os.makedirs(audio_samples_dir, exist_ok=True)
os.makedirs(temp_dir, exist_ok=True)

# Change current working directory
os.chdir('fairseq')

!pwd

Step 2: Install requirements and build

Be patient; this takes a few minutes.

!pip install --editable ./
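
To confirm the editable install worked before moving on, a quick sanity check (assuming the build finished without errors) is to import the package and print its version:

import fairseq
print(fairseq.__version__)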

Step 3: Install TensorBoard (tensorboardX)

!pip install tensorboardX

Step 4: Download your preferred model

Uncomment the model you want to download. If you're not using Google Colab Pro, use a smaller model to avoid running out of memory.

# # MMS-1B:FL102 model - 102 Languages - FLEURS Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt'

# # MMS-1B:L1107 - 1107 Languages - MMS-lab Dataset
# !wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt'

# MMS-1B-all - 1162 Languages - MMS-lab + FLEURS + CV + VP + MLS
!wget -P ./models_new 'https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt'
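
These checkpoints are large, so it may be worth confirming the download actually finished. A minimal sketch (the path assumes the default models_new folder and the MMS-1B-all file above):

import os

ckpt = "models_new/mms1b_all.pt"
assert os.path.isfile(ckpt), f"{ckpt} not found - re-run the wget cell"
# A 1B-parameter checkpoint should be several GB on disk
print(f"{ckpt}: {os.path.getsize(ckpt) / 1e9:.2f} GB")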

Step 5: Upload your audio(s)

Create a folder at '/content/audio_samples/' and upload the .wav audio files you need to transcribe, e.g. '/content/audio_samples/small_trim4.wav'. Note: you need to make sure the audio you are using has a sample rate of 16000 Hz. You can do this easily with FFmpeg, as in the example below, which converts an .mp3 file to .wav and fixes the sample rate:

ffmpeg -i small_trim4.mp3 -ar 16000 wav_formats/small_trim4.wav
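
If you want to double-check the sample rate before running inference, here is a minimal sketch using Python's standard-library wave module (the path is just the example file from this step):

import wave

with wave.open("/content/audio_samples/small_trim4.wav", "rb") as f:
    rate = f.getframerate()
print("sample rate:", rate)
assert rate == 16000, "resample with ffmpeg -ar 16000 first"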

Step 6: Run Inference and transcribe your audio(s)

This takes some time for long audio files.

import os

# Environment tweaks so the Hydra-based inference script runs cleanly in Colab
os.environ["TMPDIR"] = '/content/temp_dir'
os.environ["PYTHONPATH"] = "."
os.environ["PREFIX"] = "INFER"
os.environ["HYDRA_FULL_ERROR"] = "1"  # show full stack traces instead of truncated Hydra errors
os.environ["USER"] = "micro"  # some scripts expect USER to be set; Colab sessions don't set it

# Note: "swh" is the ISO code for Swahili; for English audio use "eng"
!python examples/mms/asr/infer/mms_infer.py --model "/content/fairseq/models_new/mms1b_all.pt" --lang "swh" --audio "/content/audio_samples/small_trim4.wav"
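
If you have several files, one option (a sketch, not an official fairseq interface) is to loop over the folder and invoke the same script once per file, collecting whatever it prints; each subprocess inherits the environment variables set above:

import glob, subprocess

for path in sorted(glob.glob("/content/audio_samples/*.wav")):
    result = subprocess.run(
        ["python", "examples/mms/asr/infer/mms_infer.py",
         "--model", "/content/fairseq/models_new/mms1b_all.pt",
         "--lang", "swh", "--audio", path],
        capture_output=True, text=True,
    )
    print(path, "->", result.stdout.strip())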

After this you'll get your transcription. I have this Colab example in my GitHub repo👉 fairseq_meta_mms_Google_Colab_implementation
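
One more note on the srt/vtt question from the top of the thread: the script above prints plain text, so turning it into subtitles is up to you. As a rough illustration only, here is a hedged sketch that wraps a transcription string into a single-cue .srt file; real subtitles need per-segment timestamps, which you would have to produce yourself (e.g. by chunking the audio and transcribing each chunk):

def to_srt(text, duration_s, out_path="transcript.srt"):
    # Write one SRT cue spanning the whole clip - illustrative only
    def ts(seconds):
        h, rem = divmod(int(seconds), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d},000"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"1\n{ts(0)} --> {ts(duration_s)}\n{text}\n")

to_srt("your transcription here", duration_s=30)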

FurkanGozukara commented 1 year ago


Thanks a lot, great tutorial.

I discovered that for English, fairseq is very bad compared to whisper, so I will pass on it.

But I wonder: what about text to speech?