Open ahmadjameel7171 opened 2 years ago
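Hi, you can inherit RawAudioDataset (https://github.com/bytedance/neurst/blob/master/neurst/data/datasets/audio/audio_dataset.py#L42) and implement the load_transcripts and build_iterator functions so that they iterate over your directory of audio files. Take AugmentedLibrispeech (https://github.com/bytedance/neurst/blob/master/neurst/data/datasets/audio/aug_librispeech.py) as an example. Then you can follow this recipe (https://github.com/bytedance/neurst/tree/master/examples/speech_transformer/augmented_librispeech#data-preprocessing) for data preparation and model training.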
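As a rough illustration of the directory-iteration part, the sketch below walks a folder of WAV files and yields one record per file. It deliberately does not subclass RawAudioDataset, because the exact signatures of load_transcripts and build_iterator are not shown in this thread. The function name iter_wav_records and the record keys are hypothetical; a similar loop could live inside your own build_iterator.

```python
import wave
from pathlib import Path


def iter_wav_records(audio_dir):
    """Yield one dict per .wav file under audio_dir, in sorted path order.

    Hypothetical helper: the record keys are illustrative, not NeurST's.
    """
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        # Read only the header fields; the samples are not loaded here.
        with wave.open(str(path), "rb") as wav:
            yield {
                "path": str(path),
                "sample_rate": wav.getframerate(),
                "num_frames": wav.getnframes(),
            }
```

Calling list(iter_wav_records("my_audio_dir")) would give one entry per file, which is enough to check that every file is found before wiring the loop into a dataset class.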
Hi, respected sir. Can you please guide me on how to use this repo for speech-to-text with my own mp3 or wav input file?
Sir, I understand clearly, but I want to run a pre-trained model on my own audio file. I mean I want to test the model; I don't want to train or validate it.
Hi, I just wrote a tiny script to meet your needs. You first need to convert your audio files to WAV format with a sample rate of 16,000 Hz, then run:
python3 st_generation.py --model_dir checkpoint_dir --test_file xxx.wav
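Before running the command, it can help to confirm that the converted file really has the expected sample rate. The sketch below reads only the WAV header with Python's standard wave module. The helper name check_wav is hypothetical, and the mono check is an assumption: the script in this thread only asserts the sample rate.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate}, expected {expected_rate}")
    # Assumption: the feature extractor expects a single-channel signal.
    if channels != 1:
        problems.append(f"{channels} channels found, expected mono")
    return problems
```

An empty list from check_wav("xxx.wav") means the header matches what the script asserts. Note that the wave module reads PCM WAV files; other encodings may raise wave.Error.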
The st_generation.py script:
import argparse

import numpy
import tensorflow as tf
from absl import logging
from scipy.io import wavfile

from neurst.data.audio.log_mel_fbank import LogMelFbank
from neurst.exps.sequence_generator import SequenceGenerator
from neurst.layers.search import build_search_layer
from neurst.tasks import build_task
from neurst.utils import compat
from neurst.utils.checkpoints import restore_checkpoint_if_possible
from neurst.utils.configurable import ModelConfigs

if __name__ == "__main__":
    logging.set_verbosity(logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, required=True,
                        help="The checkpoint path")
    parser.add_argument("--test_file", type=str, required=True,
                        help="The input wav file with sample rate=16000")
    parser.add_argument("--dtype", type=str, default="float32",
                        choices=["float32", "float16"])
    args = parser.parse_args()

    # Decoding settings.
    search_method = "beam_search"
    beam_size = 4
    maximum_decode_length = 180

    if args.dtype == "float16":
        tf.keras.mixed_precision.set_global_policy("mixed_float16")
        compat.register_computation_dtype("float16", -6.e4)

    # Build the task and model from the configs stored in model_dir,
    # then restore the checkpoint weights.
    cfgs = ModelConfigs.load(args.model_dir)
    task = build_task(cfgs)
    model = task.build_model(cfgs)
    postprocess_fn = task.get_data_postprocess_fn(compat.DataStatus.PROJECTED)
    fbank_fe = LogMelFbank({"nfilt": 80, "winlen": 0.025, "winstep": 0.01})
    restore_checkpoint_if_possible(model, args.model_dir)

    search_layer = build_search_layer({
        "class": search_method,
        "params": {
            "beam_size": beam_size,
            "maximum_decode_length": maximum_decode_length
        }
    })
    generation_model = SequenceGenerator.build_generation_model(task, model, search_layer)
    _ = generation_model.make_predict_function()

    # process input here
    rate, sig = wavfile.read(args.test_file)
    assert rate == 16000, "The input wav file must have sample rate 16000"
    audio_feature = fbank_fe(sig, rate)
    # A batch of one utterance: flattened features plus the number of frames.
    inp = {
        "audio": tf.convert_to_tensor(
            [numpy.reshape(audio_feature, -1)],  # TODO filling zeros when batching
            dtype=tf.float32),
        "audio_length": tf.convert_to_tensor([len(audio_feature)], dtype=tf.int64)
    }
    predictions = generation_model.predict_on_batch(inp)
    print([postprocess_fn(x) for x in predictions])
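The TODO in the script ("filling zeros when batching") refers to the fact that several utterances of different lengths cannot be stacked into one tensor without padding. A minimal, dependency-free sketch of that step follows. The helper name pad_flat_features is hypothetical, and it assumes the model uses audio_length to ignore the padded part, which this thread does not confirm.

```python
def pad_flat_features(features):
    """Flatten and zero-pad per-utterance feature matrices to a common length.

    features: non-empty list of utterances; each utterance is a sequence of
    frames and each frame is a sequence of floats (e.g. 80 filterbank values).
    Returns (padded, lengths): padded[i] is utterance i flattened and
    zero-padded, and lengths[i] is its original number of frames.
    """
    if not features:
        raise ValueError("features must contain at least one utterance")
    lengths = [len(utt) for utt in features]
    # Flatten frame by frame, mirroring numpy.reshape(audio_feature, -1).
    flat = [[float(v) for frame in utt for v in frame] for utt in features]
    max_len = max(len(x) for x in flat)
    padded = [x + [0.0] * (max_len - len(x)) for x in flat]
    return padded, lengths
```

The two returned lists could then be passed to tf.convert_to_tensor as the "audio" and "audio_length" entries, in place of the single-utterance lists used in the script.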
Respected sir, thank you so much for this favor. I ran the script and got this error: FileNotFoundError: Fail to find model config file: model_configs.yml. I tried my best to find the pre-trained model in the repo you pointed to but couldn't. Please provide the model link.
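The error message suggests that the directory passed as --model_dir does not contain model_configs.yml. A small pre-flight check like the sketch below can confirm that before the model is built. The helper name missing_model_files is hypothetical, and it assumes the config file is expected directly inside the directory.

```python
import os


def missing_model_files(model_dir, required=("model_configs.yml",)):
    """Return the names in `required` that are not files inside model_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(model_dir, name))]
```

If missing_model_files("checkpoint_dir") returns a non-empty list, the directory is probably not a complete checkpoint directory.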
Could you paste the link to the pre-trained model here? I will look into it.
Sir, I don't have a pre-trained model. I am asking you for one. Please provide a link or the model file.
Well, which language pair would you like: en-zh, en-de, or ...? And are the models listed here useful to you?