bytedance / neurst

Neural end-to-end Speech Translation Toolkit

how can i use NEURST model speech to text with my own mp3 or wav file #53

Open ahmadjameel7171 opened 2 years ago

ahmadjameel7171 commented 2 years ago

Hi respected sir, could you please guide me on how to use this repo for speech-to-text with my own mp3 or wav input file?

zhaocq-nlp commented 2 years ago

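Hi, you can inherit RawAudioDataset (https://github.com/bytedance/neurst/blob/master/neurst/data/datasets/audio/audio_dataset.py#L42) and implement the load_transcripts and build_iterator functions so that they iterate over your directory of audio files. Take AugmentedLibrispeech (https://github.com/bytedance/neurst/blob/master/neurst/data/datasets/audio/aug_librispeech.py) as an example. Then you can follow this recipe (https://github.com/bytedance/neurst/tree/master/examples/speech_transformer/augmented_librispeech#data-preprocessing) to do data preparation and model training.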
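
As a rough illustration of the directory iteration described above, here is a minimal, self-contained sketch. It does not import NeurST: the class AudioDirectory is a hypothetical stand-in, not RawAudioDataset, and the sidecar ".txt" transcript layout is an assumption made only for this example. The real method signatures should be checked in audio_dataset.py before adapting it.

```python
import os
from typing import Dict, Iterator, List

AUDIO_EXTENSIONS = (".wav", ".mp3")


class AudioDirectory:
    """Hypothetical stand-in for a dataset class; NOT NeurST's RawAudioDataset."""

    def __init__(self, root: str):
        self.root = root

    def list_audio_files(self) -> List[str]:
        # Sorted so that the iteration order is reproducible.
        return sorted(
            os.path.join(self.root, name)
            for name in os.listdir(self.root)
            if name.lower().endswith(AUDIO_EXTENSIONS)
        )

    def load_transcripts(self) -> Dict[str, str]:
        # Assumed layout: "clip.wav" may have a sidecar file "clip.txt".
        transcripts = {}
        for path in self.list_audio_files():
            txt_path = os.path.splitext(path)[0] + ".txt"
            if os.path.exists(txt_path):
                with open(txt_path, encoding="utf-8") as f:
                    transcripts[path] = f.read().strip()
        return transcripts

    def build_iterator(self) -> Iterator[Dict[str, str]]:
        # Yield one record per audio file; the transcript is empty if none exists.
        transcripts = self.load_transcripts()
        for path in self.list_audio_files():
            yield {"audio_path": path, "transcript": transcripts.get(path, "")}
```

For inference only, the transcript field can simply stay empty; a real subclass would also need to decode the audio and follow the record format the NeurST task expects.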

ahmadjameel7171 commented 2 years ago

Sir, I understand clearly, but I want to run a pre-trained model on my own audio file. I mean I want to test the model; I don't want to train or validate it.

zhaocq-nlp commented 2 years ago

Hi, I just wrote a tiny script to meet your needs. You first need to convert your audio files to wav format with a sample rate of 16,000 Hz and then run

python3 st_generation.py --model_dir checkpoint_dir --test_file xxx.wav
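
For the conversion step, one possible approach is sketched below. It assumes the input is already a 16-bit PCM wav file; an mp3 would first have to be decoded with an external tool such as ffmpeg (for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`). The function name to_16k_mono is hypothetical, and the choice of scipy.signal.resample_poly for resampling is an assumption of this sketch, not something NeurST prescribes.

```python
from math import gcd

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

TARGET_RATE = 16000


def to_16k_mono(src_path: str, dst_path: str) -> None:
    """Convert a 16-bit PCM wav file to 16 kHz mono (hypothetical helper)."""
    rate, sig = wavfile.read(src_path)
    if sig.dtype != np.int16:
        raise ValueError("this sketch only handles 16-bit PCM input")
    sig = sig.astype(np.float64)
    if sig.ndim == 2:
        # Average the channels to obtain a mono signal.
        sig = sig.mean(axis=1)
    if rate != TARGET_RATE:
        # Polyphase resampling by the reduced ratio TARGET_RATE / rate.
        g = gcd(TARGET_RATE, rate)
        sig = resample_poly(sig, TARGET_RATE // g, rate // g)
    # Clip before casting back so resampling overshoot cannot wrap around.
    sig = np.clip(np.round(sig), -32768, 32767).astype(np.int16)
    wavfile.write(dst_path, TARGET_RATE, sig)
```

After conversion, the output file can be passed to --test_file; the script's `assert rate == 16000` should then hold.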

The st_generation.py:

import argparse

import numpy
import tensorflow as tf
from absl import logging
from scipy.io import wavfile

from neurst.data.audio.log_mel_fbank import LogMelFbank
from neurst.exps.sequence_generator import SequenceGenerator
from neurst.layers.search import build_search_layer
from neurst.tasks import build_task
from neurst.utils import compat
from neurst.utils.checkpoints import restore_checkpoint_if_possible
from neurst.utils.configurable import ModelConfigs

if __name__ == "__main__":
    logging.set_verbosity(logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, required=True,
                        help="The checkpoint path")
    parser.add_argument("--test_file", type=str, required=True,
                        help="The input wav file with sample rate=16000")
    parser.add_argument("--dtype", type=str, default="float32", choices=["float32", "float16"])

    args = parser.parse_args()
    search_method = "beam_search"
    beam_size = 4
    maximum_decode_length = 180

    if args.dtype == "float16":
        tf.keras.mixed_precision.set_global_policy("mixed_float16")
        compat.register_computation_dtype("float16", -6.e4)

    cfgs = ModelConfigs.load(args.model_dir)
    task = build_task(cfgs)
    model = task.build_model(cfgs)
    postprocess_fn = task.get_data_postprocess_fn(compat.DataStatus.PROJECTED)
    fbank_fe = LogMelFbank({"nfilt": 80, "winlen": 0.025, "winstep": 0.01})
    restore_checkpoint_if_possible(model, args.model_dir)
    search_layer = build_search_layer({
        "class": search_method,
        "params": {
            "beam_size": beam_size,
            "maximum_decode_length": maximum_decode_length
        }
    })
    generation_model = SequenceGenerator.build_generation_model(task, model, search_layer)
    _ = generation_model.make_predict_function()

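    # Read the input wav file and check that it has the expected sample rate.
    rate, sig = wavfile.read(args.test_file)
    assert rate == 16000, "Expected a 16 kHz wav file, got %d Hz" % rate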
    audio_feature = fbank_fe(sig, rate)
    inp = {"audio": tf.convert_to_tensor([numpy.reshape(audio_feature, -1)],  # TODO filling zeros when batching
                                         dtype=tf.float32),
           "audio_length": tf.convert_to_tensor([len(audio_feature)],
                                                dtype=tf.int64)
           }
    predictions = generation_model.predict_on_batch(inp)
    print([postprocess_fn(x) for x in predictions])
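
The TODO in the script mentions filling zeros when batching. A minimal sketch of that padding step is shown below, using NumPy only. The helper name pad_feature_batch is hypothetical, and it assumes (as the single-file script suggests, but not verified against NeurST) that the model accepts flattened features padded with zeros together with the true frame counts in "audio_length".

```python
import numpy as np


def pad_feature_batch(features):
    """Zero-pad a list of [num_frames, feature_dim] arrays into one batch.

    Returns the padded batch of flattened features and the frame counts.
    """
    # Frame counts, mirroring len(audio_feature) in the single-file script.
    lengths = np.array([len(f) for f in features], dtype=np.int64)
    # Flatten each utterance, mirroring numpy.reshape(audio_feature, -1).
    flat = [np.reshape(f, -1).astype(np.float32) for f in features]
    max_len = max(x.shape[0] for x in flat)
    batch = np.zeros((len(flat), max_len), dtype=np.float32)
    for i, x in enumerate(flat):
        batch[i, :x.shape[0]] = x  # the remaining positions stay zero
    return batch, lengths
```

The two return values would then take the place of the "audio" and "audio_length" entries in inp, converted with tf.convert_to_tensor as in the script.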

ahmadjameel7171 commented 2 years ago

Respected sir, thank you so much for this favor. I ran this and got the following error: FileNotFoundError: Fail to find model config file: model_configs.yml. I tried my best to find the pre-trained model inside the repo you gave but couldn't. Please provide the model link.

zhaocq-nlp commented 2 years ago

Could you paste the link to the pre-trained model here? I will check it.
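
The error message suggests that the directory passed as --model_dir does not contain model_configs.yml. A small pre-check, sketched here with the standard library only, can confirm that before running the script. The helper name has_model_config is hypothetical; the file name is taken from the error message above.

```python
import os

CONFIG_NAME = "model_configs.yml"  # file name taken from the error message


def has_model_config(model_dir: str) -> bool:
    """Return True if model_dir contains the model config file."""
    return os.path.isfile(os.path.join(model_dir, CONFIG_NAME))
```

If this returns False for the checkpoint directory, the directory is probably not a complete NeurST model directory, or the path points to the wrong level.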

ahmadjameel7171 commented 2 years ago

Sir, I don't have a pre-trained model. I am requesting one from you. Please provide a link or the model file.

zhaocq-nlp commented 2 years ago

Well, which language pair would you like? en-zh, en-de, or ...? And are the models listed here useful to you?