at16k / at16k-t2t-helpers

Tensor2Tensor helper utilities for playing with pre-trained at16k models.

Fine-tuning at16k-t2t model for domain-specific entries #5

Open sauravjoshi opened 4 years ago

sauravjoshi commented 4 years ago

I'm relatively new to t2t and was exploring how to leverage it for ASR when I came across your work. Amazing work, @mohitsshah, with a proper explanation of at16k. The results are pretty impressive. I'm planning to extend the model to a domain-specific setting, which includes extending the vocab. I would appreciate your help with the following.

As far as I could drill down, the problem is registered through the class At16kSubword, which extends the asr class, similar to the way a class such as Librispeech inherits from SpeechRecognitionProblem.
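For reference, this is the registration pattern I'm assuming from reading the code; the class name is taken from the repo, but the property bodies and comments below are my own reconstruction for illustration, not the actual at16k source:

# Sketch of the problem registration as I understand it; bodies are my guesses.
from tensor2tensor.data_generators import speech_recognition
from tensor2tensor.utils import registry

@registry.register_problem  # registered under the snake_case name "at16k_subword"
class At16kSubword(speech_recognition.SpeechRecognitionProblem):

  @property
  def multiprocess_generate(self):
    # Tells t2t-datagen to split data generation across worker processes.
    return True

  @property
  def approx_vocab_size(self):
    # Value I see in the problem definition.
    return 1000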

  1. The class At16kSubword has the multiprocess_generate property set to true (as in the sketch above), which means the data is generated by several processes in parallel. What configuration did you use for this, and given the number of hours of data, how long did generation take?

  2. Also, the core generate_data and generator functions aren't defined. What data did you build the model with? Did you leverage Librispeech and add your own data on top? I'd need those two function definitions to keep them in sync with the additional data I'll be fine-tuning on (my guess at their shape is in the sketch after this list). Could you provide them?

  3. The approx_vocab_size is defined as only 1000. If our goal is to extend the vocab, and we reuse the existing vocab that is loaded in feature_encoders(), will the new sub-words be added to it, or will a new vocab with the additional sub-words be created? As far as I know, the vocab is generated during the data-gen phase (see the sketch after this list).

    How much should the approx_vocab_size be tweaked?
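To make questions 2 and 3 concrete, this is roughly the shape I imagine those two functions having and how I picture the subword vocab being built during data-gen. Everything here is a sketch under my own assumptions: the class name At16kSubwordDomain, the helpers my_domain_samples() and read_waveform(), and the vocab filename are all placeholders of mine, not your implementation.

from tensor2tensor.data_generators import generator_utils
from tensor2tensor.data_generators import speech_recognition
from tensor2tensor.utils import registry


def my_domain_samples(tmp_dir, training):
  # Placeholder: should yield (wav_path, transcript) pairs for the extra
  # domain-specific data sitting under tmp_dir.
  return []


def read_waveform(wav_path):
  # Placeholder: should return the audio samples as a list of floats.
  return []


@registry.register_problem
class At16kSubwordDomain(speech_recognition.SpeechRecognitionProblem):
  """Hypothetical domain-specific variant, for illustration only."""

  @property
  def approx_vocab_size(self):
    return 1000  # presumably needs to grow if new sub-words are expected

  @property
  def vocab_filename(self):
    return "vocab.at16k_domain.%d" % self.approx_vocab_size  # placeholder name

  def generate_data(self, data_dir, tmp_dir, task_id=-1):
    # Build (or load) the subword vocab from the transcripts first, so any
    # domain-specific sub-words end up in it, then write the TFRecord shards.
    encoder = generator_utils.get_or_generate_vocab_inner(
        data_dir, self.vocab_filename, self.approx_vocab_size,
        (text for _, text in my_domain_samples(tmp_dir, training=True)))
    generator_utils.generate_dataset_and_shuffle(
        self.generator(tmp_dir, encoder, training=True),
        self.training_filepaths(data_dir, 100, shuffled=False),
        self.generator(tmp_dir, encoder, training=False),
        self.dev_filepaths(data_dir, 1, shuffled=False))

  def generator(self, tmp_dir, encoder, training):
    # Yields one example per utterance: the raw waveform plus the
    # subword-encoded transcript terminated by EOS (id 1 in t2t's text_encoder).
    for wav_path, text in my_domain_samples(tmp_dir, training):
      yield {
          "waveforms": read_waveform(wav_path),
          "targets": encoder.encode(text) + [1],
      }

If the existing at16k vocab file is meant to be reused as-is in feature_encoders(), I suppose the get_or_generate step above would instead have to load and extend that file, which is exactly the part I'm unsure about.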

Could you also share the data generation command you used, with its additional FLAGS, as well as the training command and its FLAGS?

MoslemTCM commented 4 years ago

I have a similar question for @mohitsshah: I am trying to fine-tune your model on a new dataset, for example Librispeech. However, when I generate the Librispeech data and continue training from your provided weights, the results are completely wrong and don't make sense. I am using the following script to continue training the model:

DATA_DIR=/media/disk3/Voice2text/t2t_data/
TMP_DIR=/media/disk3/Voice2text/t2t_datagen/
TRAIN_DIR=/media/disk3/Voice2text/t2t_train/librispeech_english/

PROBLEM=at16k_subword

python /media/disk3/Voice2text/env/lib/python3.6/site-packages/tensor2tensor/bin/t2t_trainer.py \
  --t2t_usr_dir=/media/disk3/Voice2text/ \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --model=transformer \
  --worker_gpu_memory_fraction=0.9 \
  --hparams_set=transformer_librispeech_tpu \
  --hparams=max_length=295650,max_input_seq_length=3650,max_target_seq_length=250 \
  --train_steps=7000000 \
  --problem=$PROBLEM \
  --allow_growth=True

Could you share the data generation command you used, applied for example to the Librispeech dataset and including your FLAGS?