There's a problem with beam search in ContextNet, so you can set beam_width = 0 in decoder_config to skip beam search (use greedy only). I'll fix this ASAP.
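For example, in config.yml (just a sketch; keep your other decoder_config values as they are):
decoder_config:
  beam_width: 0   # 0 skips beam search, so testing falls back to greedy decoding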
@vaibhav016 If you still want to use beam search, I just fixed the issue on the main branch (note: the main branch has been refactored for version 1.x, so it differs a bit from the 0.x versions).
@usimarit Thank you so much. Now the testing script is running smoothly. The results are as follows
G_WER = 86.0183334
G_CER = 57.3228416
But in the ContextNet paper it's written:
ContextNet achieves a word error rate (WER) of
2.1%/4.6% without external language model (LM), 1.9%/4.1%
with LM and 2.9%/7.0% with only 10M parameters on the
clean/noisy LibriSpeech test sets.
The above results don't match. Can you please tell me why? The error is huge.
@vaibhav016 Did you train ContextNet on the WHOLE LibriSpeech dataset? Your model has not converged.
@usimarit I trained on the LibriSpeech train-clean-100 dataset, then tested on test-clean. Can you also tell me how to reproduce the baseline numbers mentioned in the research paper, which are around 2%?
@usimarit How will my model converge? Sorry, I didn't understand.
@vaibhav016 Train on the full 960h LibriSpeech set for around 20 epochs and you will see.
@usimarit Okay, just to confirm: on the LibriSpeech site https://www.openslr.org/12 there are 3 separate training datasets. Should I add the 3 transcript paths to the config file, or should I run the training script 3 times without deleting the tensorboard directory?
@vaibhav016 You should add the 3 paths to the config file so that the dataset can load and shuffle them while training.
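For example (a sketch with placeholder paths; point these at your own transcript files):
learning_config:
  train_dataset_config:
    data_paths:
      - /path/to/train-clean-100/transcripts.tsv
      - /path/to/train-clean-360/transcripts.tsv
      - /path/to/train-other-500/transcripts.tsv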
@usimarit Okay thank you
@usimarit Hello. Due to compute resource constraints, I am following the procedure below to train the model: 1) train on one training set (train-clean-100); 2) load the saved model from step 1 and continue training on the remaining training sets (360, 500).
Can you please confirm whether this method is correct? If I give the paths to all 3 datasets, my RAM gives an OOM (out of memory) error.
@vaibhav016 That method may work, but I don't think it's correct, because the model may forget what it learned from the 100h set while training on (360, 500).
If you're facing OOM, please use the option cache: False. The cache places all your data in RAM, which causes OOM. If your RAM is larger than the dataset, you can use the cache; otherwise you'll have to sacrifice some speed so that training can run.
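For example (a sketch; the other dataset options stay as they are):
learning_config:
  train_dataset_config:
    cache: False   # don't keep the whole dataset in RAM; slower, but avoids OOM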
@usimarit That is extremely helpful. I will use this flag. Thank you so much
@usimarit I have the same issue: the results don't match. My results are the following:
greedy_wer: 0.7531763315200806
greedy_cer: 0.47527796030044556
beamsearch_wer: 0.10898508876562119
beamsearch_cer: 0.04691151902079582
I trained on the 960h LibriSpeech set: http://openslr.org/12
I use version 1.0.0 and use a notebook for training.
The config is as follows:
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_frame: False

decoder_config:
  vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 20
  norm_score: True

model_config:
  name: contextnet
  encoder_alpha: 0.5
  encoder_blocks:
    # C0
    - nlayers: 1
      kernel_size: 5
      filters: 256
      strides: 1
      residual: False
      activation: silu
    # C1-C2
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C3
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C4-C6
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C7
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 2
      residual: True
      activation: silu
    # C8 - C10
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 256
      strides: 1
      residual: True
      activation: silu
    # C11 - C13
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C14
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 2
      residual: True
      activation: silu
    # C15 - C21
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    - nlayers: 5
      kernel_size: 5
      filters: 512
      strides: 1
      residual: True
      activation: silu
    # C22
    - nlayers: 1
      kernel_size: 5
      filters: 640
      strides: 1
      residual: False
      activation: silu
  prediction_embed_dim: 640
  prediction_embed_dropout: 0
  prediction_num_rnns: 1
  prediction_rnn_units: 640
  prediction_rnn_type: lstm
  prediction_rnn_implementation: 1
  prediction_layer_norm: True
  prediction_projection_units: 0
  joint_dim: 640
  joint_activation: tanh

learning_config:
  train_dataset_config:
    use_tf: True
    augmentation_config:
      feature_augment:
        time_masking:
          num_masks: 10
          mask_factor: 100
          p_upperbound: 0.05
        freq_masking:
          num_masks: 1
          mask_factor: 27
    data_paths:
      - /home/jovyan/work/librispeech/train_100/LibriSpeech/output.tsv
      - /home/jovyan/work/librispeech/train_360/LibriSpeech/output.tsv
      - /home/jovyan/work/librispeech/train_500/LibriSpeech/output.tsv
    tfrecords_dir: /home/jovyan/work/librispeech/tfrecords
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train

  eval_dataset_config:
    use_tf: True
    data_paths:
      - /home/jovyan/work/librispeech/dev-clean/LibriSpeech/trans_dev.tsv
      - /home/jovyan/work/librispeech/dev-other/LibriSpeech/trans_dev.tsv
    tfrecords_dir: /home/jovyan/work/librispeech/tfrecords
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval

  test_dataset_config:
    use_tf: True
    data_paths:
      - /home/jovyan/work/librispeech/test-clean/LibriSpeech/trans_test.tsv
    tfrecords_dir: /home/jovyan/work/librispeech/tfrecords
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test

  optimizer_config:
    warmup_steps: 40000
    beta_1: 0.9
    beta_2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4
    num_epochs: 20
    checkpoint:
      filepath: /home/jovyan/work/foa/TensorFlowASR/examples/contextnet/expp/checkpoints/{epoch:02d}.hdf5
      save_best_only: True
      save_weights_only: True
      save_freq: epoch
    states_dir: /home/jovyan/work/foa/TensorFlowASR/examples/contextnet/expp/states
    tensorboard:
      log_dir: /home/jovyan/work/foa/TensorFlowASR/examples/contextnet/expp/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: epoch
      profile_batch: 2
The training log is as follows:
Epoch 16/20
23436/23436 [==============================] - 10827s 462ms/step - loss: 10.9422 - val_loss: 3.9108
Epoch 17/20
23436/23436 [==============================] - 10956s 468ms/step - loss: 10.7611 - val_loss: 3.8306
Epoch 18/20
23436/23436 [==============================] - 10814s 461ms/step - loss: 10.5958 - val_loss: 3.8719
Epoch 19/20
23436/23436 [==============================] - 10963s 468ms/step - loss: 10.4283 - val_loss: 3.8973
Epoch 20/20
23436/23436 [==============================] - 11042s 471ms/step - loss: 10.2906 - val_loss: 3.8752
I set config.yml the same as the notebook config and ran the following script for testing:
python test.py --config config.yml --saved expp/checkpoints/17.hdf5
Should I change beam_width or some other config to get a better result?
Please help.
@usimarit I also faced a similar situation. Can you please suggest any options for getting the results closer to the research paper? Is there an optimal beam_width parameter? I notice that altering it while testing on test-clean gives different results.
@vaibhav016 @Custljc In the paper they used 4096 subwords (I tested with this subword vocabulary and achieved a greedy_wer of around 12%). Maybe you guys can try it instead of characters.
@usimarit Sorry, I didn't understand. Is it present in the config file? Can you elaborate a little?
This is my test result for ContextNet small; it's still far from the result in the paper, especially for greedy decoding. Any ideas?
INFO:tensorflow:greedy_wer: 0.748846709728241
INFO:tensorflow:greedy_cer: 0.47502392530441284
INFO:tensorflow:beamsearch_wer (beam size 5): 0.10953524708747864
INFO:tensorflow:beamsearch_cer: 0.04725393280386925
@BuaaAlban @vaibhav016 I trained contextnet here. Please check out the config and compare it with your own :smile:
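In particular, the subword setting mentioned above sits in the decoder_config, roughly like this (a sketch; assuming target_vocab_size is the key that controls the subword vocabulary size, as in the configs posted in this thread):
decoder_config:
  vocabulary: null
  target_vocab_size: 4096   # subwords instead of characters, as suggested above
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 20
  norm_score: True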
I'll close the issue here. Feel free to reopen if the test still fails
Hello, I am facing an error while testing the ContextNet model. Here are the exact steps I followed.
1) Forked the repository and downloaded the datasets from LibriSpeech:
https://www.openslr.org/12
Used train-clean-100, dev-clean, dev-other, test-clean.
2) Set up the conda environment with the following specs:
3) Made the transcripts for the datasets and updated their locations to match my machine.
4) The following is the config.yml file:
speech_config:
decoder_config:
model_config:
learning_config:
train_dataset_config:
eval_dataset_config:
test_dataset_config:
optimizer_config:
running_config:
5) I didn't modify the config file except for the paths of the transcripts.
6) Ran the training script:
python train_context.py
7) It took 5 days for the script to complete. There were no errors, no warnings; it finished smoothly. The logs are as follows.
8) The model was saved as latest.h5 and I ran the following script:
python test_contextnet.py --saved latest.h5
9) Please help me and let me know if I am missing a step or if my configs are wrong.
10) Moreover, I tried with a small dataset (by applying a break statement in the create_librispeech_transcripts.py file).
11) I am not able to debug as to where the error lies. I have looked all over Github for this error, but couldn't find anything. Tried with different tensorflow versions, even on my MacBook, still, the error persists. Kindly please help.