Swagger-z opened 2 months ago
You could tune lm_weight and might get some optimum performance.

(val vs. dev_4k) Sorry, I made a mistake; the reported results for the val set in the official recipe are:

dataset                                                                          Snt    Wrd     Corr  Sub  Del  Ins  Err  S.Err
decode_asr_lm_lm_train_lm_en_bpe5000_valid.loss.ave_asr_model_valid.acc.ave/val  39341  946469  98.1  1.3  0.5  0.4  2.3  33.6

It seems a bit confusing, since I just reproduced the decoding process using the pretrained model without any change, yet the WER is much higher.
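For context, the Err column reported by score.sh is Sub + Del + Ins as a percentage of reference words. A minimal word error rate computation (plain word-level Levenshtein distance, not the exact sclite alignment that score.sh uses) can be sketched as:

```python
def wer(ref, hyp):
    """Word error rate: edit distance over word lists / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)
```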
I see. Thanks for pointing it out. Did you run it with the exact same decoding configuration, especially the normalization?
Yes, that's it. I ran the normalized-text one (normalized text, BPE 5000, asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000) with the decoding configuration provided in the official SPGISpeech recipe.
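One common cause of inflated WER here is scoring un-normalized hypotheses against normalized references (or vice versa). As a rough illustration only — the actual normalization rules live in the SPGISpeech recipe's data preparation and may differ — a lowercase/strip-punctuation style normalizer looks like:

```python
import re

def normalize(text):
    """Toy SPGISpeech-style normalization: lowercase, drop punctuation
    (keeping apostrophes), collapse whitespace. The recipe's real rules
    may differ."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())
```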
Can you attach the results (results.txt generated by score.sh)? I want to check whether the normalization and so on are handled correctly.
Sure, I uploaded results.txt just now; please check it.
Thanks a lot! I could not find a specific pattern in the recognition results, so it may take time to debug... A possible reason is that some compatibility may have been lost. Could you tell me which versions of espnet and pytorch you used?
Of course. I use espnet-v.202310 and torch 1.13.1. Actually, I have tried comparing the source code of espnet-v.0.9.8 (which I guess is the version used for the pretrained model) with my version, but found nothing...
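For the record, the versions actually active in the environment can be double-checked programmatically; a small stdlib helper (espnet and torch are the usual PyPI package names):

```python
from importlib import metadata

def installed_version(pkg):
    """Return a package's installed version string, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# e.g. installed_version("espnet"), installed_version("torch")
```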
I see. This is what I wanted to ask. One of my concerns is the positional embedding part. This is tricky and has caused some confusion in the past.
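One quick way to test the compatibility hypothesis — a suggestion, not a confirmed cause — is to diff the parameter names in the checkpoint's state_dict against those of the freshly built model; renamed positional-embedding parameters would show up immediately. A stdlib sketch (the torch usage in the comment and the key names are hypothetical):

```python
def diff_state_dict_keys(ckpt_keys, model_keys):
    """Return (keys only in the checkpoint, keys only in the model)."""
    ckpt, model = set(ckpt_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)

# Hypothetical usage with torch:
#   ckpt = torch.load("valid.acc.ave.pth", map_location="cpu")
#   missing, unexpected = diff_state_dict_keys(
#       ckpt.keys(), model.state_dict().keys())
```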
@pengchengguo, do you have any idea of this degradation?
@Swagger-z, it is too much to ask, but by any chance, can you run it with some older versions of espnet? I'm not sure that we can use espnet-v.0.9.8 anymore (might be possible only for inference).
Understood, I will attempt to replicate the error first.
Problem with decode result on SPGISpeech dataset
Hi, I downloaded the pretrained model from https://zenodo.org/record/4585546 and ran inference with different configs: config1 (corresponding to decode_baseline):
config2 (corresponding to decode_baseline_wi_elm):
The performance gets worse after the external language model is integrated, and it is much worse than the reported results in the official recipe, which are:
Is there anything wrong?
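For reference, ESPnet2 decode configs of this kind expose the LM integration through lm_weight, and lowering it is usually the first thing to try when shallow fusion hurts. The values below are illustrative placeholders only, not the actual config1/config2 contents:

```yaml
# Illustrative ESPnet2 decode config shape; values are placeholders.
beam_size: 10
ctc_weight: 0.3
lm_weight: 0.3   # external LM weight in shallow fusion; lower it if the LM hurts
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
```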