yangyi0818 opened this issue 2 years ago
Hi @yangyi0818, thank you for reporting the issue! About the first point, I would like to know the following information:
- What is your device, CPU or GPU?
- Am I right that your model was constructed with a Conformer encoder and Transformer decoder?
- Did you use an LM for the inference?
- There are two Conformer blocks in ESPnet, the legacy and the latest versions. Which block did you use?
- I see quantization is applied to your model. Did you execute your quantized model on GPU?
And about the second point, I would like to know the following information:
- Did you check the weights, and did you try different decoding configurations?
The latest Conformer-related issue is not yet fixed, and I'm trying to solve it!
Hi @Masao-Someki! Thank you for your kind reply! Here are my answers.
About the first point:
What is your device, CPU or GPU? CPU.
Am I right that your model was constructed with a Conformer encoder and Transformer decoder? Yes.
Did you use an LM for the inference? Yes, it is a Transformer-structured LM.
There are two Conformer blocks in ESPnet, the legacy and the latest versions. Which block did you use? Our AM was trained last year, so maybe it is the legacy one?
I see quantization is applied to your model. Did you execute your quantized model on GPU? It is true that I set quantize=True in export.py, but I have only tried the unquantized model, and only on CPU.
About the second point: Yes, I checked the weights and I also tried different configurations. It seems that it didn't help much. Here are the results:
weights: {ctc: 0.3, decoder: 0.7, length_bonus: 0.0, lm: 0.3} # CER = 10.8% (same configuration as inference with torch)
weights: {ctc: 0.3, decoder: 0.7, length_bonus: 0.0, lm: 1.0} # CER = 10.8%
weights: {ctc: 0.3, decoder: 1.0, length_bonus: 0.0, lm: 0.1} # CER = 11.6%
weights: {ctc: 0.5, decoder: 0.5, length_bonus: 0.0, lm: 1.0} # CER = 10.7%
Thank you! About the RTF, it may be a problem with the frontend process. If you are using the default frontend, which contains stft and logmel, could you check the performance difference between the torch frontend and the onnx frontend? I recently found a slight slowdown in espnet_onnx's frontend compared to the ESPnet version, and I am now considering converting the whole frontend process to onnx. If the frontend is causing this problem, I will have to do that quickly.
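A minimal timing sketch for that comparison, assuming both frontends can be called on the same raw waveform; torch_frontend and onnx_frontend below are hypothetical placeholders for the actual stft + logmel callables from each implementation:

import time
import numpy as np

def benchmark(frontend_fn, speech, n_runs=20):
    # Average wall-clock time of one frontend call over n_runs.
    start = time.perf_counter()
    for _ in range(n_runs):
        frontend_fn(speech)
    return (time.perf_counter() - start) / n_runs

# Hypothetical stand-ins: replace with the real frontend callables.
torch_frontend = lambda x: x
onnx_frontend = lambda x: x

speech = np.random.randn(16000 * 5).astype(np.float32)  # 5 s of dummy 16 kHz audio
print('torch frontend:', benchmark(torch_frontend, speech))
print('onnx frontend :', benchmark(onnx_frontend, speech))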
Hi, thanks for sharing the espnet_onnx system!
I ran into two problems when trying to run inference with your code. My acoustic model was trained by myself on our own dataset, and the AM architecture is the typical Conformer. I downloaded the code in June.
First, decoding is far too slow. When decoding with torch, the RTF is around 2.32; with the converted onnx model it becomes around 20.
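(RTF here means wall-clock decoding time divided by audio duration. A minimal sketch of the measurement, assuming speech2text is the loaded model as in decode.py below; 'sample.wav' is a placeholder path:

import time
import librosa

y, sr = librosa.load('sample.wav', sr=16000)  # placeholder wav path
start = time.time()
nbest = speech2text(y)  # the decoding call being measured
rtf = (time.time() - start) / (len(y) / sr)
print('RTF: %.2f' % rtf)
)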
Second, the CER with torch is 7.8%, while with onnx it becomes 10.6%. I think something is probably wrong.
My configs are given below:
export.py
import sys
sys.path.append('espnet-master')
sys.path.append('espnet-master/espnet_tts_frontend-master')
sys.path.append('espnet_onnx-master/espnet_onnx/export/asr')

import torch
from export_asr import ModelExport
from espnet2.bin.asr_inference import Speech2Text

if __name__ == '__main__':
    m = ModelExport(cache_dir=sys.argv[5])
    # export from trained model
    speech2text = Speech2Text(
        asr_train_config=sys.argv[1],
        asr_model_file=sys.argv[2],
        lm_train_config=sys.argv[3],
        lm_file=sys.argv[4],
    )
    m.export(model=speech2text, tag_name='speech2text', quantize=True)
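With the argv layout above, the script would be invoked along these lines (the angle-bracket arguments are placeholders for your own files):

python export.py <asr_train_config> <asr_model_file> <lm_train_config> <lm_file> <cache_dir>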
And I get an onnx dir structured like:
asr/onnx/speech2text/
    config.yaml
    feats_stats.npz
    full/
    quantize/
The test wavs are listed in a filelist, structured as:
bigfar_001_000001 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000001.wav
bigfar_001_000002 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000002.wav
bigfar_001_000003 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000003.wav
bigfar_001_000004 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000004.wav
bigfar_001_000005 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000005.wav
bigfar_001_000006 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000006.wav
...
The decoding process is:
decode.py
import sys
sys.path.append('espnet_onnx-master/espnet_onnx/asr')

import os
import time
import librosa
from tqdm import tqdm
from asr_model import Speech2Text

if __name__ == '__main__':
    # step 1: load the exported onnx model
    speech2text = Speech2Text(tag_name='speech2text', model_dir=sys.argv[3])

    # step 2: run ASR over the filelist and write hypotheses in trn format
    with open(sys.argv[1]) as f:
        lines = f.readlines()
    # open the output file once instead of re-opening it per utterance
    with open(os.path.join(sys.argv[2], 'hyp_flush_1process.trn'), 'a') as fout:
        for line in tqdm(lines):
            wav_name = line.split(' ')[0].strip()
            processing_wav = line.split(' ')[1].strip()
            start = time.time()
            y, sr = librosa.load(processing_wav, sr=16000)
            nbest = speech2text(y)
            asr_result = nbest[0][0]
            end = time.time()
            # space-separated characters, then "(utt_id-utt_id)"
            fout.write(' '.join(asr_result))
            fout.write('\t(' + wav_name + '-' + wav_name + ')\n')
            print('processing: ', processing_wav)
            print('Result: ', asr_result)
            print('Time: ', end - start, 's')
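For reference, given the argv usage above, decode.py would be run along these lines (placeholders again):

python decode.py <wav_filelist> <output_dir> <onnx_model_dir>

where <onnx_model_dir> is the exported directory shown earlier (asr/onnx/speech2text/).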
Furthermore, I noticed that you mentioned in a recent issue that there may be some problems with Conformer AMs for ASR. Has that been fixed?
Looking forward to your reply!
What is your torch version?
Hi @rookie0607, my torch version is 1.7.1 and my onnx version is 1.7.0.
In relation to the slow speed, can you check how many cores are loaded when you try to run inference with onnx? I suspect it could be related.
@Masao-Someki I notice that all CPU cores are in use when I do CPU inference. Is there a way to avoid this other than setting taskset 1? I tried export OMP_NUM_THREADS=1 but no luck.
@joazoa You can limit the number of threads with the following options:
inter_op_num_threads = 1
intra_op_num_threads = 1
Currently there is no option in espnet_onnx to limit the number of threads, so you may need to modify the inference code (e.g. where the encoder session is created) like this:
import onnxruntime as ort
sess_options = ort.SessionOptions()
sess_options.inter_op_num_threads = 1
sess_options.intra_op_num_threads = 1
self.encoder = ort.InferenceSession(
self.config.quantized_model_path,
providers=providers,
sess_options=sess_options
)
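For anyone who wants to sanity-check the thread settings outside espnet_onnx, a self-contained sketch of the same idea ('model.onnx' is a placeholder path):

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.inter_op_num_threads = 1  # threads across independent graph nodes
sess_options.intra_op_num_threads = 1  # threads within a single operator

# 'model.onnx' is a placeholder for any exported onnx file.
session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider'],
)

Note that SessionOptions applies per session, so each session espnet_onnx creates (encoder, decoder, LM, etc.) would presumably need the same options.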
@Masao-Someki thank you!