How to perform inference?

chenmingxiang110 / Chinese-automatic-speech-recognition

Chinese speech recognition

MIT License

How to perform inference? #12

Closed: Aksh97 closed this issue 2 years ago

Aksh97 commented 2 years ago

Hi, I am a bit confused about how to perform inference with your pretrained model. Could you please provide the steps?

chenmingxiang110 commented 2 years ago

Take model 903 as an example.

    from model903 import model
    af = AudioFeaturizer()
    model = model(409)

Then simply load the pretrained model:

    sess = tf.Session()
    saver = tf.train.Saver()
    saver.restore(sess, "path/to/ckpt")

Now you are ready to go. Read an audio file and you can get its pinyin:

    rate, data = read_wav("example.wav")
    data = mergeChannels(data)
    data = zero_padding_1d(data, 160240)
    a_seg = AudioSegment(data, rate)
    xs = np.transpose(np.array([af.featurize(a_seg)]), [0,2,1])
    pred = model.predict(sess, xs)[0]
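For readers unsure what the preprocessing calls do to the array shapes, here is a minimal NumPy sketch. It is not the repository's code: `to_mono` and `pad_or_trim` are hypothetical stand-ins for `mergeChannels` and `zero_padding_1d`, and their exact behavior (averaging channels, padding at the end, trimming longer input) is assumed. The 16 kHz sample rate and the 39 x 1000 feature matrix are placeholders too; the real feature shape depends on `AudioFeaturizer`. Only the final `np.transpose` call is taken from the snippet above.

```python
import numpy as np

TARGET_LEN = 160240  # sample count used in the example above


def to_mono(data):
    """Hypothetical stand-in for mergeChannels: average (n_samples, n_channels) to mono."""
    data = np.asarray(data, dtype=np.float32)
    return data if data.ndim == 1 else data.mean(axis=1)


def pad_or_trim(samples, target_len):
    """Hypothetical stand-in for zero_padding_1d: right-pad with zeros, or trim, to target_len."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target_len:
        return samples[:target_len]
    return np.pad(samples, (0, target_len - len(samples)), mode="constant")


def to_batch(features):
    """Wrap one 2-D feature matrix in a batch of size 1 and swap its last two axes."""
    return np.transpose(np.array([features]), [0, 2, 1])


# Two seconds of silent stereo audio at an assumed 16 kHz sample rate.
stereo = np.zeros((32000, 2), dtype=np.float32)
mono = pad_or_trim(to_mono(stereo), TARGET_LEN)

# Placeholder features standing in for the output of af.featurize(a_seg).
fake_features = np.zeros((39, 1000), dtype=np.float32)
xs = to_batch(fake_features)

print(mono.shape, xs.shape)
```

The point of the transpose is that a single 2-D feature matrix becomes a 3-D batch with one item, with its two feature axes swapped, which is the form passed to `model.predict` in the snippet above.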

Please read the README file carefully. It already covers most of what you need. You can also check "subtitle_demo.py" for some code examples.

Aksh97 commented 2 years ago

Hi, Thanks.

I have two questions. The whole codebase is written for tf 1.x. Are there any plans to convert it to tf 2.x? It currently throws multiple errors because of the version mismatch.

Also, do you have a pretrained model available (regarding "Then just simply load the pretrained model")?

chenmingxiang110 commented 2 years ago

Sorry Aksh97, the model itself (Deep Speech) is quite an old algorithm (probably proposed 5 or 6 years ago), so I do not plan to rewrite this project. If you are interested in audio recognition algorithms written in tf 2.x, or maybe torch, you can definitely find some newer algorithms such as DFSMN. I haven't followed the latest research for a few years, so this recommendation may also be out of date. If you are interested in building your own speech recognition algorithm on embedded devices (or a PC), you can check this project: https://github.com/sipeed/Maix-Speech.

As for the second question, the pretrained models can be downloaded from Baidu Netdisk:

model 903: https://pan.baidu.com/s/1NcTN8gojuIBaIFT9FB3EJw Code: 261u
model 902: https://pan.baidu.com/s/1do7C6Egj6sJO7kn1yHPzBg Code: 9o87
model 901: https://pan.baidu.com/s/1utz-1Vv4IO9D-3awj3x1QQ Code: pv08

Aksh97 commented 2 years ago

Thanks for the quick and prompt response. Appreciate it.

Also, thanks for providing the link to Maix-Speech and pretrained models.

Aksh97 commented 2 years ago

Hi @chenmingxiang110, do you know of any other ASR projects for Chinese with good accuracy?

I checked out https://github.com/sipeed/Maix-Speech, but it is mainly for real-time use. What I am looking for is something where I pass in an audio file (mp3 or wav) and it returns text.

Any help will be appreciated.

chenmingxiang110 commented 2 years ago

Sorry, I haven't followed the field for years. Searching for recent ASR papers will probably lead you to some good open-source projects.

Aksh97 commented 2 years ago

Okay sure. Thanks a lot