flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

PSA: Simple C API #326

Closed by lunixbochs 3 years ago

lunixbochs commented 5 years ago

I'm working on a simple C API in the talonvoice/w2lapi branch:

https://github.com/talonvoice/wav2letter/blob/w2lapi/w2l.h

The API is bare minimum right now and subject to change.

The main downside right now is that the global usage of gflags' FLAGS_* means you can't instantiate two incompatible models. I think it would be good to parameterize any classes that use FLAGS internally so their config is injected instead.
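To illustrate the config-injection idea, here is a minimal sketch. The names `FeatureConfig` and `Featurizer` and their fields are hypothetical and are not taken from wav2letter or w2l.h. The point is only that each instance carries its own settings instead of reading process-wide FLAGS_* values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical config struct: stands in for values that would otherwise be
// read from global gflags (FLAGS_*). Field names are illustrative only.
struct FeatureConfig {
  int sampleRateHz = 16000;
  int frameSizeMs = 25;
  int frameStrideMs = 10;
};

// Hypothetical feature frontend. The config is injected at construction and
// copied into the object, so nothing here touches global state.
class Featurizer {
 public:
  explicit Featurizer(const FeatureConfig& cfg) : cfg_(cfg) {
    assert(cfg_.sampleRateHz > 0);
    assert(cfg_.frameSizeMs > 0);
    assert(cfg_.frameStrideMs > 0);
  }

  // Number of full frames that fit in a buffer of numSamples samples.
  std::size_t numFrames(std::size_t numSamples) const {
    const std::size_t frame = samplesFor(cfg_.frameSizeMs);
    const std::size_t stride = samplesFor(cfg_.frameStrideMs);
    if (frame == 0 || stride == 0 || numSamples < frame) {
      return 0;
    }
    return 1 + (numSamples - frame) / stride;
  }

 private:
  // Convert a duration in milliseconds to a sample count at this
  // instance's sample rate.
  std::size_t samplesFor(int ms) const {
    return static_cast<std::size_t>(cfg_.sampleRateHz) *
           static_cast<std::size_t>(ms) / 1000;
  }

  FeatureConfig cfg_;
};
```

With this shape, a 16 kHz instance and an 8 kHz instance can live in the same process, because neither one depends on a shared flag value that the other would have to overwrite.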

It only works for emissions so far: my decode frontend segfaults when building the Trie, and I haven't debugged it yet.

I have a working realtime frontend on top of this, and I'll be publishing a Mac demo app for it soon.

vineelpratap commented 5 years ago

> global usage of gflags' FLAGS_* means you can't instantiate two incompatible models

We plan to remove gflags/glog usage from all the util functions (src/common/*). Would that help?

lunixbochs commented 5 years ago

Yeah, that would help. I think Featurize is the other main piece I need FLAGS removed for, since I'm not using W2lDataset, seq2seq, or training (the optimizer uses flags) with this frontend.

lunixbochs commented 4 years ago

The API has progressed, and includes a faster decoder that can also handle command DFAs mixed with arbitrary speech:

https://github.com/talonvoice/wav2letter/blob/decoder/w2l.h