adrianbg / kaldi.js


Active development #1

Open brescka opened 5 years ago

brescka commented 5 years ago

Was wondering if this project was in active development or if you had plans to release an API at some point? This project (and the demo) look awesome and I'd be really excited to use this technology in a project. Thanks for your time and development!

adrianbg commented 5 years ago

Hey, I wouldn't say it's active unfortunately, though I think there's at least one other person using it. Can you tell me more about what you're trying to do? I can at least help you navigate what's here.

brescka commented 5 years ago

My use case would involve general dictation/transcription. I would love to be able to provide a continuous audio stream and receive transcription.

adrianbg commented 5 years ago

For that I would probably recommend the Web Speech API. Kaldi might be better for user privacy and cross-browser compatibility, but for dictation the quality would be much, much lower than Chrome's.

brescka commented 5 years ago

The accuracy being lower is totally fine in this scenario. Unfortunately, the Web Speech API has some limitations (like its number of requests) that kind of torpedo my project. I'm looking to transcribe audio on the order of hours or more.

adrianbg commented 5 years ago

Yeah, that makes sense. In that case I would still recommend using a speech API that you pay for unless you're really, really against spending money. You can see some benchmarks from my old job at Remeeting (https://remeeting.com/benchmarks) for conversational telephone speech. I would recommend trying a few APIs on a variety of audio samples (different voices, levels of background noise, ...) and seeing which one gives you the best results.
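One rough way to compare APIs on your own samples is word error rate (WER): the word-level edit distance between a reference transcript you trust and what the API returned, divided by the number of reference words. The sketch below is only an illustration and assumes you already have both transcripts as plain strings. The names `word_error_rate` and `mean_wer` are made up for this example and are not part of Kaldi.js or any speech API, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing and whitespace splitting is enough normalization.
    Real evaluations usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted average WER over (reference, hypothesis) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("need at least one (reference, hypothesis) pair")
    return sum(scores) / len(scores)
```

You would run `mean_wer` once per API over the same set of samples and compare the averages. A lower number is better, but it is worth reading a few transcripts by hand as well, since WER treats every word as equally important.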

If your audio has a lot of background noise or difficult speech in it then there may not be much hope of getting decent transcripts even from the APIs unless you spend a lot of time and/or money yourself to customize the model to your use-case. In that case you can still get alternative hypotheses from the transcription APIs that will let you search through your audio much more efficiently than just listening to it all. My last company, Remeeting, makes a product to help do this. I can introduce you to them if you like.

I think Kaldi.js is mainly useful right now for pretty small vocabularies (spotting command words). I haven't tried it on anything other than that Zork demo, which I think has a few hundred words. It seemed to work pretty well, even with high levels of background noise. I'm sure Kaldi.js could be improved in lots of ways, but I don't think I'm likely to do it any time soon. Sorry!
