Closed: zachgk closed this issue 2 years ago
Hi, I'm interested in this issue and I want to fix it. Can you assign it to me? Thanks!
Yeah. I made an extra note in the description that this one is a bit more work than the Penn Treebank you did previously. Right now, none of the built-in datasets use audio, so you will need to add a conversion between audio and NDArrays to implement the dataset. If you look at the references, they have examples of doing this conversion with DJL, so they should be very helpful to you. Let me know if you have any questions or get stuck anywhere.
@zachgk Hello! @AKAGIwyf and I are working on this issue now. We are encountering some problems and need your help.
Since audio datasets usually contain different formats of audio data (wav, flac, mp3, etc.), we have to use ffmpeg to transform them into float arrays. In AIAS, they import the whole `javacv` module directly into the project, but we think `javacv` is too big to pull into `djl.basicdataset`. Can we add `javacv` as a new extension, or would that duplicate the existing `djl.opencv` extension? Thanks!
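For concreteness, the conversion we have in mind looks roughly like this. It is only a minimal sketch, assuming javacv's `FFmpegFrameGrabber` and DJL's `NDManager`; the `AudioUtils` class and its `toNDArray` helper are hypothetical names, not existing API:

```java
import java.io.IOException;
import java.nio.Buffer;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;
import java.util.ArrayList;
import java.util.List;

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.Frame;

/** Hypothetical helper that decodes an audio file into an NDArray of PCM samples. */
public final class AudioUtils {

    private AudioUtils() {}

    public static NDArray toNDArray(NDManager manager, String path) throws IOException {
        List<Float> samples = new ArrayList<>();
        FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(path);
        try {
            grabber.start();
            Frame frame;
            while ((frame = grabber.grabSamples()) != null) {
                // The buffer type depends on the decoded sample format;
                // 16-bit PCM (ShortBuffer) and float PCM are the common cases.
                Buffer buf = frame.samples[0];
                if (buf instanceof ShortBuffer) {
                    ShortBuffer sb = (ShortBuffer) buf;
                    while (sb.hasRemaining()) {
                        samples.add(sb.get() / 32768f); // normalize to [-1, 1]
                    }
                } else if (buf instanceof FloatBuffer) {
                    FloatBuffer fb = (FloatBuffer) buf;
                    while (fb.hasRemaining()) {
                        samples.add(fb.get());
                    }
                }
            }
        } finally {
            grabber.release();
        }
        float[] data = new float[samples.size()];
        for (int i = 0; i < data.length; i++) {
            data[i] = samples.get(i);
        }
        return manager.create(data);
    }
}
```

Usage would be something like `NDArray audio = AudioUtils.toNDArray(manager, "sample.flac");`.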
It sounds like what you need from `javacv` isn't an extension, but a dependency. For example, there is nothing stopping users from using `javacv` with DJL. They are both Java libraries and users can import both.

On the other hand, a `javacv` extension wouldn't help much. If `djl.basicdataset` depends on `djl.javacv` and `djl.javacv` depends on `javacv`, then `djl.basicdataset` transitively depends on `javacv`. This pulls the same big dependency into a user's project just as a direct dependency would. In the `djl.opencv` case, the extension is really about the automatic integration of DJL with OpenCV through the `ImageFactory` class.

Instead, it might be better not to put your dataset in `basicdataset`. You could create a new `djl.audio` extension to hold the dataset. Then users will only need the `javacv` dependency if they use `djl.audio`, not if they use `djl.basicdataset`.
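To make that concrete, here is a rough sketch of what such a dataset inside a `djl.audio` extension could look like. The class name, fields, and builder are hypothetical, and it reuses the hypothetical `AudioUtils.toNDArray` helper sketched earlier in this thread:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.training.dataset.RandomAccessDataset;
import ai.djl.training.dataset.Record;
import ai.djl.util.Progress;

/** Hypothetical skeleton of a speech recognition dataset in a djl.audio extension. */
public class SpeechRecognitionDataset extends RandomAccessDataset {

    private List<Path> audioFiles = new ArrayList<>();    // one audio clip per sample
    private List<String> transcripts = new ArrayList<>(); // matching text labels

    protected SpeechRecognitionDataset(Builder builder) {
        super(builder);
    }

    /** {@inheritDoc} */
    @Override
    public Record get(NDManager manager, long index) throws IOException {
        // Decode the clip into a float NDArray (see the conversion sketch above)
        NDArray audio = AudioUtils.toNDArray(manager, audioFiles.get((int) index).toString());
        // A real dataset would also encode transcripts.get((int) index) into an NDArray,
        // e.g. through a vocabulary; the label is left empty in this sketch
        return new Record(new NDList(audio), new NDList());
    }

    /** {@inheritDoc} */
    @Override
    protected long availableSize() {
        return audioFiles.size();
    }

    /** {@inheritDoc} */
    @Override
    public void prepare(Progress progress) throws IOException {
        // Download/extract the dataset archive here and fill audioFiles and transcripts
    }

    /** Builder following the usual basicdataset pattern. */
    public static final class Builder extends BaseBuilder<Builder> {

        /** {@inheritDoc} */
        @Override
        protected Builder self() {
            return this;
        }

        public SpeechRecognitionDataset build() {
            return new SpeechRecognitionDataset(this);
        }
    }
}
```

A user would then construct it with something like `new SpeechRecognitionDataset.Builder().setSampling(32, true).build()` and call `prepare()` before training.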
Description
Speech recognition is a task that converts an audio sequence into the text transcript of the words in the audio. It can be used for transcribing online videos and phone calls, for text dictation, and for controlling voice devices like Alexa. This issue is to add a first speech recognition dataset to DJL's basicdataset.
Note that this requires adding additional support for converting between audio and NDArrays. The references contain examples from DJL projects that already implement this kind of conversion.
This is a task that DJL users may be interested in training, and supporting it also helps expand the DJL API into more audio use cases.
References