Code an example with audio recordings of speech, e.g., sentiment analysis

Mikolaj / horde-ad

Higher Order Reverse Derivatives Efficiently - Automatic Differentiation library based on the paper "Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"

BSD 3-Clause "New" or "Revised" License

Code an example with audio recordings of speech, e.g., sentiment analysis #45

Open Mikolaj opened 2 years ago

Mikolaj commented 2 years ago

Audio data is much smaller than video and probably also smaller than photos, so there's a chance it fits CPU processing. Being ~linear~[edit: sequential], it may be a good fit for RNN [edit: recurrent neural networks], which are particularly well supported by our library and probably do not benefit from C libraries beyond LAPACK/BLAS. Given that, we may try to get close to the state of the art here, filling any gaps in our library that become apparent.
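
To make the "sequential data fits an RNN" point concrete, here is a minimal sketch of a recurrent forward pass over a sequence of audio feature frames. It is plain standard-library Python, not horde-ad's Haskell API, and every name and size in it (`rnn_classify`, `random_matrix`, 13 features per frame, 8 hidden units, 3 sentiment classes) is a hypothetical choice made for illustration. It only computes predictions from random weights; training would additionally need gradients of a loss with respect to the weights, which is the part an automatic differentiation library would supply.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_classify(frames, w_in, w_rec, b_h, w_out, b_out):
    """Run a simple (Elman-style) RNN over the frames and return class probabilities.

    frames: list of feature vectors, one per time step.
    The hidden state is updated once per frame; only the final
    hidden state is used for classification.
    """
    hidden = [0.0] * len(b_h)
    for frame in frames:
        pre_activation = [
            a + b + c
            for a, b, c in zip(matvec(w_in, frame), matvec(w_rec, hidden), b_h)
        ]
        hidden = [math.tanh(p) for p in pre_activation]
    logits = [a + b for a, b in zip(matvec(w_out, hidden), b_out)]
    # Numerically stable softmax over the class logits.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a placeholder for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_features, n_hidden, n_classes = 13, 8, 3  # assumed sizes, not from the thread
    w_in = random_matrix(n_hidden, n_features, rng)
    w_rec = random_matrix(n_hidden, n_hidden, rng)
    w_out = random_matrix(n_classes, n_hidden, rng)
    b_h = [0.0] * n_hidden
    b_out = [0.0] * n_classes
    # Fake "recording": 50 frames of random features standing in for real audio features.
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(50)]
    print(rnn_classify(frames, w_in, w_rec, b_h, w_out, b_out))
```

The cost of the loop grows linearly with the number of frames and each step is a pair of small matrix-vector products, which is why this kind of model is plausible on a CPU.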

sfindeisen commented 2 years ago

R = Recurrent?

What does "linear" mean in this context?

tomjaguarpaw commented 2 years ago

Looks like it means "sequential".

sfindeisen commented 2 years ago

Is this about speech to text and then sentiment analysis, i.e. 2 issues in one? Or sentiment analysis directly on audio data?

Mikolaj commented 2 years ago

We may or may not have a future client interested in analysing audio data directly. That's too far into the future to get any details and to worry about, but it can already be a vague inspiration for examples. Having a transcript of the audio may help in solving the tasks, but speech-to-text is solved well enough already, I think, so we may assume we already have the transcript. Therefore I'm clumsily making up tasks that don't go through text. Sentiment analysis through intonation is such an attempt. Sentiment analysis through text and audio would probably depend on the text too much and would probably be too complex for the first task. We can probably find something better than sentiment analysis to work on. The key is that public data is available, that it's not too large, and that there are papers to implement and compare our results to.

sfindeisen commented 2 years ago

> speech-to-text is solved well enough already

Do we have (or need) an example for this one? Is it hard? I would be interested in German.

Mikolaj commented 2 years ago

We don't have one and we'd like one. I'd imagine this is easy once you have terabytes of data. I don't know if it's possible and reasonably easy with as little data as you are comfortable including in a repo that gets checked out thousands of times per month (e.g., by the CI runner, but also other tools, bots, users). An option is to include the data in the https://github.com/Mikolaj/mostly-harmless repo that is intended to fly under the radar. Still, GitHub limits may prove too restrictive, but perhaps it's worth trying. In the worst case, data needs to be downloaded from somewhere each time somebody runs the test, which prevents including it in CI, so it bitrots.

Mikolaj commented 2 years ago

A simpler task for a start, but potentially useful on its own: #58.