Mikolaj / horde-ad

Higher Order Reverse Derivatives Efficiently - Automatic Differentiation library based on the paper "Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"
BSD 3-Clause "New" or "Revised" License

Code an example that recognizes a particular person's speech #58

Open Mikolaj opened 2 years ago

Mikolaj commented 2 years ago

A possible formulation: given a few minutes of person X's speech, the network should be able to quickly determine whether any short audio recording contains speech by X or not. Listening to the sample of X's speech is permitted to take a lot of resources (real training), but the subsequent classification of many audio fragments should be cheap (no more training).
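To make the split between an expensive enrollment step and a cheap classification step concrete, here is a minimal sketch of one common way to arrange it, not necessarily the design intended here. It assumes some network (not shown, and hypothetical) already maps each audio clip to a fixed-length embedding vector; the names `enroll` and `is_speaker`, and the threshold value, are illustrative assumptions. Enrollment averages the embeddings of X's clips into a profile, and classification is a single cosine-similarity comparison. Any real training on X's speech would happen before this, when producing the embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def enroll(embeddings):
    """Build a profile for speaker X by averaging normalized clip embeddings.

    `embeddings` is a non-empty list of equal-length vectors, assumed to come
    from a hypothetical embedding network applied to X's speech sample.
    """
    if not embeddings:
        raise ValueError("need at least one enrollment embedding")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def is_speaker(profile, embedding, threshold=0.7):
    """Cheap check: cosine similarity between the profile and one clip embedding.

    The threshold is an arbitrary placeholder; it would have to be tuned on
    labelled test data.
    """
    score = sum(p * q for p, q in zip(profile, l2_normalize(embedding)))
    return score >= threshold
```

With this arrangement, classifying each new fragment costs one forward pass of the embedding network plus a dot product, with no further training.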

The network could initially be trained on speech samples from many people other than X, provided that this makes it more accurate, faster at learning a new speech pattern, or faster at classifying. If that initial step is beneficial, the important questions are what data we need for training, whether unlabelled data suffices, how we process the data, and where we get labelled data for testing.

To determine: loss function, how to get data for training (in addition to the X speech), the architecture of the neural network.
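As one candidate for the loss function, and only as an illustration rather than a decision made in this thread, a triplet-style margin loss is often used for speaker embeddings: it pulls two clips of the same speaker together and pushes a clip of a different speaker away. The sketch below assumes embeddings are plain lists of floats; the function names and the margin value are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin loss over one (anchor, positive, negative) triple of embeddings.

    `anchor` and `positive` are assumed to come from the same speaker and
    `negative` from a different one. The loss is zero once the negative is
    farther from the anchor than the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Note that this loss needs speaker labels to form the triples, so it bears directly on the question of whether unlabelled data suffices.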

blackhole64 commented 1 year ago

Where do we find some decent training data sets to choose from? Or is this part of the problem too?

Mikolaj commented 1 year ago

I had some from a potential future client, but he got busy and so did I. In any case, I'm not able to maintain and support the use of the old API of horde-ad at this point, and the new one is not yet ready. Apologies.