Code an example that recognizes a particular person's speech

Mikolaj commented 2 years ago

A possible formulation: given a few minutes of a person X's speech, the network should be able to quickly determine whether any short audio recording contains speech by X. Processing the sample of X's speech is allowed to take a lot of resources (real training), but subsequently classifying many audio fragments should be cheap (no further training).
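The split between an expensive enrollment step and a cheap per-clip check can be sketched as follows. This assumes that some speaker-embedding network (not specified in this thread) already maps each audio segment to a fixed-length vector. In this simplified version, enrollment only averages X's embeddings into a "voiceprint", and classification is one cosine similarity and a threshold test. The names `enroll` and `is_speaker_x`, the toy 3-dimensional vectors, and the 0.7 threshold are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def enroll(embeddings: Sequence[Sequence[float]]) -> Vector:
    """One-off step: average the normalized embeddings of X's speech
    segments into a single unit-length voiceprint."""
    if not embeddings:
        raise ValueError("need at least one embedding of X's speech")
    units = [normalize(e) for e in embeddings]
    dim = len(units[0])
    mean = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    return normalize(mean)

def is_speaker_x(voiceprint: Sequence[float], clip_embedding: Sequence[float],
                 threshold: float = 0.7) -> bool:
    """Cheap per-clip step: one similarity and a comparison, no training."""
    return cosine(voiceprint, clip_embedding) >= threshold

# Toy vectors standing in for embeddings of X's enrollment speech.
vp = enroll([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.05]])
print(is_speaker_x(vp, [1.0, 0.15, 0.05]))  # True
print(is_speaker_x(vp, [0.0, 1.0, 0.3]))    # False
```

If enrollment needs "real training" as described above, the averaging step could be replaced by fine-tuning, while the per-clip check stays equally cheap.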

The network could initially be trained on speech samples from many people, not including X, unless this makes it neither more accurate, nor faster at learning a new speech pattern, nor faster at classifying. If that initial step is beneficial, the important questions are what data we need for training (and whether unlabelled data suffices), how we process the data, and where we get labelled data for testing.
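One common way to pretrain on many speakers (an assumption here; the thread does not choose a method) is to sample triplets of clips: an anchor and a positive from the same speaker, and a negative from a different speaker. The hypothetical `make_triplets` helper below sketches this. It needs a speaker label for every clip, so in this formulation unlabelled audio alone would not suffice, although the labels are only speaker identities, not transcripts. The speaker names and file names are placeholders.

```python
import random
from typing import Dict, List, Sequence, Tuple

def make_triplets(clips_by_speaker: Dict[str, Sequence[str]], count: int,
                  rng: random.Random) -> List[Tuple[str, str, str]]:
    """Sample `count` (anchor, positive, negative) triplets of clip IDs."""
    nonempty = [s for s, clips in clips_by_speaker.items() if len(clips) >= 1]
    # Anchor and positive must be two distinct clips of one speaker.
    eligible = [s for s in nonempty if len(clips_by_speaker[s]) >= 2]
    if not eligible or len(nonempty) < 2:
        raise ValueError("need >= 2 speakers with clips, one of them with >= 2")
    triplets = []
    for _ in range(count):
        speaker = rng.choice(eligible)
        anchor, positive = rng.sample(list(clips_by_speaker[speaker]), 2)
        # The negative comes from any other speaker who has at least one clip.
        other = rng.choice([s for s in nonempty if s != speaker])
        negative = rng.choice(list(clips_by_speaker[other]))
        triplets.append((anchor, positive, negative))
    return triplets

# Hypothetical labelled data: speaker ID -> clip file names (X is excluded).
data = {
    "alice": ["alice_1.wav", "alice_2.wav", "alice_3.wav"],
    "bob": ["bob_1.wav", "bob_2.wav"],
    "carol": ["carol_1.wav"],
}
for anchor, positive, negative in make_triplets(data, 3, random.Random(0)):
    print(anchor, positive, negative)
```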

To determine: the loss function, how to get training data (in addition to X's speech), and the architecture of the neural network.
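For the loss function, one candidate (not decided in this thread) is the triplet loss, which pushes an anchor embedding closer to a same-speaker embedding than to a different-speaker embedding by at least a margin. The sketch evaluates it for a single triplet using squared Euclidean distance. The function names and the 0.2 margin are illustrative. In actual training the loss would be averaged over a batch and differentiated by an AD library such as horde-ad, which this sketch does not use.

```python
from typing import Sequence

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on the distance gap: zero once the negative is farther from the
    anchor than the positive by at least `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)

# Easy triplet: the negative is already far away, so the loss is zero.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))  # 0.0
# Hard triplet: the negative is too close, so the loss is positive (about 0.17).
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0]))
```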

blackhole64 commented 1 year ago

Where can we find decent training datasets to choose from? Or is that part of the problem too?

Mikolaj commented 1 year ago

I had some from a potential future client, but both he and I got busy. In any case, at this point I'm not able to maintain and support use of the old horde-ad API, and the new one is not yet ready. Apologies.

Mikolaj / horde-ad, issue #58