drscotthawley / panotti

A multi-channel neural network audio classifier using Keras
MIT License
270 stars, 72 forks

How do I iterate over an audio file to not miss a part that fits a class? #47

Open ErfolgreichCharismatisch opened 5 years ago

ErfolgreichCharismatisch commented 5 years ago

I have an audio file that contains a part that matches a class I trained, for instance the letter R in a speech.

I would pick an arbitrary length, say 20 ms, split the audio file into 20 ms intervals, send each one to predict_class, and keep the interval where the probability for my class is highest. With this method, though, a split boundary could fall exactly at the edge of the region I want, or the sound could be stretched (longer than the original file), etc.

How do I iterate over the audio file to not miss it?

drscotthawley commented 5 years ago

Hi. That's not a use case that I considered, so there's nothing in the code to support that sort of thing. You'd have to write some more custom code, and I'm not sure what the best approach for that would be.
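For anyone landing here later, one common way to write that custom code is to scan with overlapping windows at several lengths instead of cutting fixed, non-overlapping 20 ms pieces. The sketch below is not part of panotti: `sliding_windows`, `best_window`, and the `score_fn` callable are hypothetical names, and `score_fn` stands in for whatever returns the probability of your class for one chunk (for example a wrapper you write around your trained model). It assumes the audio is already loaded as a 1-D NumPy array.

```python
import numpy as np

def sliding_windows(signal, sr, win_ms=20.0, hop_ms=5.0):
    """Yield (start_sample, chunk) for overlapping fixed-length windows."""
    win = max(1, int(round(sr * win_ms / 1000.0)))
    hop = max(1, int(round(sr * hop_ms / 1000.0)))
    # A hop smaller than the window means neighbouring windows overlap,
    # so a sound sitting on a boundary is still fully inside some window.
    for start in range(0, max(1, len(signal) - win + 1), hop):
        yield start, signal[start:start + win]

def best_window(signal, sr, score_fn, win_lengths_ms=(20.0, 40.0, 80.0), hop_ms=5.0):
    """Return (score, start_sample, end_sample) of the highest-scoring window.

    score_fn is a hypothetical callable: chunk -> probability of the target class.
    Several window lengths are tried so a stretched sound can still be covered.
    """
    best = (-np.inf, 0, 0)
    for win_ms in win_lengths_ms:
        for start, chunk in sliding_windows(signal, sr, win_ms, hop_ms):
            score = float(score_fn(chunk))
            if score > best[0]:
                best = (score, start, start + len(chunk))
    return best
```

Usage would look like `score, start, end = best_window(audio, sr, my_score_fn)`. The hop and the list of window lengths are arbitrary starting values; smaller hops cost more predictions but leave smaller gaps. Note that a classifier trained on fixed-length clips may need each chunk padded or resampled to the length it expects.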

You might want to pre-process the data with an onset detector / segmentation tool. ('Course if you do that, then it might already do the classification for you.)

ErfolgreichCharismatisch commented 5 years ago

Great idea. I am using aubio (https://aubio.org/download) for this; it includes a tool called aubioonset, which does this fairly precisely. Now I need an "offset" algorithm to stop at silence. But this approach might be the most precise one.
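A simple way to get such an "offset" is an energy threshold: starting at each onset, walk forward in short frames and stop once the level stays below a threshold for a minimum time. The sketch below is plain NumPy, not an aubio feature. `find_offset` and all of its parameters are hypothetical, the default threshold and durations are guesses to be tuned, and it assumes the signal is a float array scaled to [-1, 1] with the onset time (in seconds) obtained separately, for example from aubioonset.

```python
import numpy as np

def find_offset(signal, sr, onset_s, frame_ms=10.0, threshold_db=-40.0, min_silence_ms=30.0):
    """Return the time (s) where the sound starting at onset_s falls silent.

    Assumes `signal` holds floats in [-1, 1]; threshold_db is relative to full scale.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame = max(1, int(round(sr * frame_ms / 1000.0)))
    # Number of consecutive quiet frames required before we call it silence.
    need = max(1, int(np.ceil(min_silence_ms / frame_ms)))
    start = int(round(onset_s * sr))
    quiet = 0
    for pos in range(start, len(signal) - frame + 1, frame):
        rms = np.sqrt(np.mean(signal[pos:pos + frame] ** 2))
        level_db = 20.0 * np.log10(max(rms, 1e-10))  # floor avoids log(0)
        if level_db < threshold_db:
            quiet += 1
            if quiet >= need:
                # Report the beginning of the quiet run, not its end.
                return (pos - (need - 1) * frame) / sr
        else:
            quiet = 0
    # No silence found: treat the end of the file as the offset.
    return len(signal) / sr
```

Each (onset, offset) pair then gives a variable-length segment that can be cut out and passed to the classifier. Requiring a minimum silence duration keeps short dips inside a sound from ending the segment too early.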

> 'Course if you do that, then it might already do the classification for you.

aubio does not do classification in a way that lets you get raw data, unfortunately.