google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Model Maker - Custom Audio Classifier #5162

Open Akz47 opened 7 months ago

Akz47 commented 7 months ago

MediaPipe Solution (you are using)

Audio Classifier

Programming language

Python

Are you willing to contribute it

None

Describe the feature and the current behaviour/state

We would like to train a custom audio classifier using Model Maker, but the module appears to support only image and text classifiers. What is the recommended way to do audio transfer learning with MediaPipe?

Will this change the current API? How?

No response

Who will benefit with this feature?

Users training custom audio classes / sound events

Please specify the use cases for this feature

Detect and classify custom sounds rather than only the classes of the default YAMNet model

Any Other info

No response

kuaashish commented 7 months ago

Hi @Akz47,

Thank you for bringing this matter to our attention. It is currently not possible to retrain the Audio Classifier using our Model Maker. We acknowledge this as a feature request and will share it with our team. For any other inquiries you may have, we will assign this issue to the appropriate owner for further assistance.

Thank you!!

Akz47 commented 7 months ago

Hi @kuaashish,

Thanks for your reply.

Would it work if we used TFLite's Model Maker to train the custom audio classification model, then import that model into MediaPipe?

Reference: https://www.tensorflow.org/lite/models/modify/model_maker/audio_classification

It would be very helpful if you could recommend some compatible approaches.
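For readers following along, here is a minimal sketch of what "importing that model into MediaPipe" could look like in Python. It uses the MediaPipe Tasks `AudioClassifier` calls from the official Python example; the model path, WAV path, and the helper names `summarize` and `classify_wav` are hypothetical. Whether a model exported by TFLite Model Maker actually loads depends on it carrying the metadata the task expects, which is not confirmed in this thread.

```python
import wave


def summarize(results):
    """Return (timestamp_ms, label, score) for the top category of each result."""
    rows = []
    for result in results:
        top = result.classifications[0].categories[0]
        rows.append((result.timestamp_ms, top.category_name, top.score))
    return rows


def classify_wav(model_path, wav_path, max_results=3):
    """Classify a WAV file with a custom .tflite model (hypothetical paths)."""
    # Imported lazily so summarize() can be used without MediaPipe installed.
    import numpy as np
    from mediapipe.tasks import python
    from mediapipe.tasks.python import audio
    from mediapipe.tasks.python.components import containers

    # Assumption: the file is 16-bit mono PCM.
    with wave.open(wav_path, "rb") as wav:
        sample_rate = wav.getframerate()
        pcm = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    options = audio.AudioClassifierOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        max_results=max_results,
    )
    with audio.AudioClassifier.create_from_options(options) as classifier:
        # Scale int16 samples to floats in roughly [-1, 1] before classification.
        clip = containers.AudioData.create_from_array(
            pcm.astype(float) / np.iinfo(np.int16).max, sample_rate
        )
        return summarize(classifier.classify(clip))
```

A call such as `classify_wav("custom_audio.tflite", "clip.wav")` would then return one row per classified segment, assuming the model loads.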

kuaashish commented 7 months ago

Hi @joezoug,

Could you please provide any pointers here? Thank you!!

Akz47 commented 7 months ago

Hi @kuaashish,

In MediaPipe's AudioClassifier documentation, AudioClassifierOptions does not appear to expose a setting for the hop duration.

We are trying to classify shorter sound events, and a 1-second hop / window may misclassify or miss these events.

According to a discussion we found, YAMNet's PATCH_HOP_SECONDS can be customized: https://groups.google.com/g/audioset-users/c/pRDX6AkaM1s

Is there a way to set the PATCH_HOP_SECONDS parameter in the classifier options, or directly in the source code?

Thank you.
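One possible workaround for the hop question, sketched under stated assumptions: it does not set PATCH_HOP_SECONDS inside MediaPipe. Instead it slices the audio into overlapping windows on the caller's side, so each window can be classified separately. The function names are hypothetical. The 0.975-second default window is an assumption meant to match the input length of the YAMNet TFLite model at 16 kHz, and the 0.25-second hop is only an illustrative value.

```python
def window_starts(num_samples, sample_rate, window_seconds, hop_seconds):
    """Start indices of every full window, spaced hop_seconds apart."""
    window = int(round(window_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window_seconds and hop_seconds must be positive")
    if num_samples < window:
        return []  # clip is shorter than one window
    return list(range(0, num_samples - window + 1, hop))


def iter_windows(samples, sample_rate, window_seconds=0.975, hop_seconds=0.25):
    """Yield (start_time_seconds, window_samples) for overlapping windows."""
    window = int(round(window_seconds * sample_rate))
    for start in window_starts(len(samples), sample_rate,
                               window_seconds, hop_seconds):
        yield start / sample_rate, samples[start:start + window]
```

Each yielded window could then be wrapped in an `AudioData` object and passed to the classifier on its own. A smaller hop gives finer time resolution for short events at the cost of more inference calls.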