glefundes / misophonia-bot

🤖 Telegram bot powered by Deep Learning. Automatically assesses the safety of audios and voice messages for people suffering from misophonia.
MIT License

Sibilance #1

Open 2twentyseven opened 3 years ago

2twentyseven commented 3 years ago

Is it possible to integrate a criterion for sibilance? Sibilance is when someone's speech has a very loud hissing or lisping quality that's often piercing to the ears of certain people with misophonia.

What kind of training data would you need to train the bot to recognize sibilance? For example, if you listen to the Kardashians you can hear a very pronounced lisp from them.

I'd one day like to work on a project that can analyze audio for sibilance and somehow smooth it out or replace the sound entirely. I've experimented with lots of different audio settings and equalizers, and unfortunately it doesn't really come down to just the frequency of the sounds.

I know in audio engineering they use a process called "de-essing" to minimize this effect, but there doesn't seem to be any program that takes an audio recording, analyzes it for sibilance, and washes out or replaces the lisps.

I assume a starting point would be to use audio files to identify lisps, mute them entirely, and develop from that basis.
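That detect-and-mute starting point could look something like this minimal numpy sketch. The 4–9 kHz band, frame size, and energy-ratio threshold are illustrative guesses on my part, not tuned values:

```python
import numpy as np

def mute_sibilant_frames(audio, sr, frame_len=1024,
                         band=(4000.0, 9000.0), ratio_thresh=0.4):
    """Mute frames whose high-band energy ratio suggests sibilance.

    band edges and ratio_thresh are illustrative, not tuned values.
    """
    out = audio.astype(np.float64).copy()
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    for start in range(0, len(out) - frame_len + 1, frame_len):
        frame = out[start:start + frame_len]
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        if total > 0 and power[in_band].sum() / total > ratio_thresh:
            out[start:start + frame_len] = 0.0  # crude: silence the whole frame
    return out
```

Hard-muting frames will sound choppy in practice; cross-fading or band-limited attenuation would be gentler, but this shows the basic detect-then-suppress idea.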

I've tried using your bot to see if it would recognize sibilance in various audio recordings, but it's not able to determine whether the audio is safe or not. Realistically this bot wouldn't help much with my particular issue, but ideally, as a higher-level concept, the program would be fed, say, a podcast or video file, analyze it, and remove or morph the lisps into something that isn't triggering.

At an even higher level it would ideally do this to YouTube videos, but I suspect that would mean downloading the video, removing or morphing the triggering sounds, and re-encoding the video for viewing. I don't think this would work for live-streamed audio, but you'd click on your podcast or YouTube video and experience a short delay while the program encodes the new, misophonia-friendly file.
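The offline pipeline described above could be glued together from existing tools: yt-dlp for the download and ffmpeg for audio extraction and the final remux. A rough sketch, where the filenames and the processing step in the middle are placeholders for whatever de-lisping model does the actual work:

```python
import subprocess

def download_cmd(url, video_out="video.mp4"):
    # yt-dlp fetches the video to a local file
    return ["yt-dlp", "-o", video_out, url]

def extract_audio_cmd(video_in, audio_out="audio.wav"):
    # ffmpeg drops the video stream (-vn) and decodes the audio to WAV
    return ["ffmpeg", "-i", video_in, "-vn", audio_out]

def remux_cmd(video_in, clean_audio, video_out="safe.mp4"):
    # ffmpeg keeps the original video stream untouched (-c:v copy)
    # and swaps in the processed audio track
    return ["ffmpeg", "-i", video_in, "-i", clean_audio,
            "-map", "0:v", "-map", "1:a", "-c:v", "copy", video_out]

def run_pipeline(url):
    subprocess.run(download_cmd(url), check=True)
    subprocess.run(extract_audio_cmd("video.mp4"), check=True)
    # ... hypothetical model turns audio.wav into clean.wav here ...
    subprocess.run(remux_cmd("video.mp4", "clean.wav"), check=True)
```

Since the video stream is stream-copied rather than re-encoded, most of the delay would come from downloading and from the audio processing itself.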

There really isn't anyone doing work in this space, but I think it would greatly benefit many people who have misophonia.

glefundes commented 3 years ago

The model currently deployed to the bot was trained on a binary noisy/clean dataset. The data was a collection of audio messages scraped from private groups, manually labeled by me using AudioClass.

The criterion I used to classify the audio files was basically "is there noise here that can trigger misophonia?", and the majority of the files labeled as noisy were flagged for poor mic conditions, wind blowing on the mic, loud background music/noise, and things like that. It makes sense that the model can't classify sibilance, since it never saw explicit examples of it.

One possibility I can think of is training a new model on more diverse data with multiple classes such as static noise, music, sibilance, etc. It would be interesting to implement this in a future version as a hierarchical classification problem, so thanks for the suggestion. Please let me know if you have ideas on how to obtain this kind of data.
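The hierarchical decision could be structured as two stages: a binary clean/noisy gate first, then a noise-type classifier applied only to clips flagged as noisy. A minimal sketch of that decision logic, where the class names and threshold are placeholders rather than the bot's actual labels:

```python
import numpy as np

NOISE_TYPES = ["static", "music", "sibilance"]  # illustrative class names

def hierarchical_label(binary_probs, noise_probs, noisy_thresh=0.5):
    """Two-stage classification.

    binary_probs: [p_clean, p_noisy] from the stage-1 model.
    noise_probs:  one probability per NOISE_TYPES entry, from stage 2.
    """
    if binary_probs[1] < noisy_thresh:
        return "clean"  # stage 2 never runs for clean clips
    return NOISE_TYPES[int(np.argmax(noise_probs))]
```

One advantage of this layout is that the existing binary model can stay as stage 1, and only the noise-type head needs the new multiclass data.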

As for your idea of a live de-esser, it falls outside the scope of this project, and I'm not aware of anything that does it automatically. My experience in this area is limited to editing a few podcasts and using the de-esser manually in Audacity. Sibilance isn't a very stable phenomenon, so I think it would be hard to make a generalized de-esser work live, but it's probably doable with deep learning.
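For reference, the core of a de-esser is frequency-selective gain reduction: when the high band dominates a frame, duck that band instead of muting the whole frame. A per-frame numpy sketch, with illustrative (untuned) band edges, threshold, and gain:

```python
import numpy as np

def deess_frame(frame, sr, band=(4000.0, 9000.0),
                ratio_thresh=0.3, reduction=0.25):
    """Attenuate the sibilance band of one frame when it dominates.

    band, ratio_thresh, and reduction are illustrative, not tuned values.
    """
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    power = np.abs(spectrum) ** 2
    total = power.sum()
    if total > 0 and power[in_band].sum() / total > ratio_thresh:
        spectrum[in_band] *= reduction  # duck the band rather than mute it
    return np.fft.irfft(spectrum, n=len(frame))
```

A real de-esser would also smooth the gain over time (attack/release) to avoid pumping artifacts; this only shows the static band-ducking step.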

Sorry I couldn't be of more help. Please keep in touch if you end up developing something like this, because I'd be really interested in seeing your solution. Again, thanks for the suggestion, and I'll try to implement the multiclass approach in this project when I get some free time from work :)