Initial function for music/speech detection

lstolcman commented 3 years ago

Opened by mistake in the wrong repo :x

mattgwwalker commented 3 years ago

lstolcman, those changes looked interesting to me. If you feel like contributing your work, please don't hesitate.

I'm currently working on the Opus-encoding parts of the PyOgg library. I have been extremely disappointed to discover that, although the current code works acceptably on macOS and Linux, the same cannot be said for Windows: there is a very significant memory issue that crashes the Python interpreter under Windows. I am currently making heavy changes to OpusBufferedEncoder in the hope of eliminating these issues.

Cheers,

Matthew

lstolcman commented 3 years ago

Hi

Let me explain the origin of the change: the problem I want to solve is getting the probability of speech vs. non-speech in WAV files. I thought I could use the classifier that the Opus codec uses to discriminate between speech and music, which is said to give good results (https://jmvalin.ca/opus/opus-1.3/).

I made some changes to PyOgg and Opus so that I can output <timestamp>;<probability of speech/music> and save it as .csv. I found the accuracy insufficient, and I am currently moving forward with dedicated neural networks to achieve better results.
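For illustration, the <timestamp>;<probability> output could be produced along these lines. This is a minimal sketch, not the code from the branch: the function name `write_probabilities` and the number formatting are assumptions, and it assumes fixed-length 20 ms frames (the frame size mentioned for the test script) so that the timestamp can be derived from the frame index.

```python
import csv
import io

FRAME_MS = 20  # assumed frame duration in milliseconds


def write_probabilities(probabilities, out, frame_ms=FRAME_MS):
    """Write one '<timestamp>;<probability>' row per frame to a text stream.

    `probabilities` is any iterable of floats, one per frame (hypothetical
    input; how they are obtained from the codec is not shown here).
    """
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for index, probability in enumerate(probabilities):
        timestamp = index * frame_ms / 1000.0  # frame start time in seconds
        writer.writerow([f"{timestamp:.3f}", f"{probability:.4f}"])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_probabilities([0.25, 0.9], buffer)
    print(buffer.getvalue(), end="")
```

Passing an open file instead of the `StringIO` buffer would save the rows as a .csv file.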

Finally, as mentioned, my changes span both PyOgg and the Opus codec. It's unlikely that they will get merged into the Opus codec repo, so there's no point in including them here either (as I said, the PR was opened here by mistake; I wanted to open it in my cloned repo to track changes easily).

PS: If you would like to test the code, you need to compile Opus from my branch (https://github.com/lstolcman/opus/tree/music_speech_discriminator); this is easy on Linux: ./autogen.sh ; ./configure ; make. Then clone my branch of PyOgg (https://github.com/lstolcman/PyOgg/tree/music_speech_discriminator). It may need some changes to the compiled library path: by default it takes the Opus library from the system, but it needs the custom-compiled one, which has an additional function that returns only the probabilities.
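One way to prefer a custom-compiled library over the system one is sketched below. It assumes the library is loaded through `ctypes`; PyOgg's actual loading code is not shown in this thread, and the helper names `resolve_opus_path` and `load_opus` are hypothetical.

```python
import ctypes
import ctypes.util
import os


def resolve_opus_path(custom_path=None):
    """Return the custom build's path if that file exists, otherwise
    whatever the system lookup finds (which may be None)."""
    if custom_path and os.path.isfile(custom_path):
        return custom_path
    return ctypes.util.find_library("opus")


def load_opus(custom_path=None):
    """Load the Opus shared library, preferring the custom build."""
    path = resolve_opus_path(custom_path)
    if path is None:
        raise OSError("Opus shared library not found")
    return ctypes.CDLL(path)
```

The path to pass in depends on where the build places the shared library, so it is left to the caller.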

The test file that converts WAV to probabilities is 02-encode-opus.py in the main directory. It outputs the probability for each 20 ms frame to stdout.
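To make the per-frame granularity concrete, here is a rough sketch of cutting a WAV file into 20 ms frames using only the standard library. It is not taken from 02-encode-opus.py; the function name `iter_frames` is hypothetical, and dropping a trailing partial frame is an assumption made to keep the example short.

```python
import wave


def iter_frames(path, frame_ms=20):
    """Yield (timestamp_seconds, pcm_bytes) for each full frame in a WAV file."""
    with wave.open(path, "rb") as wav:
        samples_per_frame = wav.getframerate() * frame_ms // 1000
        if samples_per_frame <= 0:
            raise ValueError("sample rate too low for this frame duration")
        bytes_per_frame = (
            samples_per_frame * wav.getsampwidth() * wav.getnchannels()
        )
        index = 0
        while True:
            pcm = wav.readframes(samples_per_frame)
            if len(pcm) < bytes_per_frame:
                break  # assumption: a trailing partial frame is dropped
            yield index * frame_ms / 1000.0, pcm
            index += 1
```

At 48 kHz, each 20 ms frame holds 960 samples per channel, so a mono 16-bit file yields 1920 bytes per frame.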

TeamPyOgg / PyOgg

Initial function for music/speech detection #57