randombyte-developer closed this issue 5 years ago.
This is a perfect use-case that can be covered by Essentia. We have a bunch of computationally easy algorithms that can work in real-time for this purpose.
For bass/mid/highs levels, the simplest approach is to compute energy in various frequency bands using either rectangular or triangular frequency bands. Or what else do you mean by "how much is going"?
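To make the band-energy idea concrete, here is a minimal sketch in plain Python. It does not use Essentia's own algorithms: it windows one frame, takes a naive DFT, and sums the power in rectangular bands. The band edges in `BANDS`, the frame size, and the helper names `band_energies` and `sine_frame` are assumptions chosen for illustration only.

```python
import cmath
import math

# Hypothetical bass / mid / high band edges in Hz (assumed, tune to taste).
BANDS = [(20, 250), (250, 4000), (4000, 20000)]


def band_energies(frame, sample_rate, bands):
    """Sum spectral power of one frame inside each rectangular (low_hz, high_hz) band."""
    n = len(frame)
    # Hann window to reduce spectral leakage between bands.
    windowed = [
        s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    energies = [0.0] * len(bands)
    # Naive DFT over the positive-frequency bins. This is O(n^2) and only
    # acceptable for a sketch; a real-time tool would use an FFT.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coeff) ** 2
        for i, (low, high) in enumerate(bands):
            if low <= freq < high:
                energies[i] += power
    return energies


def sine_frame(freq_hz, sample_rate, n):
    """Test helper (hypothetical): n samples of a full-scale sine."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# A 100 Hz tone should land almost entirely in the bass band.
bass, mid, high = band_energies(sine_frame(100, 44100, 512), 44100, BANDS)
```

Triangular bands would weight each bin by its distance from the band centre instead of using the hard `low <= freq < high` test.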
Regarding ALSA, we will provide a new algorithm that wraps RtAudio for streaming audio in Essentia's streaming mode. Those updates will follow soon. Some work has been done in the rtaudio branch, but it is not ready yet.
Meanwhile you can read audio using RingBufferInput like it is done in EssentiaRT~.
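For readers unfamiliar with the ring-buffer pattern, the following is a generic sketch of the idea, not the API of RingBufferInput: an audio callback writes chunks of samples of arbitrary size, and the analysis side pulls out fixed-size, overlapping frames. The class name `FrameBuffer` and its parameters are hypothetical, and capturing audio from ALSA is not shown.

```python
from collections import deque
from itertools import islice


class FrameBuffer:
    """Collects incoming sample chunks and yields fixed-size overlapping frames."""

    def __init__(self, frame_size, hop_size, capacity):
        if not 0 < hop_size <= frame_size <= capacity:
            raise ValueError("need 0 < hop_size <= frame_size <= capacity")
        self.frame_size = frame_size
        self.hop_size = hop_size
        # With maxlen set, the oldest samples are dropped silently when the
        # consumer falls behind, which is the usual ring-buffer trade-off.
        self.samples = deque(maxlen=capacity)

    def write(self, chunk):
        """Called by the audio input side with any number of samples."""
        self.samples.extend(chunk)

    def read_frames(self):
        """Yield every complete frame currently available, advancing by hop_size."""
        while len(self.samples) >= self.frame_size:
            frame = list(islice(self.samples, self.frame_size))
            for _ in range(self.hop_size):
                self.samples.popleft()
            yield frame


buf = FrameBuffer(frame_size=4, hop_size=2, capacity=16)
buf.write(range(6))
first_batch = list(buf.read_frames())
buf.write([6, 7])
second_batch = list(buf.read_frames())
```

In a real setup `write` and `read_frames` would run on different threads, so the buffer would also need a lock or a lock-free design.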
> Or what else do you mean by "how much is going"?
I've seen something about extracting the melody (which seems to be possible only with larger parts of the track). I'm not interested in the melody directly, but I thought you guys might have other kinds of magic. My goal is to detect the intro/transition, refrain, and so on. Those parts can be easily identified by the DJ in the waveforms, but I am wondering whether that can somehow be done in real time.
Since those changes can be seen in the waveforms (which are highlighted with bass/mid/hi colors), I thought looking at the frequencies would help.
Yes, that makes sense. You could probably also look at overall instantaneous loudness levels (see LoudnessEBUR128, RMS, Loudness).
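As a rough illustration of an instantaneous level, the sketch below computes the plain RMS of one frame in dB relative to full scale. It is not EBU R128 loudness (which adds frequency weighting and gating) and it is not Essentia's implementation; the function name `rms_db` and the `floor_db` value are assumptions.

```python
import math


def rms_db(frame, floor_db=-100.0):
    """RMS level of one frame in dB relative to full scale (samples in [-1, 1])."""
    if not frame:
        return floor_db
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms <= 0.0:
        return floor_db  # digital silence: avoid log10(0)
    return max(floor_db, 20.0 * math.log10(rms))


# One full period of a full-scale 100 Hz sine at 44.1 kHz (441 samples).
tone = [math.sin(2.0 * math.pi * 100 * t / 44100) for t in range(441)]
tone_level = rms_db(tone)
silence_level = rms_db([0.0] * 441)
```

Tracking this value frame by frame, possibly smoothed, gives a level curve that could drive light intensity.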
Robust melody extraction is not feasible in real-time as our best algorithm is based on statistics and requires the entire track as an input. Still, you can try using a simpler PitchYinFFT or PitchYin (that are normally suited for monophonic sounds) and see how it works.
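To show why YIN-style estimators suit monophonic input, here is a simplified sketch of the YIN idea in plain Python: a squared-difference function, its cumulative-mean normalisation, and a threshold search for the first dip. It omits the parabolic interpolation step and is not the code behind PitchYin or PitchYinFFT; the function name and all default parameter values are assumptions.

```python
import math


def yin_pitch(frame, sample_rate, fmin=100.0, fmax=1000.0, threshold=0.15):
    """Estimate the pitch of a monophonic frame in Hz, or None if no clear period."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested fmin")

    # Step 1: squared difference between the frame and a copy shifted by tau.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Step 2: cumulative-mean normalised difference, so small lags are not favoured.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0.0 else 1.0

    # Step 3: take the first lag that dips below the threshold, then walk
    # down to the local minimum of that dip.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    return None


sr = 44100
voice_like = [math.sin(2.0 * math.pi * 220 * t / sr) for t in range(1024)]
estimate = yin_pitch(voice_like, sr)
```

On a full mix several periodicities overlap, so the dip below the threshold is often missing or belongs to whichever source happens to dominate the frame.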
Ok cool, thanks. Once I get some time I will definitely have a play with these. I will close this issue because the initial question (is Essentia the right tool) is answered.
> Robust melody extraction is not feasible in real-time as our best algorithm is based on statistics and requires the entire track as an input. Still, you can try using a simpler PitchYinFFT or PitchYin (that are normally suited for monophonic sounds) and see how it works.
@dbogdanov Is this true? It seems pretty accurate in real time here (just me singing): https://essentiajs-pitchmelodia.netlify.app
@zumpchke Does it work well for you for full mix audio? The original algorithm is not intended to be used on short audio buffers of full-mix polyphonic audio. In this demo we used it for monophonic inputs, but did not do any tests to see if it works for other cases.
Hi!
I hope you're having a great day ;)
I am asking here because of the note "Create an issue on github if your question was not answered before [in the FAQ]".
I am DJing with Mixxx. I would like to control my light setup over DMX with QLC+. The lights should react to the music that is currently playing.
Extracting the beat and volume is easy with Mixxx, that information can be sent over MIDI to QLC+, which then advances a cuelist, controls light intensity or something. Mixxx can't really output how much bass/mid/highs is played currently.
Therefore I am thinking of creating something small that analyzes the music on the go in realtime. I am running Mixxx on Linux, I guess it should be no problem to grab the audio from ALSA. I am wondering if Essentia is the right tool for this. Is there something to identify how much is going in a song? And that in realtime?