MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Potential use-case: DJing and light control #816

Closed: randombyte-developer closed this issue 5 years ago

randombyte-developer commented 5 years ago

Hi!

I hope you're having a great day ;)

I am asking here because the FAQ says: "Create an issue on GitHub if your question was not answered before."

I am DJing with Mixxx. I would like to control my light setup over DMX with QLC+. The lights should react to the music that is currently playing.

Extracting the beat and volume is easy with Mixxx; that information can be sent over MIDI to QLC+, which then advances a cue list, controls light intensity, or similar. However, Mixxx can't really output how much bass/mid/high content is currently playing.

Therefore I am thinking of writing a small tool that analyzes the music in real time. I am running Mixxx on Linux, so I guess it should be no problem to grab the audio from ALSA. Is Essentia the right tool for this? Is there something that can identify how much is going on in a song, and do so in real time?

dbogdanov commented 5 years ago

This is a perfect use case for Essentia. We have a number of computationally cheap algorithms that can run in real time for this purpose.

For bass/mid/high levels, the simplest approach is to compute the energy in several frequency bands, using either rectangular or triangular bands. Or what else do you mean by "how much is going"?
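A minimal pure-Python sketch of the rectangular-band idea, assuming mono float samples in [-1, 1] and hypothetical band edges (bass 20 to 250 Hz, mids 250 Hz to 4 kHz, highs 4 to 20 kHz). The helper names `hann`, `power_spectrum`, and `band_energies` are illustrative, not Essentia API; in Essentia itself you would chain its own `Windowing`, `Spectrum`, and `FrequencyBands` (or `TriangularBands`) algorithms. The naive DFT below only keeps the example dependency-free.

```python
import cmath
import math

# Hypothetical band edges in Hz (bass / mids / highs); tune them for your setup.
BANDS = {"bass": (20.0, 250.0), "mid": (250.0, 4000.0), "high": (4000.0, 20000.0)}


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def power_spectrum(frame):
    """Power |X[k]|^2 for bins k = 0..n/2 of a Hann-windowed frame.

    Naive O(n^2) DFT for readability only; use an FFT in practice."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(frame, sample_rate, bands=BANDS):
    """Sum the spectral power of the bins falling in each rectangular band [lo, hi) Hz."""
    spectrum = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    return {
        name: sum(p for k, p in enumerate(spectrum) if lo <= k * bin_hz < hi)
        for name, (lo, hi) in bands.items()
    }
```

Feeding a 1024-sample frame of a 60 Hz sine at 44.1 kHz puts nearly all of the energy in `"bass"`. Triangular bands would weight each bin by a triangle instead of the 0/1 mask used here, which softens the band edges.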

Regarding ALSA, we will provide a new algorithm that wraps RtAudio for streaming audio in Essentia's streaming mode. Those updates will follow soon; some work has already been done in the rtaudio branch, but it is not ready yet.

Meanwhile, you can feed audio in using RingBufferInput, as is done in EssentiaRT~.
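`RingBufferInput` and EssentiaRT~ are not reproduced here. As a library-agnostic sketch of the same idea, the hypothetical `FrameBuffer` class below accepts audio blocks of any size (for example, whatever an ALSA capture loop hands you) and returns fixed-size, overlapping analysis frames; the frame and hop sizes are assumptions.

```python
from collections import deque


class FrameBuffer:
    """Hypothetical helper: collects incoming audio blocks and emits a frame of
    `frame_size` samples every `hop_size` samples once enough audio has arrived."""

    def __init__(self, frame_size=1024, hop_size=512):
        if not 0 < hop_size <= frame_size:
            raise ValueError("need 0 < hop_size <= frame_size")
        self.frame_size = frame_size
        self.hop_size = hop_size
        # maxlen bounds memory: once full, appending drops the oldest sample.
        self._buf = deque(maxlen=frame_size)
        self._since_last = 0  # samples pushed since the last emitted frame

    def push(self, block):
        """Append a block of samples; return the list of complete frames it produced."""
        frames = []
        for sample in block:
            self._buf.append(sample)
            self._since_last += 1
            if len(self._buf) == self.frame_size and self._since_last >= self.hop_size:
                frames.append(list(self._buf))
                self._since_last = 0
        return frames
```

For instance, with `frame_size=4` and `hop_size=2`, pushing the samples 0 to 7 (in one block or several) yields `[0, 1, 2, 3]`, `[2, 3, 4, 5]`, and `[4, 5, 6, 7]`. Each returned frame can then be passed to a spectral or loudness analysis.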

randombyte-developer commented 5 years ago

> Or what else do you mean by "how much is going"?

I've seen something about melody extraction (which seems to be possible only with larger portions of the track). I'm not directly interested in the melody, but I thought you might have other kinds of magic. My goal is to detect the intro, transitions, the refrain, and so on. A DJ can easily identify those parts in the waveforms, but I am wondering whether this can somehow be done in real time. Since those changes are visible in the waveforms (which are colored by bass/mid/high content), I thought looking at the frequencies would help.
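One simple real-time heuristic for the kind of section changes described above (not an Essentia algorithm, and not validated on real DJ sets) is to track a per-frame level in dB, such as the bass-band energy or the RMS level, with a fast and a slow moving average and flag frames where the two diverge. The class name, smoothing coefficients, and 6 dB threshold below are all hypothetical.

```python
class ChangeDetector:
    """Hypothetical heuristic: flags a possible section change when a fast
    moving average of a level (in dB) departs from a slow one by more than
    `threshold_db`."""

    def __init__(self, fast=0.5, slow=0.02, threshold_db=6.0):
        self.fast_coef = fast    # smoothing factor of the fast average (0..1)
        self.slow_coef = slow    # smoothing factor of the slow average (0..1)
        self.threshold_db = threshold_db
        self.fast = None
        self.slow = None

    def update(self, level_db):
        """Feed one per-frame level; return True if a change is flagged."""
        if self.fast is None:  # initialise both averages on the first frame
            self.fast = self.slow = level_db
            return False
        # One-pole (exponential) moving averages.
        self.fast += self.fast_coef * (level_db - self.fast)
        self.slow += self.slow_coef * (level_db - self.slow)
        return abs(self.fast - self.slow) > self.threshold_db
```

A sudden jump from -30 dB to -10 dB (e.g. a drop after a quiet breakdown) is flagged on the very next frame, and the flag clears once the slow average catches up. A real setup would likely add a hold-off time so that one transition triggers a single light cue.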

dbogdanov commented 5 years ago

Yes, that makes sense. You could probably also look at overall instantaneous loudness levels (see LoudnessEBUR128, RMS, Loudness).
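Essentia's `RMS` and `Loudness` algorithms compute such levels natively. Purely to illustrate what an instantaneous level looks like, here is a small standard-library sketch that computes the RMS of a frame in dBFS and maps it to a DMX intensity. It does not implement the K-weighting and gating used by `LoudnessEBUR128`, and the `level_to_dmx` mapping with its -60 dB to 0 dB range is a hypothetical choice, not part of Essentia or QLC+.

```python
import math


def rms(frame):
    """Root mean square of a frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def rms_dbfs(frame, floor_db=-120.0):
    """RMS level in dB relative to full scale, assuming samples in [-1, 1]."""
    r = rms(frame)
    if r <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(r))


def level_to_dmx(level_db, min_db=-60.0, max_db=0.0):
    """Hypothetical linear mapping of a dBFS level to a DMX value 0..255."""
    x = (level_db - min_db) / (max_db - min_db)
    return round(255 * min(1.0, max(0.0, x)))
```

A full-scale sine reads about -3 dBFS, and a full-scale square wave reads 0 dBFS (DMX 255). In practice you would probably smooth the level over a few frames before sending it to the lights, to avoid flicker.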

Robust melody extraction is not feasible in real time, as our best algorithm is based on statistics and requires the entire track as input. Still, you can try the simpler PitchYinFFT or PitchYin algorithms (which are normally suited to monophonic sounds) and see how they work.
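`PitchYin` is documented as an implementation of the YIN algorithm. The plain-Python function below is a sketch of the textbook YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation), not Essentia's code; `yin_pitch`, its 50 to 1000 Hz search range, and the 0.1 threshold are assumptions. Like `PitchYin`, it assumes monophonic input.

```python
def yin_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0, threshold=0.1):
    """Estimate f0 (Hz) of a monophonic frame with the basic YIN steps.
    Returns 0.0 if no lag dips below the threshold."""
    tau_min = max(2, int(sample_rate / fmax))
    tau_max = min(len(frame) // 2, int(sample_rate / fmin))
    w = len(frame) - tau_max  # integration window length
    if tau_max <= tau_min or w <= 0:
        return 0.0
    # Step 1: difference function d(tau) for tau = 1..tau_max.
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag below the threshold, then walk down to its local minimum.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
        tau += 1
    else:
        return 0.0  # no dip found: treat the frame as unvoiced
    # Step 4: parabolic interpolation around the chosen lag.
    shift = 0.0
    if tau < tau_max:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            shift = 0.5 * (a - c) / denom
    return sample_rate / (tau + shift)
```

On a 2048-sample frame of a 220 Hz sine at 44.1 kHz it returns roughly 220 Hz, and it returns 0.0 for silence. Pure Python is likely too slow for real-time use at this frame size, so in practice you would call Essentia's compiled `PitchYin` or `PitchYinFFT` instead.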

randombyte-developer commented 5 years ago

OK, cool, thanks! Once I have some time, I will definitely play with these. I will close this issue because the initial question (is Essentia the right tool?) has been answered.

zumpchke commented 5 months ago

> Robust melody extraction is not feasible in real-time as our best algorithm is based on statistics and requires the entire track as an input. Still, you can try using a simpler PitchYinFFT or PitchYin (that are normally suited for monophonic sounds) and see how it works.

@dbogdanov Is this true? It seems pretty accurate in real time here (just me singing): https://essentiajs-pitchmelodia.netlify.app

dbogdanov commented 5 months ago

@zumpchke Does it work well for you on full-mix audio? The original algorithm is not intended for short buffers of full-mix polyphonic audio. In this demo we used it on monophonic input, but we did not run any tests to see whether it works in other cases.