An open-source harmoniser implementation for LV2 and VST, leveraging the DISTRHO Plugin Framework and the WORLD speech analysis, manipulation and synthesis system
Outotune has been inspired by Jacob Collier's harmonizer, which can be seen in action here or here.
It allows you to become a one-man choir by analysing your voice and resynthesizing it at different pitches. It is controlled by MIDI, which basically means you can sing and accompany yourself by playing chords on your keyboard, and get a full choir of yous.
See Technical details for a more detailed description and documentation.
Originally, Outotune was meant to be an open-source Autotune/Melodyne implementation, but it evolved into a harmoniser. The name comes from "Out o' tune", which is pretty self-explanatory.
- `g++`, `make`, `pkg-config`
- development libraries for FFTW3, OpenGL and JACK. For example on Debian, install `libfftw3-dev libgl-dev libjack-dev`.
Building on platforms other than Linux is possible, but not tested.
First ensure that you have git submodules initialised by running:

```sh
git submodule init
git submodule update
```
Also make sure you have all the needed dependencies installed (see above).
Then simply `cd` to the correct directory and run `make`:

```sh
cd plugins/outotune
make
```
By default, `make` will build Outotune for all the supported plugin formats; this currently means LV2, VST2 and a stand-alone JACK executable.
Other useful commands are:
- `make install`: installs the plugins to the home directory
- `make run`: runs the JACK standalone executable
- `make clean`, `make clean-all`: clean the build products; the latter also cleans in the submodules
- `make uninstall`: removes the files installed by `make install`
Outotune has three ports: mono input, MIDI input and mono output. The set of currently active notes is maintained by listening to MIDI Note On and Note Off events. In each period, Outotune analyses the audio in the input buffer, synthesizes a voice for each active note, and muxes them into a single output. In the LV2 plugin, the instantaneous estimated fundamental frequency is available as an output parameter `pitch`.
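As a rough sketch of how such note tracking can work (this is illustrative only, not the actual Outotune source, and all names here are hypothetical), the set of active notes can be maintained by reacting to the relevant MIDI status bytes:

```cpp
#include <cstdint>
#include <set>

// Hypothetical sketch: track which MIDI notes are currently held.
// The resulting set would determine how many voices to synthesize per period.
class NoteTracker {
public:
    // Feed one MIDI channel-voice message (status byte plus two data bytes).
    void handleMessage(uint8_t status, uint8_t note, uint8_t velocity) {
        const uint8_t type = status & 0xF0; // strip the channel nibble
        if (type == 0x90 && velocity > 0)
            active.insert(note);            // Note On
        else if (type == 0x80 || type == 0x90)
            active.erase(note);             // Note Off (or Note On with velocity 0)
    }

    const std::set<uint8_t>& notes() const { return active; }

private:
    std::set<uint8_t> active;
};
```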
Outotune has two modes which control how the MIDI events are interpreted:

- Absolute mode. For each active note `n` (corresponding to the MIDI event Note On `n`), a voice is synthesized with the pitch corresponding to the note `n`. This emulates the functionality of the original harmonizer.
- Relative mode. Each active note `n` is interpreted relative to the reference note `r = 60` (corresponds to middle C). Let `p` be the instantaneous pitch of the input signal. Then the synthesized voice has pitch `p + (n - r)`. For example, playing `n = 72` (a C one octave above middle C) results in the input voice being shifted up by one octave (a short sketch of this arithmetic follows below).
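To make the arithmetic concrete, here is a small sketch (not taken from the plugin source) of how the target frequency of a synthesized voice could be computed in each mode, treating pitches as MIDI note numbers, i.e. in semitones:

```cpp
#include <cmath>

// Convert a (possibly fractional) MIDI note number to Hz (A4 = note 69 = 440 Hz).
double noteToHz(double note) {
    return 440.0 * std::pow(2.0, (note - 69.0) / 12.0);
}

// Absolute mode: the voice gets the pitch of the played note n directly.
double absoluteModeHz(int n) {
    return noteToHz(n);
}

// Relative mode: shift the detected input pitch p (in MIDI note units)
// by n - r semitones, with r = 60 (middle C) as the reference.
double relativeModeHz(double p, int n, int r = 60) {
    return noteToHz(p + (n - r));
}
```

For instance, with an input pitch around middle C (`p = 60`) and `n = 72`, relative mode yields `noteToHz(72)`, i.e. the input shifted up by one octave.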
The GUI consists of two parts: a frequency graph and a top bar.
The frequency graph shows the estimated pitch as a function of time. The estimated pitch is shown in red; black areas represent the places where no pitch was detected.
The top bar contains three clickable toggles for controlling the application. Each of these toggles can also be triggered by a keyboard shortcut.
- `m`: switches between absolute and relative mode.
- `i`: whether to include the original input in the output (besides the synthesized voices) or not.
- `g`: shows or hides the graph.

Currently, Outotune is computationally expensive. WORLD, the underlying speech analysis and synthesis system, does not seem to be optimised for our workflow; in fact, real-time analysis is not officially supported (although real-time synthesis is). This may be improved by refactoring WORLD, mainly by reducing repetitive allocations and making better use of FFTW3 plans. Potential improvements can also be made by reducing the amount of data WORLD has to compute in each period.
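To illustrate the kind of refactoring this refers to (a generic FFTW3 sketch, not code from WORLD or Outotune), the idea is to allocate buffers and build the plan once, so that only `fftw_execute()` runs in each period:

```cpp
#include <fftw3.h>

// Generic sketch of FFTW3 plan reuse: allocate buffers and build the plan
// once, then only call fftw_execute() in the per-period processing path.
struct ReusableFft {
    int n;
    double* in;
    fftw_complex* out;
    fftw_plan plan;

    explicit ReusableFft(int size)
        : n(size),
          in(fftw_alloc_real(size)),
          out(fftw_alloc_complex(size / 2 + 1)),
          plan(fftw_plan_dft_r2c_1d(size, in, out, FFTW_MEASURE)) {}

    ~ReusableFft() {
        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
    }

    // Called once per period; no allocation or planning happens here.
    void run() { fftw_execute(plan); }
};
```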
A direct result of this is latency. For small enough buffer sizes, reducing the buffer size does not substantially reduce the work done in one period, which effectively means that the smaller the buffer size, the more computationally expensive Outotune becomes. Latency is then directly determined by the smallest buffer size with which Outotune can still run smoothly on the machine.
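For a rough sense of scale (the figures below are an assumed example, not Outotune measurements), the latency contributed by one buffer is simply its size divided by the sample rate:

```cpp
#include <cstdio>

int main() {
    // Assumed example values, not measured Outotune figures.
    const int bufferSize = 1024;        // frames per period
    const double sampleRate = 48000.0;  // Hz

    const double latencyMs = 1000.0 * bufferSize / sampleRate;
    std::printf("%d frames at %.0f Hz -> about %.1f ms per period\n",
                bufferSize, sampleRate, latencyMs);  // prints ~21.3 ms
    return 0;
}
```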