DamRsn / NeuralNote

Audio Plugin for Audio to MIDI transcription using deep learning.
Apache License 2.0

CQT question #27

Closed creaktive closed 1 year ago

creaktive commented 1 year ago

First of all, thanks for this cool project! In "Could NeuralNote transcribe audio in real-time?", you mention that:

The CQT requires really long audio chunks (> 1s) to get amplitudes for the lowest frequency bins.

In my experience, the lowest piano key (27.5 Hz) requires about 0.62 s of samples before it can be detected. That's still not quite real-time, but technically it is possible to detect correlation even earlier when using a streaming variant of the transform.
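That ~0.62 s figure matches the usual constant-Q window rule: each CQT bin needs a window spanning Q periods of its center frequency, where Q is set by the bins-per-octave resolution. A minimal sketch, assuming one bin per semitone (B = 12; real transforms, including basic-pitch's, may use a finer resolution and thus even longer windows):

```python
def cqt_window_seconds(f_center_hz: float, bins_per_octave: int = 12) -> float:
    """Duration of the analysis window for one constant-Q bin.

    The window must span Q cycles of the bin's center frequency,
    with Q = 1 / (2^(1/B) - 1) for B bins per octave.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q factor
    return q / f_center_hz

print(f"A0 (27.5 Hz) window: {cqt_window_seconds(27.5):.3f} s")   # ~0.61 s
print(f"A4 (440 Hz) window:  {cqt_window_seconds(440.0):.3f} s")  # ~0.038 s
```

So the low bins alone dictate the latency floor; the high bins would be available much sooner.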

Out of curiosity, how difficult do you think it would be to use such streaming transform with your neural network?

DamRsn commented 1 year ago

Hi, thanks for the question!

Changing the input features would require retraining the neural network from scratch. NeuralNote uses the trained model from basic-pitch (model here), and AFAIK they did not open-source the training code. But with some work it should be possible to retrain everything following the paper.

But I'm not sure a DFT would work well as an input feature here, because convolutions work best on evenly spaced data, which in this case means evenly spaced in note space (each bin separated by a fixed fraction of a semitone). So the DFT, with its linearly spaced frequency bins, wouldn't fit here I think.

One point I didn't mention in the README that also prevents NeuralNote from being real-time is the note-creation process applied to the outputs of the CNN. The signal is processed backwards in places, so all of this is far from causal. See Lib/Model/Notes.cpp for more details. But it might be possible to design a causal algorithm that does this.
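To illustrate why a backward pass breaks causality, here is a hypothetical sketch (not the actual Lib/Model/Notes.cpp logic): confident onsets are found with a high threshold, then each note's start is extended backwards through earlier frames above a lower threshold. That backward scan needs frames a causal stream would already have committed to.

```python
def extend_onsets_backward(probs, hi=0.8, lo=0.3):
    """Return (start, end) frame spans from per-frame note probabilities.

    Hypothetical two-threshold scheme: detect onsets at `hi`, then grow
    each span forward AND backward through frames above `lo`. The
    backward growth is the non-causal step.
    """
    spans, t, n = [], 0, len(probs)
    while t < n:
        if probs[t] >= hi:                           # confident onset
            end = t
            while end + 1 < n and probs[end + 1] >= lo:
                end += 1                             # forward: find the offset
            start = t
            while start - 1 >= 0 and probs[start - 1] >= lo:
                start -= 1                           # backward: non-causal
            spans.append((start, end))
            t = end + 1
        else:
            t += 1
    return spans

print(extend_onsets_backward([0.1, 0.4, 0.5, 0.9, 0.6, 0.2]))  # [(1, 4)]
```

Here the onset at frame 3 gets moved back to frame 1, i.e. the algorithm revises output for frames it has already passed, which a streaming implementation cannot do without extra lookahead latency.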

And a latency of ~0.5 s would still be far too high for real-time applications, for example doubling an instrument live with a MIDI synth.

To conclude, I'd say that making basic-pitch real-time would require a lot of work and research!