ViSonic-NN / muscribe

MIT License

Paper hub: related works, what we've read, and what we are trying #1

Open Evan-Zhao opened 1 year ago

Evan-Zhao commented 1 year ago

MP3 to MIDI: we are now using the piano transcription inference framework from ByteDance. Related paper: High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times, 2020.

MIDI to MusicXML: we tried PM2S (ISMIR 2022), but its beat tracking and note quantization are not accurate enough, so we are now looking for better frameworks for this step.

Evan-Zhao commented 1 year ago

https://archives.ismir.net/ismir2019/paper/000058.pdf ISMIR 2019

cpyang123 commented 1 year ago

PM2S also doesn't provide any code for converting its model output to MusicXML, so we'll need to either write that ourselves or find a package that does it.
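One piece of that conversion could look roughly like the sketch below. It assumes the model output includes beat times in seconds (an assumption, not something confirmed about PM2S here) and snaps note onsets to a grid of beat subdivisions. The function name `quantize_onsets` and the `subdivisions` parameter are hypothetical, and writing the actual MusicXML would still need a separate package.

```python
import numpy as np

def quantize_onsets(onsets_sec, beat_times_sec, subdivisions=4):
    """Map note onsets (seconds) to positions in beats, snapped to a grid.

    Hypothetical helper: assumes beat_times_sec is sorted and increasing.
    Onsets outside the tracked beat range are clamped to the first/last beat.
    """
    beat_times = np.asarray(beat_times_sec, dtype=np.float64)
    beat_index = np.arange(len(beat_times), dtype=np.float64)
    # Linear interpolation between neighbouring beats gives a fractional beat position.
    pos_in_beats = np.interp(onsets_sec, beat_times, beat_index)
    # Snap to the nearest 1/subdivisions of a beat.
    return np.round(pos_in_beats * subdivisions) / subdivisions

beats = [0.0, 0.5, 1.0, 1.5]          # made-up beat times at 120 BPM
onsets = [0.02, 0.26, 0.74, 1.01]     # made-up, slightly off-grid onsets
quantized = quantize_onsets(onsets, beats)
print(quantized)  # positions in beats: [0.  0.5 1.5 2. ]
```

Because the positions are expressed in beats rather than seconds, tempo changes are absorbed by the beat times themselves, which is why accurate beat tracking matters so much for this step.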

Evan-Zhao commented 1 year ago

Beat tracking: we are not satisfied with the accuracy of PM2S beat tracking and have decided to do it ourselves. The goal is to train a transformer model to predict where the beats fall in the time series of the input MIDI. We proposed two input embedding schemes, which differ in what constitutes a "token" in the input sequence:

Evan-Zhao commented 1 year ago

Upon further inspection, it seems that the majority of work on beat tracking takes audio directly as input, including the state of the art, BeatNet. Makes sense, right? Could we use such a tool directly (in parallel to our audio-to-MIDI framework), or could we borrow some ideas from it to improve our own beat tracking?

cpyang123 commented 1 year ago

The embedding we currently use is too sparse and does not produce adequate results, i.e. the training loss does not decrease in a reasonable time. We are therefore exploring a note-based embedding in which every note is a token and the positional encoding is computed from the note's timestamp.
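A minimal sketch of what such a note-based embedding might look like, using NumPy only. Everything specific here is an assumption for illustration: the `(onset_sec, pitch, duration_sec)` note format, the function names, the sinusoidal form of the time encoding, and the random pitch table standing in for a learned embedding. Duration is ignored in this sketch.

```python
import numpy as np

def time_positional_encoding(times_sec, dim, max_period=10000.0):
    """Sinusoidal encoding evaluated at continuous timestamps (seconds)
    instead of integer sequence indices. `dim` must be even."""
    times = np.asarray(times_sec, dtype=np.float64)[:, None]   # (n, 1)
    freqs = max_period ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    angles = times * freqs                                     # (n, dim/2)
    enc = np.empty((times.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def embed_notes(notes, pitch_table):
    """One token per note: pitch embedding plus time-based positional encoding.

    notes: list of (onset_sec, pitch, duration_sec) tuples (assumed format).
    pitch_table: (128, dim) array, a stand-in for a learned embedding matrix.
    """
    notes = sorted(notes)                      # order by onset, then pitch
    onsets = [n[0] for n in notes]
    pitches = [n[1] for n in notes]
    dim = pitch_table.shape[1]
    return pitch_table[pitches] + time_positional_encoding(onsets, dim)

rng = np.random.default_rng(0)
pitch_table = rng.normal(size=(128, 16))       # random stand-in, not trained
notes = [(0.0, 60, 0.5), (0.0, 64, 0.5), (0.75, 67, 0.25)]
tokens = embed_notes(notes, pitch_table)
print(tokens.shape)  # (3, 16)
```

With this scheme the sequence length equals the number of notes, and notes sounding at the same time (such as the first two above) receive the same positional encoding and differ only in their pitch embedding.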