ViSonic-NN / muscribe

MIT License

Paper hub: related works, what we've read, and what we are trying #1

Open · Evan-Zhao opened this issue 11 months ago

Evan-Zhao commented 11 months ago

MP3 to MIDI: we are now using ByteDance's piano transcription inference framework. Related paper: High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times (2020).
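
As the paper's title suggests, the model regresses onset and offset times instead of only classifying each frame as on or off. Below is a minimal sketch of a frame-wise onset regression target. It assumes a triangular shape around each onset, a 10 ms hop, and a sharpness parameter `J`, based on our reading of the paper. `onset_regression_target` is a hypothetical helper, not part of ByteDance's code.

```python
def onset_regression_target(onset_time, num_frames, hop=0.01, J=5):
    """Triangular regression target around a single onset (sketch).

    Frame i is centred at i * hop seconds. Frames within J * hop seconds of the
    onset get a value that falls linearly from 1 (at the onset) to 0, and all
    other frames get 0. The defaults for `hop` and `J` are assumed values.
    """
    width = J * hop
    target = []
    for i in range(num_frames):
        delta = abs(i * hop - onset_time)  # distance from frame centre to onset
        target.append(max(0.0, 1.0 - delta / width))
    return target

# An onset at 0.123 s falls between frame 12 (0.12 s) and frame 13 (0.13 s).
t = onset_regression_target(0.123, num_frames=20)
print([round(v, 2) for v in t[8:17]])
```

Because the onset falls between frames, no frame reaches 1.0. Frames 12 and 13 get 0.94 and 0.86. The relative heights of neighboring frames therefore encode the onset time at sub-frame resolution.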

MIDI to MusicXML: we tried PM2S (ISMIR 2022), but its beat tracking and note quantization are not accurate enough, so we are now looking for a better framework for this step.
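
In this setting, note quantization means snapping each performed onset to a metrical grid built from the tracked beats. That is why beat-tracking errors carry straight into quantization errors. Below is a minimal sketch. It assumes the beat times are already known and that each beat is split into a fixed number of subdivisions. `quantize_onsets` and its `subdivisions` parameter are hypothetical, and this is not PM2S's method.

```python
from bisect import bisect_right

def quantize_onsets(onsets, beats, subdivisions=4):
    """Snap onset times (seconds) to the nearest subdivision of a beat grid.

    `beats` must be sorted and contain at least two beat times. Each interval
    between consecutive beats is split into `subdivisions` equal steps. Results
    are in beat units, e.g. 1.25 = one sixteenth after beat 1 when
    subdivisions=4. Onsets before the first or after the last beat are
    extrapolated from the nearest beat interval.
    """
    quantized = []
    for t in onsets:
        # Index of the beat interval containing t, clamped to the valid range.
        k = min(max(bisect_right(beats, t) - 1, 0), len(beats) - 2)
        start, end = beats[k], beats[k + 1]
        # Fractional position in beat units, then rounded to the grid.
        position = k + (t - start) / (end - start)
        quantized.append(round(position * subdivisions) / subdivisions)
    return quantized

beats = [0.0, 0.5, 1.0, 1.5]  # a steady 120 BPM pulse
print(quantize_onsets([0.02, 0.26, 0.49, 0.61, 1.13], beats))
# → [0.0, 0.5, 1.0, 1.25, 2.25]
```

If one beat in `beats` were misplaced, every onset in the two intervals around it would shift on the grid. This is the coupling between beat tracking and quantization accuracy described above.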

Evan-Zhao commented 11 months ago

Related paper (ISMIR 2019): https://archives.ismir.net/ismir2019/paper/000058.pdf

cpyang123 commented 11 months ago

PM2S also doesn't provide any code to convert its model output into MusicXML, so we'll need to write that ourselves or find a package that does it.
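
If no suitable package turns up, writing MusicXML by hand is feasible for simple cases. music21 is one existing Python package that can also write MusicXML. Below is a minimal sketch that uses only Python's standard library, under these assumptions:

- a single monophonic part in 4/4;
- durations already quantized to integer sixteenth-note units, each at most one measure long;
- no note crossing a barline.

`notes_to_musicxml` is a hypothetical helper. It omits rests, ties, chords, key signatures, and the XML declaration/DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Sharps-only spelling of the 12 pitch classes as (step, alter).
PITCH_CLASSES = [("C", 0), ("C", 1), ("D", 0), ("D", 1), ("E", 0), ("F", 0),
                 ("F", 1), ("G", 0), ("G", 1), ("A", 0), ("A", 1), ("B", 0)]
# Note type names for durations in sixteenth-note units (subset only).
TYPES = {1: "16th", 2: "eighth", 4: "quarter", 8: "half", 16: "whole"}
DIVISIONS = 4        # divisions per quarter note, so one division = a sixteenth
MEASURE_LENGTH = 16  # sixteenths per 4/4 measure

def notes_to_musicxml(notes):
    """notes: list of (midi_pitch, duration_in_sixteenths). Returns an XML string."""
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Piano"
    part = ET.SubElement(score, "part", id="P1")

    measure, filled, number = None, MEASURE_LENGTH, 0
    for pitch, duration in notes:
        # Start a new measure when the note would not fit (no rests are added).
        if filled + duration > MEASURE_LENGTH:
            number += 1
            measure = ET.SubElement(part, "measure", number=str(number))
            filled = 0
            if number == 1:  # divisions, time signature and clef go in the first measure
                attributes = ET.SubElement(measure, "attributes")
                ET.SubElement(attributes, "divisions").text = str(DIVISIONS)
                time = ET.SubElement(attributes, "time")
                ET.SubElement(time, "beats").text = "4"
                ET.SubElement(time, "beat-type").text = "4"
                clef = ET.SubElement(attributes, "clef")
                ET.SubElement(clef, "sign").text = "G"
                ET.SubElement(clef, "line").text = "2"
        note = ET.SubElement(measure, "note")
        step, alter = PITCH_CLASSES[pitch % 12]
        pitch_el = ET.SubElement(note, "pitch")
        ET.SubElement(pitch_el, "step").text = step
        if alter:
            ET.SubElement(pitch_el, "alter").text = str(alter)
        # MIDI 60 is middle C, which MusicXML writes as octave 4.
        ET.SubElement(pitch_el, "octave").text = str(pitch // 12 - 1)
        ET.SubElement(note, "duration").text = str(duration)
        if duration in TYPES:
            ET.SubElement(note, "type").text = TYPES[duration]
        filled += duration
    return ET.tostring(score, encoding="unicode")

# C4, D4 (quarters), E4 (half) fill measure 1; F4 (whole) goes to measure 2.
print(notes_to_musicxml([(60, 4), (62, 4), (64, 8), (65, 16)]))
```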

Evan-Zhao commented 11 months ago

Beat tracking: we are unsatisfied with the accuracy of PM2S beat tracking and have decided to do it ourselves. The goal is to train a transformer model to predict where the beats land in the time series of the input MIDI. We proposed two input embedding schemes, which differ in what constitutes a "token" in the sequence:
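
Framed this way, the training target can be a per-frame label saying whether a beat lands in that frame. Below is a minimal sketch that builds such targets from annotated beat times. It assumes a fixed rate of 100 frames per second. The frame rate and `beat_targets` are illustrative choices and not necessarily what either of our embedding schemes uses.

```python
def beat_targets(beat_times, duration, fps=100):
    """Return a 0/1 list with one entry per frame; 1 marks a frame containing a beat.

    beat_times: beat positions in seconds. duration: length of the piece in
    seconds. fps (frames per second) is an assumed time resolution.
    """
    num_frames = int(round(duration * fps))
    targets = [0] * num_frames
    for t in beat_times:
        frame = int(round(t * fps))  # nearest frame to the beat
        if 0 <= frame < num_frames:  # ignore beats outside the piece
            targets[frame] = 1
    return targets

targets = beat_targets([0.5, 1.0, 1.5], duration=2.0)
print(len(targets), sum(targets), [i for i, v in enumerate(targets) if v])
# → 200 3 [50, 100, 150]
```

Positive frames are rare (3 of 200 here), so a loss that up-weights the positive class is probably worth considering.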

Evan-Zhao commented 11 months ago

Upon further inspection, it seems that most work on beat tracking takes audio directly as input, including the state of the art, BeatNet. That makes sense, right? Could we use such a tool directly (in parallel with our audio-to-MIDI framework), or could we borrow some of its ideas to improve our own beat tracking?
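
To compare an audio-based tracker like BeatNet with our MIDI-based output on equal terms, we need a shared metric. A common one is beat F-measure with a ±70 ms tolerance window, which is the default in mir_eval's `mir_eval.beat.f_measure`. Below is a simplified stand-alone sketch that assumes greedy one-to-one matching in time order. mir_eval computes an optimal matching, so results may differ in edge cases; the library should be used for any reported numbers.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F-measure between two sorted lists of beat times (seconds).

    An estimated beat is a hit if it lies within `tolerance` seconds of a
    reference beat, with each beat matched at most once (greedy, in time order).
    """
    if not reference or not estimated:
        return 0.0
    hits, j = 0, 0
    for r in reference:
        # Skip estimates too early to match this (or any later) reference beat.
        while j < len(estimated) and estimated[j] < r - tolerance:
            j += 1
        if j < len(estimated) and abs(estimated[j] - r) <= tolerance:
            hits += 1
            j += 1  # each estimate may match at most one reference beat
    precision = hits / len(estimated)
    recall = hits / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

reference = [1.0, 1.5, 2.0, 2.5]
estimated = [1.02, 1.45, 2.2, 2.5]  # the third estimate is 200 ms late
print(beat_f_measure(reference, estimated))
# → 0.75
```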

cpyang123 commented 11 months ago

Because the embedding we currently use is too sparse and does not produce adequate results (i.e., the training loss does not decrease within a reasonable time), we are exploring a note-based embedding with positional encoding based on timestamps, where every note is a token.
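
One way to do this is to evaluate the standard sinusoidal positional encoding from the original Transformer at each note's onset time in seconds instead of at its integer index. The encoding then depends on when a note sounds rather than on how many notes precede it, so all notes of a chord share the same positional encoding. Below is a minimal sketch with two assumptions. The first is a hypothetical token made of (pitch, velocity, duration) features. The second is a `time_scale` that converts seconds into position units. This is one possible design, not a settled one.

```python
import math

def time_positional_encoding(t, d_model=8, time_scale=100.0, base=10000.0):
    """Sinusoidal encoding of a continuous timestamp t (seconds).

    Uses the interleaved sin/cos pattern of the original Transformer encoding,
    but the 'position' is t * time_scale (100 -> 10 ms units) instead of an
    integer token index. d_model must be even.
    """
    pos = t * time_scale
    enc = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (base ** (i / d_model))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc

def note_tokens(notes, d_model=8):
    """notes: list of (onset_s, pitch, velocity, duration_s) tuples.

    Returns one vector per note, sorted by onset: normalised pitch and velocity
    plus duration (a hypothetical feature choice), followed by the time
    encoding. In a trained model the features would typically be projected to
    d_model by a learned layer and the encoding added instead of concatenated.
    """
    tokens = []
    for onset, pitch, velocity, duration in sorted(notes):
        features = [pitch / 127.0, velocity / 127.0, duration]
        tokens.append(features + time_positional_encoding(onset, d_model))
    return tokens

# A C major chord at t = 0 s followed by a single note at t = 0.5 s.
tokens = note_tokens([(0.5, 67, 70, 0.25), (0.0, 60, 90, 0.5),
                      (0.0, 64, 90, 0.5), (0.0, 67, 90, 0.5)])
print(len(tokens), len(tokens[0]))
# → 4 11
```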