Evan-Zhao opened this issue 1 year ago
MP3 to MIDI: now using the piano transcription inference framework from ByteDance. Related paper: High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times (2020).
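For reference, a minimal sketch of the transcription step, assuming the `piano_transcription_inference` pip package that wraps the ByteDance model (file names are placeholders):

```python
# Sketch only: assumes the piano_transcription_inference package
# (pip install piano-transcription-inference); paths are placeholders.
from piano_transcription_inference import PianoTranscription, sample_rate, load_audio

# Load the recording at the sample rate the model expects.
audio, _ = load_audio("recording.mp3", sr=sample_rate, mono=True)

# Run the onset/offset regression model and write the result to a MIDI file.
transcriptor = PianoTranscription(device="cuda")  # or device="cpu"
transcriptor.transcribe(audio, "recording.mid")
```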
MIDI to MusicXML: having tried PM2S (ISMIR 2022), whose accuracy is lacking in beat tracking and note quantization, we're now looking for better frameworks to do this. PM2S also doesn't provide any code for the transformation from its model output to MusicXML, so we'll need to develop that ourselves or find some packages to do it.
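One candidate worth noting is music21, which can at least round-trip MIDI to MusicXML; a minimal sketch (music21 quantizes onsets to a fixed grid on import, so this is a fallback, not a fix for the beat-tracking problem):

```python
# Baseline MIDI -> MusicXML with music21 (pip install music21).
# music21 snaps onsets to a fixed grid on import, so this does not
# by itself solve beat tracking / note quantization.
from music21 import converter

score = converter.parse("recording.mid")          # parse the transcribed MIDI
score.write("musicxml", fp="recording.musicxml")  # serialize as MusicXML
```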
Beat tracking: we are unsatisfied with the accuracy of PM2S beat tracking and have decided to do it ourselves. The goal is to train a transformer model to predict where the beats land in the time series of the input MIDI. We proposed two input embedding schemes for what constitutes a "token" that makes up the sequence (a sketch of both follows the list):

1. Time-slice tokens: the input MIDI is sampled every T (= 0.05) seconds; a token is a 128-dim vector denoting which notes are "on" at that moment.
2. Onset tokens: same as above, but taking `torch.diff()` along the time axis (sustained notes are diffed away, leaving only note onsets and offsets).
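A minimal sketch of both schemes, assuming we read the MIDI with pretty_midi (a hypothetical choice of reader) into a binary piano roll:

```python
import pretty_midi  # hypothetical reader choice; any MIDI parser would do
import torch

T = 0.05  # frame step in seconds

def time_slice_tokens(midi_path: str) -> torch.Tensor:
    """Scheme 1: one token per T-second frame; each token is a 128-dim
    0/1 vector of the notes sounding in that frame."""
    pm = pretty_midi.PrettyMIDI(midi_path)
    roll = pm.get_piano_roll(fs=1.0 / T)           # (128, num_frames) velocities
    return (torch.from_numpy(roll).T > 0).float()  # (num_frames, 128) binary

def onset_tokens(roll: torch.Tensor) -> torch.Tensor:
    """Scheme 2: torch.diff() along time; sustained notes are diffed away,
    leaving +1 at note onsets and -1 at offsets."""
    pad = torch.zeros(1, roll.shape[1])
    return torch.diff(roll, dim=0, prepend=pad)
```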
Update: the embedding currently used is too sparse and does not produce adequate results, i.e., training loss does not decrease in a reasonable time; reasons for this are not yet clear. Upon further inspection, it seems that the majority of work on beat tracking takes audio directly as input -- such as the state of the art, BeatNet. Makes sense, right? Could we directly use such a tool (in parallel to our audio-to-MIDI framework), or use some ideas from those systems to make our own beat tracking better? Meanwhile, we are exploring a note-based embedding with positional encoding based on time stamps, where every note is a token (see the sketch below).
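A minimal sketch of the note-based direction, with a hypothetical token layout: one token per note, built from a learned pitch embedding plus a sinusoidal encoding of the continuous onset time stamp in place of the usual index-based positional encoding:

```python
import math
import torch
import torch.nn as nn

class NoteTokenEmbedding(nn.Module):
    """One token per note: pitch embedding + sinusoidal encoding of the
    onset time in seconds (continuous-time positional encoding)."""

    def __init__(self, d_model: int = 256, time_scale: float = 100.0):
        super().__init__()
        assert d_model % 2 == 0
        self.pitch_emb = nn.Embedding(128, d_model)  # one entry per MIDI pitch
        self.d_model = d_model
        self.time_scale = time_scale  # assumed finest resolution of 0.01 s

    def time_encoding(self, onsets: torch.Tensor) -> torch.Tensor:
        # onsets: (num_notes,) float onset times in seconds
        half = self.d_model // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
        angles = onsets[:, None] * self.time_scale * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, pitches: torch.Tensor, onsets: torch.Tensor) -> torch.Tensor:
        # pitches: (num_notes,) int64 in [0, 128); onsets: (num_notes,) seconds
        return self.pitch_emb(pitches) + self.time_encoding(onsets)
```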