ethman / slakh-generation

A project to synthesize massive amounts of multitrack audio data from MIDI.
http://www.slakh.com/
MIT License

TODO: Document midi_rules #2

Open ethman opened 4 years ago

ethman commented 4 years ago

MIDI rules are important for making some patches work, and require some explaining.

greenbech commented 3 years ago

I see that the minimum bass pitch is MIDI note 35 and the maximum is 79 according to midi_rules/pitch.json. After listening to many of the bass stems, I think this is one octave too high. MIDI note 35 is B1 (fundamental frequency 61.74 Hz, one octave above the lowest note on a 5-string electric bass) and MIDI note 79 is G5 (fundamental frequency 783.99 Hz, one octave above the 24th fret on the highest string of a 4/5-string bass). I used this page as frequency reference.
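The note names and frequencies quoted above can be double-checked with a few lines of Python. This is a minimal sketch, assuming equal temperament with A4 = 440 Hz and the common convention that MIDI note 60 is C4; the helper names `midi_to_hz` and `midi_to_name` are made up for illustration and are not part of slakh-generation.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(note, a4_hz=440.0):
    """Fundamental frequency of a MIDI note (assumes 12-TET, note 69 = A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def midi_to_name(note):
    """Note name with octave number (assumes MIDI note 60 is C4)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


# 35 and 79 are the limits from midi_rules/pitch.json quoted in this thread;
# 23 is the low B of a 5-string bass under the same convention.
for note in (23, 35, 79):
    print(note, midi_to_name(note), round(midi_to_hz(note), 2))
```

Some vendors label MIDI note 60 as C3 instead of C4, so the octave number in the name depends on that assumption while the frequency does not.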

I also know that the SXX.mid bass stems are shifted one octave above the numbers written above, but I guess that is because the Kontakt MIDI engine renders bass tracks one octave below the General MIDI specification?

ethman commented 3 years ago

> I also know that the SXX.mid bass stems are shifted one octave above the numbers written above, but I guess that is because the kontakt MIDI engine renders bass track one octave below the general MIDI specification?

Yeah, this is exactly why the bass notes are shifted by one octave. Sorry for the confusion!

greenbech commented 3 years ago

> > I also know that the SXX.mid bass stems are shifted one octave above the numbers written above, but I guess that is because the kontakt MIDI engine renders bass track one octave below the general MIDI specification?
>
> Yeah, this is exactly why the bass notes are shifted by one octave. Sorry for the confusion!

It would be great if this were more clearly documented! I've been working on an unofficial PyTorch dataset, https://github.com/greenbech/slakh-pytorch-dataset, and I should take that into account by moving all the bass notes one octave down for automatic music transcription.
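A minimal sketch of the octave correction described above, for anyone who needs it before the annotations are updated. It assumes the bass notes have already been read from SXX.mid into simple (pitch, start, end) records, for example with a MIDI library such as pretty_midi; the `Note` type and `shift_octaves` helper are hypothetical and not part of Slakh or slakh-utils.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    pitch: int    # MIDI note number, 0-127
    start: float  # onset time in seconds
    end: float    # offset time in seconds


def shift_octaves(notes: List[Note], octaves: int = -1) -> List[Note]:
    """Return copies of the notes transposed by whole octaves.

    Notes that would leave the valid MIDI range 0-127 are dropped
    (an assumption of this sketch; clamping is another option).
    """
    shifted = []
    for note in notes:
        pitch = note.pitch + 12 * octaves
        if 0 <= pitch <= 127:
            shifted.append(note._replace(pitch=pitch))
    return shifted


# Move an annotated B1 (35) down to the B0 (23) that the thread says is heard.
bass_notes = [Note(35, 0.0, 0.5), Note(47, 0.5, 1.0)]
print(shift_octaves(bass_notes, octaves=-1))
```

Keeping the shift as a parameter makes it easy to turn the correction off if the annotations on Zenodo are later updated to match the audio.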

And to be clear, now it is sort of two octaves too high. If you listen to the bass stems, the lowest note you will find (B1) is one octave above the deepest note on a 5-string bass. This is not really a big problem per se, but if a new version of the dataset should be rendered it would be great to change this.

ethman commented 3 years ago

Hey, sorry for the delay. Your data loader project seems really cool! We're also working on a slakh pytorch data loader in nussl, though work has stagnated. I'm hoping to pick it back up and polish it up for a release this summer. Feel free to either contribute to that or steal ideas from that for your project. Maybe we can merge them as well; but that's a discussion for the future.

So back to the octave issue: it stems from the fact that synth manufacturers are allowed to deviate from the official MIDI spec in many ways, including the fundamental frequencies of the notes. This ended up being quite the headache when trying to wrangle 100+ patches, which led me to making these heuristic rules. As you figured out, one of these headaches is that Kontakt, the source of all of the bass patches, interprets all bass notes one octave lower than the MIDI spec. Additionally, the range of the bass synths is limited, so if you play the MIDI as is into Kontakt, it will output silence! Thus, the solution was to shift the MIDI notes up an octave, so that they could be rendered in the octave we'd expect. But now there's a mismatch between the octave that we hear and the octave that's annotated in SXX.mid.

So then the question becomes, what should SXX.mid be for a bass track? Should it be the exact MIDI file that was used to render the audio, which would include the octave shifts? This would foster better reproducibility if someone wanted to modify existing files or make their own. On the other hand, should the bass MIDI tracks be what you actually hear? This makes a lot of sense too, especially for the use case of automatic transcription. For some reason, when I made Slakh I chose the former (keep the MIDI "as is"), but in retrospect I think this was the wrong choice. In the next few weeks, I'll make an update to the Zenodo data repo with the octave offset corrected (and a fix for the other errors you found, mentioned here: https://github.com/ethman/slakh-utils/issues/17). Since I can't rerender the data (because it will be silent), I will update the MIDI files to reflect what note we're actually hearing. And I'll add a little note to the documentation as well.

Thanks for catching this and the other errors. In all honesty, I completely forgot about the octave issue!

greenbech commented 3 years ago

> Hey, sorry for the delay. Your data loader project seems really cool! We're also working on a slakh pytorch data loader in nussl, though work has stagnated. I'm hoping to pick it back up and polish it up for a release this summer. Feel free to either contribute to that or steal ideas from that for your project. Maybe we can merge them as well; but that's a discussion for the future.

Glad to hear that you like my data loader project! I didn't know about that work in nussl. I'd love to contribute there or keep working on my project, whichever is most convenient for the end users of Slakh!

As for the bass octave issue, there are actually two independent things to look into, and I think I've only made one of them clear. As you say, the Kontakt synth interprets the octave differently from the official MIDI spec, so the notes had to be shifted one octave up. As for how the MIDI files should be stored, I agree that it would be easier for automatic music transcription if the notes were kept at the original MIDI spec! In general, I think it should be easier to use the dataset than to reproduce it, since that is the more common use case.

The other problem is that the chosen thresholds in midi_rules/pitch.json are one octave too high, for both the min and max values. I find it hard to believe that the lowest note the Kontakt synth can render is the B on the A string of an electric bass (see picture). Right now the highest octaves in the bass MIDI files are silent in the audio! I think that if the values were lowered by an octave, or if the order of the octave shift and the truncation were flipped, this last error would go away if the bass stems were re-rendered!
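The two fixes suggested above can be compared on a toy example. This is a sketch under stated assumptions, not the actual slakh-generation code: it assumes the current pipeline truncates to the 35-79 range first and shifts up an octave afterwards, and that out-of-range notes are simply dropped. All function names are hypothetical.

```python
def clip(pitches, lo, hi):
    """Keep only pitches inside [lo, hi] (assumed truncation behavior)."""
    return [p for p in pitches if lo <= p <= hi]


def shift(pitches, semitones):
    """Transpose every pitch by a number of semitones."""
    return [p + semitones for p in pitches]


def clip_then_shift(pitches, lo, hi, semitones=12):
    return shift(clip(pitches, lo, hi), semitones)


def shift_then_clip(pitches, lo, hi, semitones=12):
    return clip(shift(pitches, semitones), lo, hi)


bass_line = [23, 28, 35, 43, 67, 70, 79]  # made-up source pitches

assumed_current = clip_then_shift(bass_line, 35, 79)  # loses 23 and 28
lowered_range = clip_then_shift(bass_line, 23, 67)    # fix 1: thresholds down an octave
flipped_order = shift_then_clip(bass_line, 35, 79)    # fix 2: shift before truncating

print(assumed_current)
print(lowered_range)
print(flipped_order)
```

Under these assumptions the two fixes keep exactly the same notes, because truncating to 23-67 before a 12-semitone shift is the same as truncating to 35-79 after it.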

[Image: deepest-bass-note]

> Thanks for catching this and the other errors. In all honesty, I completely forgot about the octave issue!

No problemo, just glad to help the dataset improve 👌