Open coconutbrand opened 7 months ago
@coconutbrand Thank you for your questions :)
The Monster MIDI Dataset is indeed a superset of the LA MIDI Dataset. The main difference is that Monster is a raw, unfiltered (noisy) dataset, while the LA dataset was filtered to be suitable for training AI models.
I compiled the Monster dataset mostly for MIR purposes, while the LA dataset was compiled specifically for Music AI purposes.
RE AI-generated music in Monster... The Monster MIDI dataset contains all sorts of MIDIs (that's why it is a raw MIDI dataset). It also contains black MIDIs, MIDI art, low-quality MIDIs and melody MIDIs. There may be other stuff too, but this is what I have noticed while working with it.
To the best of my knowledge, only quality AI-generated music was added to Monster, so it should not be a concern since it would be indistinguishable from human music.
To elaborate a bit more, Monster is basically a superset of all datasets (with some exceptions) that are present or listed in the Tegridy MIDI dataset repo. So you can cross-reference MIDIs that way if you want.
Also, please note that all MIDIs in the Monster dataset were read-checked and rewritten into a proper MIDI format, so their MD5 hashes differ from those of the originals. This was done to normalize and standardize the dataset, and also to fix all errors and spam like bad MIDI sigs or erroneous information.
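To illustrate why rewriting changes the hashes, here is a minimal Python sketch using only the standard `hashlib` module. The two byte strings are made-up stand-ins for a MIDI header before and after rewriting (they differ only in the time-division field); they are not taken from the dataset, and `md5_hex` is a hypothetical helper name, not part of any dataset tooling.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical MIDI header chunks: "MThd", length 6, format 1, 1 track.
# Only the last two bytes (time division) differ: 480 vs 960 ticks.
original = b"MThd" + bytes([0, 0, 0, 6, 0, 1, 0, 1, 0x01, 0xE0])
rewritten = b"MThd" + bytes([0, 0, 0, 6, 0, 1, 0, 1, 0x03, 0xC0])

# Any byte-level change yields a different digest, so MD5 hashes of
# rewritten files cannot be matched against hashes of the originals.
hashes_match = md5_hex(original) == md5_hex(rewritten)
print("hashes match:", hashes_match)
```

So matching Monster files to their sources by MD5 will not work; any cross-referencing would have to compare something other than raw file bytes.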
Hope this answers your questions, but if not, feel free to ask :)
Sincerely,
Alex
Thank you for the detailed answer! One last question: did you transcribe any MIDIs yourself, or are these mostly MIDIs available on the internet? Again, thanks for making this!
@coconutbrand Yes, Monster includes transcribed MIDIs (by me and others) but not a lot. So yes, it's mostly publicly available MIDIs off the internet.
Thanks a lot for creating this dataset! I can't find much description of the dataset, but it looks really useful, so I have a few questions. 1. What is the difference between this and the LA MIDI dataset? Is this a superset of the LA MIDI dataset, or are they for different purposes? 2. Does this or LA MIDI contain any AI-generated music? (Actually, I had a similar question about the Tegridy MIDI dataset you created too.)