full data annotations

The eventual goal is to get the entire dataset completely annotated, but it would be nice off the bat to have a smaller slice with complete annotations, both for model development and evaluation.

Two options come to mind:

1. Pull out a subset of the openmic2018 data and crowdsource full annotations. This will be costly, so the set will have to be relatively small (1000 tops, I'm guessing). We'll have to work a bit to make sure the coverage is good.
2. Pull an independent set of clips from the larger FMA pool that openmic2018 came from, using similar ranking and quantile sampling strategies (per instrument), then source complete annotations. This way, we avoid any potential contamination / long-term overfitting on openmic2018, but still get a representative sample of full annotations.
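
For concreteness, here is a rough sketch of what the per-instrument ranking and quantile sampling in the second option might look like. It's only one reading of the idea, not the procedure actually used for openmic2018: the `quantile_sample` function, the score dictionary layout, the clip IDs, and the bin/sample sizes are all made up for illustration.

```python
import random


def quantile_sample(scores, exclude, n_bins=4, per_bin=2, seed=0):
    """Pick clips per instrument, spread evenly across score quantiles.

    scores:  {instrument: {clip_id: score}}  (hypothetical layout)
    exclude: clip ids already in the existing dataset
    """
    rng = random.Random(seed)
    picked = {}
    for instrument, clip_scores in scores.items():
        # Skip clips that are already in the existing set, then rank by score.
        ranked = sorted(
            (c for c in clip_scores if c not in exclude),
            key=lambda c: clip_scores[c],
        )
        chosen = []
        for b in range(n_bins):
            # Slice the ranked list into roughly equal-sized quantile bins.
            lo = b * len(ranked) // n_bins
            hi = (b + 1) * len(ranked) // n_bins
            bin_clips = ranked[lo:hi]
            chosen.extend(rng.sample(bin_clips, min(per_bin, len(bin_clips))))
        picked[instrument] = chosen
    return picked


# Toy usage with random stand-in scores for two instruments.
score_rng = random.Random(1)
scores = {
    inst: {f"clip{i:03d}": score_rng.random() for i in range(40)}
    for inst in ("guitar", "flute")
}
exclude = {"clip000", "clip001"}  # pretend these are already in openmic2018
sample = quantile_sample(scores, exclude)
```

Note that the same clip can get picked for more than one instrument here, so the final set would still need deduplicating before sending it out for annotation.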

(2) is obviously more work, but I think it's doable, and better all around. What do others think?

cosmir / openmic-2018

full data annotations #26