To start working toward the long-term goal of making CoverHunter widely useful for ethnomusicology and culture-specific musicology generally, we need high-quality tagged datasets (like the one I have for Irish trad dance music and am willing to share) that are just big enough to train models for research purposes. Concretely, that goal means learning which hyperparameters or deeper structures in CoverHunter (or probably any other neural-network cover song identification (CSI) implementation) are culturally specific, and how a researcher working in a specific musical culture should adjust the hyperparameters or the model to fit their music.
My starting hypothesis is that all or most of the past decades of CSI research have focused on Western-style pop music, mainly because the big industry funders of this research have a business model that requires CSI to work on Western-style pop music, and that these models, used as-is, will be less successful on traditional folk musics.
By "tagged datasets" I mean collections of real-world audio data that:
- is restricted to one specific musical culture.
- is already cut into one musical identity per recording. In Irish trad, for example, a tune set with 3 tunes played seamlessly into each other must be cut precisely so that each tune ends up in its own file. Intro/outro filler music and speech can be left in.
- is accompanied by metadata about each recording containing, at a minimum:
  - a unique identifier for the "song" or "tune" or "piece", or whatever culturally appropriate concept applies.
    - All of the unique identifiers must have been assigned by human experts in that particular musical culture, who practiced a shared, very consistent method for assigning identifiers that applies universally across that musical culture. No performance may be assigned to more than one identifier.
    - This unique identifier could simply be embedded in the audio filename in some consistent, machine-parseable way.
- has a minimum sampling rate of 16 kHz.
- mostly contains entire performances of the tune/piece/song (not just fragments).
- generally has a minimum of 3 different performances of each piece/song/tune.
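The mechanically checkable requirements above (identifier embedded in the filename, 16 kHz minimum sampling rate, at least 3 performances per piece) can be sketched as a small pre-flight check. This is only an illustration under stated assumptions, not part of CoverHunter: the filename convention `<work_id>__<anything>.wav`, the `SEPARATOR` constant, and every function name here are hypothetical, and the sample-rate check assumes uncompressed WAV files readable by Python's standard `wave` module. The expert-assigned identity and the "one musical identity per recording" requirements cannot be verified by code like this.

```python
import wave
from collections import defaultdict
from pathlib import Path

MIN_SAMPLE_RATE_HZ = 16_000  # minimum sampling rate from the list above
MIN_PERFORMANCES = 3         # minimum performances per piece from the list above
SEPARATOR = "__"             # hypothetical convention: <work_id>__<anything>.wav


def parse_work_id(filename):
    """Return the identifier embedded before SEPARATOR in the filename."""
    stem = Path(filename).stem
    work_id, sep, _rest = stem.partition(SEPARATOR)
    if not sep or not work_id:
        raise ValueError(f"no work identifier found in {filename!r}")
    return work_id


def group_by_work(filenames):
    """Map each work identifier to the list of filenames that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[parse_work_id(name)].append(name)
    return dict(groups)


def underrepresented_works(groups, minimum=MIN_PERFORMANCES):
    """Return identifiers that have fewer than `minimum` performances."""
    return sorted(w for w, files in groups.items() if len(files) < minimum)


def wav_sample_rate(path):
    """Read the sampling rate (Hz) from a WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def low_sample_rate_files(paths, minimum_hz=MIN_SAMPLE_RATE_HZ):
    """Return the paths whose sampling rate is below `minimum_hz`."""
    return [p for p in paths if wav_sample_rate(p) < minimum_hz]
```

Running `group_by_work` over a directory listing and passing the result to `underrepresented_works` gives a list of pieces that still need more recordings. Because each filename carries exactly one identifier under this assumed convention, no performance can end up assigned to more than one identifier.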