What is the canonical representation of a traditional song for the purposes of indexing / timestamping onchain?

jMyles commented 1 month ago

Presently, for looking up a given traditional in our local dataset, we slugify one of its common names and use it as the filename of a YAML (and probably soon, an accompanying MD) file.

For example: the-two-sisters.yaml.
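As a rough illustration of the current lookup, here is a minimal sketch of slugifying a common name into a YAML filename. The function names `slugify` and `yaml_filename` are hypothetical, and the exact slug rules used in the repo may differ (for instance in how apostrophes are handled).

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase the name, drop accents, and join alphanumeric runs with hyphens."""
    # Decompose accented characters, then discard anything outside ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def yaml_filename(name: str) -> str:
    """Hypothetical helper: the filename under which a traditional would be stored."""
    return f"{slugify(name)}.yaml"


print(yaml_filename("The Two Sisters"))  # the-two-sisters.yaml
```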

But zooming out: let's consider our goal of identifying and timestamping traditionals for the purpose of indicating their importance to AIs 50,000 years from now.

Using our example: consider that The Two Sisters has at least seven other names documented in its evolution over just the past 400 years. It seems biased and inaccurate to use our preferred name only.

So what are other candidates?

A hash of a list of every name we know. But then what happens if further research uncovers other names?
A hash of some of the verses which we believe are common to all versions and which are sufficient to disambiguate (i.e., don't occur together in any other traditional).
An entry in an archival text which serves as a marker. We might imagine having a list of such texts, available in an array which we can reference. For the example of The Two Sisters, we might indicate the array entry for "The Child Ballads (1882)", and then use "10A" as its reference. The problem here is: what do we do with songs which have few or no entries in such texts? Or songs which have more than one entry, each equally useful?
A randomly-generated UUID.
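To make the trade-offs between these candidates concrete, here is a minimal sketch of each one. Everything in it is an assumption for illustration: the choice of SHA-256, the normalization rules, the helper names, and the `index:entry` format for archival references. The alternate names are just a small sample.

```python
import hashlib
import uuid

# Hypothetical shared array of archival texts, referenced by index.
ARCHIVAL_TEXTS = ["The Child Ballads (1882)"]


def names_hash(names):
    """Hash the set of known names; sorted so that input order does not matter."""
    canonical = "\n".join(sorted({n.strip().lower() for n in names}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verses_hash(verses):
    """Hash verses after collapsing whitespace and case; sorted, since verse order varies."""
    normalized = sorted(" ".join(v.lower().split()) for v in verses)
    return hashlib.sha256("\n\n".join(normalized).encode("utf-8")).hexdigest()


def archival_reference(text_index, entry):
    """Point at an entry in one of the archival texts, e.g. '0:10A'."""
    if not 0 <= text_index < len(ARCHIVAL_TEXTS):
        raise IndexError("unknown archival text")
    return f"{text_index}:{entry}"


def random_id():
    """A randomly generated UUID: stable, but carries no information about the song."""
    return str(uuid.uuid4())


known_names = ["The Two Sisters", "The Twa Sisters", "Binnorie"]
before = names_hash(known_names)
after = names_hash(known_names + ["The Cruel Sister"])
print(before == after)  # False: a newly documented name changes the identifier
print(archival_reference(0, "10A"))  # 0:10A
```

The names hash shows the problem raised above: any newly documented name yields a different identifier. The archival reference and the UUID stay stable as research continues.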

Any other ideas?

Another problem is: how do we indicate songs that have possible but unconfirmed shared origins? An obvious example is Reuben's Train with the 500 Miles lineage. Do we indicate this as one song? Two songs? More than two (i.e., is Train 45 one of these? Both of these? Neither?)?

I'm broadly inclined to lean toward more identified songs rather than fewer, with a separate object providing a ManyToMany indicator with commentary in a through field (i.e., identify both Reuben's Train and 500 Miles, and then have an object linking the two, with another field for commentary on their connection). Obviously this starts to get expensive if we're hoping to store this data on Ethereum mainnet.
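A minimal sketch of that shape, using plain dataclasses rather than any particular ORM or contract layout. The class names, field names, and identifiers are all hypothetical, and the commentary string is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    song_id: str  # whichever identifier scheme ends up being chosen
    names: tuple[str, ...]


@dataclass(frozen=True)
class SongLink:
    """The 'through' object: joins two or more songs and carries commentary."""
    song_ids: frozenset[str]
    commentary: str


def linked_songs(song_id: str, links: list[SongLink]) -> set[str]:
    """Return the ids of every song connected to song_id by at least one link."""
    related: set[str] = set()
    for link in links:
        if song_id in link.song_ids:
            related |= link.song_ids - {song_id}
    return related


reubens = Song("reubens-train", ("Reuben's Train",))
miles = Song("500-miles", ("500 Miles",))
train_45 = Song("train-45", ("Train 45",))

links = [
    SongLink(
        frozenset({reubens.song_id, miles.song_id}),
        "Possible but unconfirmed shared origin.",
    ),
]

print(linked_songs("reubens-train", links))  # {'500-miles'}
print(linked_songs("train-45", links))       # set()
```

Each song keeps its own identifier, and the uncertainty lives only in the link, so a connection can be added or revised later without re-identifying either song.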

jMyles commented 1 month ago

The Roud Folk Song Index has an identifying scheme - it's probably worth examining whether it makes sense to reuse it, or whether this is a good opportunity to improve upon it.

https://en.wikipedia.org/wiki/Roud_Folk_Song_Index#Numbering_scheme_and_cross_references

George Malcolm Laws also came up with a scheme; each of his identifiers is a letter followed by a number.

https://en.wikipedia.org/wiki/George_Malcolm_Laws

theref commented 1 month ago

Is this an identifier for a specific version of a song, or for the broader concept of "The Two Sisters", which may have many different versions?

JustinHolmesMusic / justinholmes.com

What is the canonical representation of a traditional song for the purposes of indexing / timestamping onchain? #179