MarkGotham / When-in-Rome

meta-corpus of and code library for the functional harmonic analysis of music

Tavern #23

Closed napulen closed 3 years ago

napulen commented 3 years ago

Replacing all uses of - with b already improves the annotations in TAVERN. I'm pushing that for now, along with a correction to K455, which was one of the files with bad alignment/pitch-matching metrics.

napulen commented 3 years ago

I'll be pushing corrected scores that I have worked on throughout the week. I'm still deciding on the best path for those (e.g., provide only the slices.tsv, the MuseScore file, or the mxl; or all of them), given the issues I've experienced with conversion/parsing. Any suggestions are welcome!

MarkGotham commented 3 years ago

Thanks for this @napulen. I'm looking forward to this commit and it's great to be making clear progress on all the sub-corpora of this repo!

In looking at your commits, I notice a couple of other things in here to clean up. I won't confuse matters by pushing now but I'm happy to do them separately afterwards if you prefer.

  1. Beethoven,_Ludwig_van/_/WoO_66/ In automatically adding the phrase end || at the end of each variation, this file has inadvertently ended up with || after a time signature (in both analysis_A and _B, and twice in each case). Oops! Apart from this erroneous case, I think the pattern is consistently applied as:

    m<X> I ... ||
    Form: Variation <Y> ...
    Time Signature: <N/M>
    m<X+1>

    Checking on this situation by attempting to correct all files with sed -i -e 's/\(^Time Signature:.*\)||/\1/' suggests that this is the only affected case.

  2. I/bIII, IV/bIII, V/bIII, I/bIII and similar long passages of consistent tonicisation: I've changed these to full modulations in some places (see Note: Moved to) where I've noticed them, but I've clearly missed some others. It's clear to me that:

    • the modulation version is preferable and should be represented somewhere;
    • we should be consistent throughout.

The question is whether to implement the change in the conversion files as they are or start a new one. The same questions arose in the conversion of the BPS dataset, of course. We can talk this second point through sometime.
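For anyone who prefers to run the check from point 1 in Python rather than sed, here is a minimal sketch of the same substitution. The function name is hypothetical and not part of the repo; it only mirrors the sed expression quoted above, so like sed it leaves any space that preceded the || in place.

```python
import re
from typing import Tuple

# Same pattern as the sed expression: a line that starts with
# "Time Signature:" and contains "||". Everything before the last "||"
# on that line is kept; the "||" itself is dropped.
TIME_SIG_PHRASE_END = re.compile(r"^(Time Signature:.*)\|\|", flags=re.MULTILINE)


def strip_phrase_end_after_time_signature(text: str) -> Tuple[str, int]:
    """Return the cleaned text and the number of lines that were changed.

    Hypothetical helper for illustration only.
    """
    return TIME_SIG_PHRASE_END.subn(r"\1", text)


example = "m12 I ||\nForm: Variation 2\nTime Signature: 3/4 ||\nm13 I\n"
cleaned, changed = strip_phrase_end_after_time_signature(example)
print(changed)
print(cleaned)
```

The returned count makes it easy to list which files are affected without modifying them, which is the same question the sed run was answering.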

napulen commented 3 years ago

Corrections

Oh, I didn't notice the Time Signature: X/X || patterns!

I also looked for this pattern (specifically within time signature changes) and I couldn't find it anywhere else except B066. It's corrected now. Thanks for noticing!

Tonicization/Modulation

I don't think I have a preference regarding how tonicization/modulation should be encoded. I understand both arguments. I leave it to your discretion because you are more familiar with all the datasets. I am of course happy to discuss or go through specific examples if you want external feedback on those!

Something I can say about tonicization/modulation: I'd love to approach them in a quantitative "data-scientific" way. It's been incredibly useful to plot datapoints for measure alignment and Notes implied by the annotation vs. Notes present in the score. I think tonicization/modulation patterns are good candidates for this kind of approach too.

If such a consistent convention exists or prevails across all the annotations (e.g., a tonicization doesn't last for more than 5 annotations/measures/etc.), it should be possible to capture it, maybe? In that case, these long tonicization patterns of TAVERN should appear as outliers, right? That'd be very insightful. Of course, we can discuss this more thoroughly later. Just thinking out loud here.
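To make the outlier idea concrete, here is a rough sketch of how run lengths of consecutive tonicizations could be measured. It assumes the annotations are already available as a flat list of Roman-numeral strings and that a slash only ever marks an applied chord (e.g. V/bIII); the function names and the threshold of 5 are placeholders taken from the example above, not an existing convention or API. It counts annotations rather than measures.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def tonicized_key(figure: str) -> Optional[str]:
    """Return the part after the first '/', e.g. 'V/bIII' -> 'bIII'.

    Returns None for figures without a slash (no tonicization).
    """
    _, sep, target = figure.partition("/")
    return target if sep else None


def tonicization_runs(figures: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive figures that tonicize the same key into (key, length)."""
    runs = []
    for target, group in groupby(figures, key=tonicized_key):
        if target is not None:
            runs.append((target, len(list(group))))
    return runs


def long_runs(figures: List[str], max_length: int = 5) -> List[Tuple[str, int]]:
    """Keep only the runs longer than max_length annotations (assumed threshold)."""
    return [(key, n) for key, n in tonicization_runs(figures) if n > max_length]


figures = ["I", "I/bIII", "IV/bIII", "V/bIII", "I/bIII", "V", "V/V", "V", "I"]
print(tonicization_runs(figures))
print(long_runs(figures, max_length=3))
```

Plotting the distribution of these run lengths per sub-corpus would show whether the long TAVERN passages really do sit apart from the rest.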

Thanks! I think the PR is ready! Everything else is related to the corrected scores. For the time being, I've made those available in my fork of the micchi repo.

MarkGotham commented 3 years ago

Thanks!

The tonicization issue is partly about the length of time but also about the chords used -- full cadences at least strengthen the case for a full modulation (in many accounts, at least, that makes the case outright). Anyway, clearly one or more TAVERN analysts see it differently, so let's have both:

That'll then mirror the BPS dataset. Let's implement this whenever you're confident that the alignment issues are fixed. Is that now?