Closed napulen closed 3 years ago
I'll be pushing corrected scores that I have worked on throughout the week. I'm still deciding on the best path for those (e.g., provide only the slices.tsv, the MuseScore file, or the mxl; or all), given the issues I've experienced with conversion/parsing. Any suggestions are welcome!
Thanks for this @napulen. I'm looking forward to this commit and it's great to be making clear progress on all the sub-corpora of this repo!
In looking at your commits, I notice a couple of other things in here to clean up. I won't confuse matters by pushing now but I'm happy to do them separately afterwards if you prefer.
Beethoven,_Ludwig_van/_/WoO_66/
In automatically adding the phrase end || at the end of each variation, this file has inadvertently ended up with || after a time signature (both analysis_A and _B, and twice in each case). Oops! Apart from this erroneous case, I think the pattern is consistently applied as:
m<X> I ... ||
Form: Variation <Y> ...
Time Signature: <N/M>
m<X+1>
Checking on this situation by attempting to correct all files with sed -i -e 's/\(^Time Signature:.*\)||/\1/' suggests that this is the only affected case.
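For anyone who prefers to check before rewriting files in place, here is a minimal Python sketch of the same substitution. It mirrors the sed expression above; the function names and the sample lines are hypothetical and only illustrate the pattern, not the actual contents of the corpus.

```python
import re

# Same pattern as the sed expression: a line that starts with
# "Time Signature:" and has "||" somewhere after it.
PATTERN = re.compile(r"^(Time Signature:.*)\|\|")


def strip_phrase_end(line: str) -> str:
    """Drop the last '||' from a 'Time Signature:' line; leave other lines alone."""
    return PATTERN.sub(r"\1", line)


def find_affected(lines):
    """Return (line_number, line) pairs that the substitution would change."""
    return [(i, line) for i, line in enumerate(lines, start=1) if PATTERN.search(line)]


# Hypothetical excerpt, made up for illustration.
sample = [
    "m12 I ||",
    "Form: Variation 2",
    "Time Signature: 6/8 ||",
    "m13 I",
]

for number, line in find_affected(sample):
    print(number, repr(line), "->", repr(strip_phrase_end(line)))
```

Running find_affected over every analysis file first gives a list of hits to review, which is a safer way to confirm that only one piece is affected than editing with -i directly.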
I/bIII, IV/bIII, V/bIII, I/bIII and similar long passages of consistent tonicisation
I've changed these to full modulations in some places (see Note: Moved to) where I've noticed them, but I've clearly missed some others. It's clear to me that:
The question is whether to implement the change in the conversion files as they are or start a new one. The same questions arose in the conversion of the BPS dataset, of course. We can talk this second point through sometime.
Oh, I didn't notice the Time Signature: X/X || patterns!
I also looked for this pattern (specifically within time signature changes) and I couldn't find it anywhere else except B066. It's corrected now. Thanks for noticing!
I don't think I have a preference regarding how tonicization/modulation should be encoded. I understand both arguments. I leave it to your discretion because you are more familiar with all the datasets. I am of course happy to discuss or go through specific examples if you want external feedback on those!
Something I can say about tonicization/modulation: I'd love to approach them in a quantitative, "data-scientific" way. It's been incredibly useful to plot datapoints for measure alignment and for notes implied by the annotation vs. notes present in the score. I think tonicization/modulation patterns are good candidates for this kind of approach too.
If such a consistent convention exists or prevails across all the annotations (e.g., a tonicization doesn't last for more than 5 annotations/measures/etc.), it should be possible to capture it, maybe? In that case, these long tonicization patterns of TAVERN should appear as outliers, right? That'd be very insightful. Of course, we can discuss this more thoroughly later. Just thinking out loud here.
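A rough sketch of what that could look like, assuming each annotation is available as a plain Roman-numeral string and that a slash only ever marks an applied key (as in V/bIII). The function names and the threshold are hypothetical; the value 5 is just the example figure mentioned above, not an established convention.

```python
from itertools import groupby


def secondary_key(chord: str):
    """Return the applied key after the first '/' (e.g. 'bIII' for 'V/bIII'), or None."""
    _, sep, key = chord.partition("/")
    return key if sep else None


def tonicization_runs(chords):
    """Return (key, length) for each run of consecutive chords tonicizing the same key."""
    runs = []
    for key, group in groupby(chords, key=secondary_key):
        if key is not None:
            runs.append((key, len(list(group))))
    return runs


def long_runs(chords, max_length=5):
    """Return the runs longer than max_length, i.e. the candidate outliers."""
    return [(key, n) for key, n in tonicization_runs(chords) if n > max_length]


# Hypothetical progression, made up for illustration.
chords = ["I", "I/bIII", "IV/bIII", "V/bIII", "I/bIII", "V", "V/V", "I"]
print(tonicization_runs(chords))
print(long_runs(chords, max_length=3))
```

Collecting the run lengths per corpus and plotting their distribution would show whether the long TAVERN passages really sit apart from the rest. Counting by annotations is only one option; measuring the runs in measures or beats would need the onset information as well.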
Thanks! I think the PR is ready! Everything else is related to the corrected scores. For the time being, I've made those available in my fork of the micchi repo.
Thanks!
The tonicization issue is partly about the length of time but also about the chords used: full cadences at least strengthen the case for a full modulation (in many accounts, at least, that makes the case outright). Anyway, clearly one or more TAVERN analysts see it differently, so let's have both:
analysis.txt) for the modified version with these long passages set in a new key. That'll then mirror the BPS dataset. Let's implement this whenever you're confident that the alignment issues are fixed. Is that now?
Replacing all uses of - with b already improves the annotations in TAVERN. Pushing that for now, along with a correction to K455, which was one of the files with bad alignment/pitch-matching metrics.