DCMLab / standards

Repository containing standards developed at the DCML. https://dcmlab.github.io/standards

Downward compatibility of changes in the annotation standard / guidelines #8

Open fabianmoss opened 5 years ago

fabianmoss commented 5 years ago

This is a major issue and should be discussed thoroughly. As I understand it, the annotation standard has already undergone substantial changes since the version that was published in the report. The only change that I can remember was made during the review process of that report, where we substituted " for [ and ] to indicate pedal notes. What we did then was to convert all symbols to the new convention.

With the changed (and changing) standard and a growing set of corpora, we now have the problem that the corpora differ widely in content and cannot be parsed (i.e. used) automatically, which doesn't make sense.

The ideal solution would be that each change in the annotation standard is versioned as well. Since Johannes and I are planning to submit an ISMIR paper presenting the annotation standard along with its description and its advantages/disadvantages compared to others, it might make sense to have a separate repository that contains only this standard (including all of its current and future versions).

We would then need a working pipeline to update all corpora at once when a new version of the standard is released (which should always entail a new version of the corpora). Users could then choose which version they want to use, and we would ensure that all corpora are always usable under the same standard. Maybe, if we have a working pipeline, we would not need a separate regex repo. In any case, what we want to have in the end, I assume, is a public repo including the regex and all published (i.e. syntactically and semantically checked) corpora, so that users can easily pull it and work with it.
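One way to picture the versioning idea is a registry that maps each released version of the standard to the regular expression defining it, so that a corpus can be checked against the version it claims to follow. The following is only a rough sketch under stated assumptions: the version numbers, the two toy patterns, and the names `STANDARD_VERSIONS`, `invalid_labels` and `corpus_conforms` are hypothetical and far simpler than the actual regex.

```python
import re

# Hypothetical registry: one pattern per released version of the standard.
# The patterns are toy stand-ins for illustration, NOT the actual DCML regex.
STANDARD_VERSIONS = {
    "1.0.0": re.compile(r"[#b]*[VIvi]+"),
    # Toy change: this version additionally allows a bracketed suffix.
    "1.1.0": re.compile(r"[#b]*[VIvi]+(\[.*\])?"),
}


def invalid_labels(labels, version):
    """Return the labels that do not fully match the pattern of `version`."""
    pattern = STANDARD_VERSIONS[version]
    return [label for label in labels if pattern.fullmatch(label) is None]


def corpus_conforms(labels, version):
    """True if every label is valid under the given version of the standard."""
    return not invalid_labels(labels, version)


labels = ["I", "V", "I[V I]"]
print(invalid_labels(labels, "1.0.0"))   # labels rejected by the older toy pattern
print(corpus_conforms(labels, "1.1.0"))  # whether the newer toy pattern accepts all
```

A release pipeline could run such a check for every corpus and only publish those that conform to the version being released.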

I assume that this is not a minor project, but it is in my opinion extremely important and urgent, since the number of corpora (luckily) grows constantly.

One example concerns the discussion about the augmented sixth chords in issue DCMLab/corpora#39. I am not aware of the other concrete changes, but I know that there have been some.

MFNeuwirth commented 5 years ago

@fabianmoss: So far there is only one change (namely regarding the use of "#" depending on whether we are dealing with a minor or a major scale reference). Further, we are currently discussing how to make the use of special symbols (".Ger6" etc.) more consistent. Third, there is the open issue of how to notate a double pedal note. We also clarified the rules for using phrase-ending notation and its interpretation. To sum up, we are far from substantial changes, IMHO. Nonetheless, I agree we should discuss more thoroughly the direction(s) in which the standard should be developed and potentially extended (e.g., the notation of church modes), rather than making a number of ad-hoc fixes. Whether the ISMIR deadline is suited for this is a separate issue.

fabianmoss commented 5 years ago

From a syntactic point of view there is no distinction between substantial and accidental changes. But it is good to know that not many changes have been made. My point still holds: If the rule is now "In major vii and in minor #vii" (with which I completely agree!) then we should also have checked that the data conform to this rule! As it stands, we don't know in which cases the rule was applied and in which it was not.

It can be done easily along these lines:

import re

for row in df.itertuples():  # df: one annotation label per row
    mode = row.local_key
    match = re.match(r'([#b]*)([VIvi]+)', row.chord)
    if match is None:
        continue  # label does not start with a Roman numeral
    acc, root = match.groups()
    if root in ('vii', 'VII'):
        if mode.isupper():  # local major key
            acc = ''
        elif mode.islower():  # local minor key
            acc = '#'  # this does actually not work because of chords like `##vii`
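
Because rewriting the accidental breaks on labels such as `##vii`, a more cautious variant only reports labels that deviate from the rule and leaves the correction to a human. The sketch below assumes that `local_key` is a Roman numeral (lowercase meaning minor, possibly with leading accidentals) and that labels are passed as `(local_key, chord)` pairs; the names `is_minor` and `seventh_degree_mismatches` are hypothetical. Uppercase `VII` is deliberately left out, on the assumption that it can legitimately occur without an accidental in minor.

```python
import re

# Leading accidentals followed by a Roman numeral (same pattern as in the loop).
NUMERAL = re.compile(r"([#b]*)([VIvi]+)")


def is_minor(local_key):
    """Assumption: a lowercase Roman numeral (e.g. 'i', 'bvi') denotes a minor key."""
    return local_key.lstrip("#b").islower()


def seventh_degree_mismatches(labels):
    """Report `vii` labels whose accidental deviates from 'vii in major, #vii in minor'.

    `labels` is an iterable of (local_key, chord) pairs. Returns a list of
    (position, local_key, chord, expected_accidental) tuples for manual review.
    """
    mismatches = []
    for pos, (local_key, chord) in enumerate(labels):
        match = NUMERAL.match(chord)
        if match is None:
            continue  # label does not start with a Roman numeral
        acc, root = match.groups()
        if root != "vii":
            continue  # only the lowercase seventh degree is checked here
        expected = "#" if is_minor(local_key) else ""
        if acc != expected:
            mismatches.append((pos, local_key, chord, expected))
    return mismatches


labels = [
    ("I", "viio6"), ("I", "#viio6"), ("i", "#viio7"),
    ("i", "viio7"), ("bVI", "V7"), ("vi", "##viio"),
]
for hit in seventh_degree_mismatches(labels):
    print(hit)
```

With a DataFrame like the one in the loop, the pairs could be supplied as `zip(df.local_key, df.chord)`. The reported rows are candidates for review, not confirmed errors.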