MarkGotham / When-in-Rome

meta-corpus of and code library for the functional harmonic analysis of music

When in Rome

'When in Rome' brings together the world's functional harmonic analyses in encoded formats into a single, consistent repository. This enables musicians and developers to interact with that great body of work at scale, with minimal overheads.

In total, there are now approximately 2,000 analyses of 1,500 distinct works.

Additionally, 'When in Rome' provides code for working with these corpora, building on the music21 library for music analysis.

Is it for me?

This is best thought of primarily as a corpus of analyses that secondarily provides code for working with them and includes the score where possible. That is, the focus is on the analyses. There is a great deal we can do with those analyses alone. Clearly, certain questions also require analysis-source alignment. We do our best to cater for that by including the score wherever possible, aligned as reliably as possible (as anyone in the field knows, this is a significant challenge).

Maybe yes ...

'When in Rome' data is also used in external research projects and apps including the:

Are you using 'When in Rome' in a public-facing project? Let us know!

Maybe no ...

We're proud of how useful this is. All the same, it might not serve your needs. Might we suggest that if you're looking for:

Corpus Directory Structure

Overall

<genre>/<composer>/<set>/<movement>/<files>

The Key Modulations and Tonicizations corpus is a slight exception: we preserve the organisation of that corpus by author, title, and example number, e.g., Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/. So the <genre> is Textbooks, the <composer> is the author, the <set> is the title, and the <movement> is the example number. We find this more logical than re-organisation by composer.
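As a minimal sketch of how the four-level layout above can be read programmatically, the following splits a corpus path into its levels. The names `CorpusEntry` and `parse_corpus_path` are hypothetical and not part of this repo's code; the only assumption taken from the text is that entries sit four levels below a `Corpus` root.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class CorpusEntry(NamedTuple):
    """The four directory levels described above (hypothetical helper type)."""
    genre: str
    composer: str
    set_name: str
    movement: str


def parse_corpus_path(path: str, root: str = "Corpus") -> CorpusEntry:
    """Split a path into <genre>/<composer>/<set>/<movement>.

    Anything before (and including) the root folder is ignored, and so is
    anything after the fourth level (i.e. the files themselves).
    """
    parts = PurePosixPath(path).parts
    if root in parts:
        parts = parts[parts.index(root) + 1:]
    if len(parts) < 4:
        raise ValueError(f"Expected at least 4 levels below {root!r}: {path!r}")
    return CorpusEntry(*parts[:4])


# The textbook example from the paragraph above.
entry = parse_corpus_path(
    "Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/"
)
print(entry)
```

For the textbook corpus, the `composer` field therefore holds the author and the `movement` field holds the example number, exactly as described above.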

All folders include:

Some folders include:

Optional extra files (not included but easy to generate):

This repo. includes code and clear instructions for creating any or all of the following additional files for the whole meta-corpus, or for a specific sub-corpus.

The example folder contains all of these files for one example score: Clara Schumann's Lieder, Op.12, No.4, 'Liebst du um Schönheit'. Most of the variants derive from the options for pitch class profile generations, creating files in the form: profiles_<and_features_>by_<segmentation_type>.<format>
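To make the naming pattern concrete, here is a small sketch that assembles a file name of that form. The function `profile_file_name` is hypothetical, and the values `"measure"` and `"csv"` are placeholders chosen for illustration only; check the example folder for the segmentation types and formats that actually exist.

```python
def profile_file_name(
    segmentation_type: str,
    file_format: str,
    with_features: bool = False,
) -> str:
    """Build profiles_<and_features_>by_<segmentation_type>.<format>."""
    features = "and_features_" if with_features else ""
    return f"profiles_{features}by_{segmentation_type}.{file_format}"


# Placeholder arguments (assumptions, not a list of supported options).
plain = profile_file_name("measure", "csv")
featured = profile_file_name("measure", "csv", with_features=True)
print(plain)
print(featured)
```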

Apart from these, the example folder also contains the files which are included in all folders by default (see above) as well as others that can likewise be generated across the meta-corpus:

This is clearly too much to include for every entry. Use the example folder to see the options and 'try before you commit' to a corpus-wide generation.

Corpus Overview

This corpus involves the combination of new analyses with conversions of those originating elsewhere.

Corpora originating elsewhere

Converted from other formats:

Analyses originally in the 'RomanText' format (no conversion needed), analysed by Dmitri Tymoczko and colleagues, and forming part of the supplementary material to Tymoczko's forthcoming "TAOM", include:

Mixed sources

Several corpora have full or partial coverage from more than one source. The most complex case is the Beethoven Piano Sonata collection, for which there are 3 external corpora, all of them incomplete:

  1. 64 movements from DCML's 'romantic_piano_corpus'.
  2. 36 movements from Dmitri Tymoczko's TAOM collection.
  3. 32 movements (complete first movements) as converted from the 'BPS-FH' dataset, ISMIR 2018.

There is not yet a single source for this collection. Are you tempted to attempt that? Do get in touch.

New corpora by MG and colleagues

Code and Lists

For developers, please see the individual code files for details of what they do and how.

Run code scripts from the repo's base directory (When-in-Rome) using the format:

>>> python3 -m Code.<name_of_file>

For example, this is the syntax for processing one score (feedback, slices, etc.):

>>> python3 -m Code.updates_and_checks --process_one_score OpenScore-LiederCorpus/Bonis,_Mel/_/Allons_prier!
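If you prefer to drive these scripts from Python, one possible approach is to build the argument list yourself. This is a sketch only: `build_command` is a hypothetical helper, and the score path is the one from the example above, written with underscores and no spaces. Passing arguments as a list avoids shell quoting problems with characters such as `!` in folder names.

```python
import shlex
from typing import List


def build_command(module_name: str, *args: str) -> List[str]:
    """Return the argv list for `python3 -m Code.<module_name> <args...>`."""
    return ["python3", "-m", f"Code.{module_name}", *args]


argv = build_command(
    "updates_and_checks",
    "--process_one_score",
    "OpenScore-LiederCorpus/Bonis,_Mel/_/Allons_prier!",
)

# A shell-safe, copy-pasteable rendering of the same command.
print(shlex.join(argv))

# To actually run it, call from the repo's base directory, for example:
#   import subprocess
#   subprocess.run(argv, cwd="path/to/When-in-Rome", check=True)
# (the cwd value is a placeholder for wherever you cloned the repo).
```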

Briefly, this repo. includes:

Here are a couple of examples of what all that can lead to:

A histogram of augmented chord usage in the lieder corpus ...

[Figure: histogram of augmented chord usage in the lieder corpus]

... and a histogram of fifth progression types across corpora:

[Figure: histogram of fifth progressions across corpora]

Licence, Citation, Contribution

Licence

New content in this repository, including the new analyses, code, and the conversion (specifically) of existing analyses is available under the CC BY-SA licence (a free culture licence) except by arrangement. Please get in touch with requests for special permission.

For analyses that originated elsewhere and have been converted into the format used here, please refer to the original source for licence. Links are provided to those original sources throughout the repository including the itemised list above and within every analysis.txt file.

These external licences vary. As far as we can tell, all the content here is either original to this repo, or properly credited and fair to use in this way. If you think you see an issue, please let us know. Again, if you are simply looking for scores under a maximally permissive licence, then head to the OpenScore collections, which are notable for using CC0.

For research and other public-facing projects making use of this work, please cite or otherwise acknowledge one or more of the papers listed below as appropriate to your project.

Citation

Here's the best way to cite the code and/or corpus:

@article{gotham_when_2023,
    title = {When in {Rome}: a meta-corpus of functional harmony},
    shorttitle = {When in {Rome}},
    journal = {Transactions of the International Society for Music Information Retrieval},
    author = {Gotham, Mark and Micchi, Gianluca and Nápoles-López, Néstor and Sailor, Malcolm},
    year = {2023},
}

Alternatively, depending on the specific context, it may be appropriate to cite one of the papers using this data and functionality:

Syntax and Contributing

As the papers attest, harmonic analysis is fundamentally, necessarily, and intentionally a reductive act that includes a good degree of subjective reading. As such, these analyses are not in any sense 'definitive', to the exclusion of other possibilities. Quite the opposite: part of the point of having a representation format like this is to enable the recording of variant readings. Please feel free to re-analyse these works by using the existing analysis as a template and changing the parts you disagree with.
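To illustrate how a variant reading can sit alongside an existing one, here is a minimal sketch that tokenises a single RomanText-style measure line. It assumes the common conventions that a line starts with `m<number>` (optionally followed by `var<number>` for a variant), that `b<number>` marks a beat, that a token ending in `:` sets the key, and that beat 1 is implied when no beat is given. The function `parse_measure_line` is hypothetical, ignores many features of the format (repeats, notes, measure ranges), and is not the parser used in this repo or in music21.

```python
import re
from typing import List, Optional, Tuple

MEASURE_LINE = re.compile(r"^m(\d+)(?:var(\d+))?\s+(.*)$")
BEAT = re.compile(r"^b(\d+(?:\.\d+)?)$")

Event = Tuple[float, str, str]  # (beat, key, figure)


def parse_measure_line(line: str) -> Tuple[int, Optional[int], List[Event]]:
    """Return (measure number, variant number or None, chord events)."""
    match = MEASURE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"Not a measure line: {line!r}")
    measure = int(match.group(1))
    variant = int(match.group(2)) if match.group(2) else None

    beat, key = 1.0, ""  # beat 1 is implied until a beat marker appears
    events: List[Event] = []
    for token in match.group(3).split():
        beat_match = BEAT.match(token)
        if beat_match:
            beat = float(beat_match.group(1))
        elif token.endswith(":"):
            key = token[:-1]
        else:
            events.append((beat, key, token))
    return measure, variant, events


# An existing reading and a (made-up) alternative reading of the same measure.
original = parse_measure_line("m1 C: I b3 V6")
variant = parse_measure_line("m1var1 C: I b3 viio6")
print(original)
print(variant)
```

The two example lines are invented for illustration and do not come from any analysis in the corpus.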

For more details of the RomanText format used to encode analyses here, see: