'When in Rome' brings together the world's functional harmonic analyses in encoded formats into a single, consistent repository. This enables musicians and developers to interact with that great body of work at scale, with minimal overheads.
In total, there are now approximately 2,000 analyses of 1,500 distinct works.
Additionally, 'When in Rome' provides code for working with these corpora, building on the music21 library for music analysis.
This is best thought of primarily as a corpus of analyses that secondarily provides code for working with them and includes the score where possible. That is, the focus is on the analyses. There is a great deal we can do with those analyses alone. Clearly, certain questions also require analysis-source alignment. We do our best to cater for that by including the score wherever possible, aligned as reliably as possible (as anyone in the field knows, this is a significant challenge).
'When in Rome' data is also used in external research projects and apps including the:
Are you using 'When in Rome' in a public-facing project? Let us know!
We're proud of how useful this is. All the same, it might not serve your needs. Might we suggest that if you're looking for:
<genre>/<composer>/<set>/<movement>/<files>
<genre>: A top level classification of the works by approximate genre or repertoire. As most corpora are prepared in relation to this categorisation, this top level division also reflects something of the corpora's origins. (For the avoidance of doubt, every analysis includes an attribution.)
<composer>: composer's name in the form Last,_First.
<set>: extended work (e.g. a song cycle or piano sonata) where applicable. Stand-alone scores are placed in a set called _ (i.e. a single underscore) for the sake of consistency.
<movement>: name and/or number of the movement. In the case of a piano sonata, folder names are generally number-only: e.g. 1. Most songs include both the name of the song and its position in the set (e.g. 1_Nach_Süden).
<files>: See the following sub-sections.
The Key Modulations and Tonicizations corpus is a slight exception: we preserve the organisation of that corpus by author, title, and example number, e.g., Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/. So the <genre> is Textbooks, the <composer> is the author, the <set> is the title, and the <movement> is the example number.
We find this more logical than re-organisation by composer.
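As a rough illustration of the folder layout described above, the following Python sketch splits a path into its four levels. The names `CorpusEntry`, `parse_corpus_path`, `composer_names`, and `is_stand_alone` are hypothetical helpers written for this example; they are not part of the repository's code.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Tuple


class CorpusEntry(NamedTuple):
    """The four folder levels described above (hypothetical helper type)."""
    genre: str
    composer: str
    set_name: str
    movement: str


def parse_corpus_path(path: str) -> CorpusEntry:
    """Read <genre>/<composer>/<set>/<movement> from the last four folder levels."""
    parts = PurePosixPath(path).parts  # a trailing slash is ignored
    if len(parts) < 4:
        raise ValueError(f"Expected at least four folder levels, got: {path!r}")
    genre, composer, set_name, movement = parts[-4:]
    return CorpusEntry(genre, composer, set_name, movement)


def composer_names(composer: str) -> Tuple[str, str]:
    """Split 'Last,_First' into (last, first); first is '' if there is no ',_'."""
    last, _, first = composer.partition(",_")
    return last, first


def is_stand_alone(entry: CorpusEntry) -> bool:
    """Stand-alone scores sit in a set named with a single underscore."""
    return entry.set_name == "_"


if __name__ == "__main__":
    entry = parse_corpus_path(
        "Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/"
    )
    print(entry)
    print(composer_names(entry.composer), is_stand_alone(entry))
```

Taking the last four levels (rather than the first four) means the same function works whether or not the path starts with the Corpus folder.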
score.mxl or a remote.json file including links to external score files
score.mxl is a copy of the score in the compressed musicXML format.
This is provided for all new scores, as well as all originating elsewherescore.mxl
, there is a remote.json
instead. Please note:.mxl
) rather than remote.
mscore
package
(see Code.updates_and_checks.convert_musescore_score_corpus
).Code.updates_and_checks.remote_scores
and the argument convert_and_write_local
. Read those docs for details and warnings.analysis.txt
analysis_automatic.rntxt
.
remote.json
files
Opus
and/or equivalent).analysis_<analyst>.txt
analysis.txt
throughout, we name the pair analysis.txt
(note not
analysis_A.txt
) and analysis_B.txt
.analysis.txt
name.
This repo includes code and clear instructions for creating any or all of the following additional files for the whole meta-corpus, or for a specific sub-corpus.
The example folder contains all of these files for one example score:
Clara Schumann's Lieder, Op.12, No.4, 'Liebst du um Schönheit'.
Most of the variants derive from the options for pitch class profile generation, creating files in the form: profiles_<and_features_>by_<segmentation_type>.<format>
<and_features_> (optional) includes harmonic feature information. See notes at Code/Pitch_profiles/chord_features.py
<segmentation_type> options group by moments of change to the chord, key, or measure.
<format> options are .arff, .csv, .json, and .tsv.
Apart from these, the example folder also contains the files which are included in all folders by default (see above) as well as others that can likewise be generated across the meta-corpus:
analysis_on_score.mxl: the analysis rendered in musical notation alongside the score (as an additional 'part').
feedback_on_analysis.txt: automatically generated feedback on any analysis complete with an overall rating. Useful for proofreading. See Code/romanUmpire.py for more details on what it can and can't do.
<Keys_or_chords>_and_distributions.tsv: pitch class distributions for each range delimited by a single key or chord. See notes at Code/Pitch_profiles/get_distributions.py
slices.tsv and/or slices_with_analysis.tsv: a tabular representation of the score in 'slices' - vertical cross-sections of the score, with one entry for each change of pitch.
This is useful for various tasks, both human (at-a-glance checks) and automatic (much quicker to load and process than parsing musicXML).
The columns from left to right set out the:
Offset from the start (a time stamp measured in terms of quarter notes),
Measure number,
Beat,
Beat 'Strength' (from relative metrical position),
Length (also measured in quarter notes),
Pitches,
Key,
Chord
template.txt: a proto-analysis text file with only the metadata, time signatures, measures, and measure equality ranges as a template - i.e. all the information you need from the score with space to enter your own analysis from scratch.
This is clearly too much to include for every entry. Use the example folder to see the options and 'try before you commit' to a corpus-wide generation.
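To give a sense of why the slices files are quick to work with, here is a minimal Python sketch that reads tab-separated slice rows and sums the duration spent on each chord. It assumes, without having verified it against the generated files, that the columns appear in the order listed above and that there is no header row. The function names and the sample rows (including the way pitches are written) are hypothetical.

```python
import csv
from collections import defaultdict
from fractions import Fraction

# Column order as listed in the README; assumed to match the file layout.
COLUMNS = ["offset", "measure", "beat", "beat_strength",
           "length", "pitches", "key", "chord"]


def parse_slices(lines):
    """Turn tab-separated slice rows into dictionaries keyed by column name."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        # zip stops at the shorter input, so rows without key/chord still parse.
        row = dict(zip(COLUMNS, fields))
        # Offsets and lengths are in quarter notes; Fraction avoids rounding.
        row["offset"] = Fraction(row["offset"])
        row["length"] = Fraction(row["length"])
        rows.append(row)
    return rows


def total_length_by_chord(rows):
    """Sum slice lengths (in quarter notes) for each chord label present."""
    totals = defaultdict(Fraction)
    for row in rows:
        if "chord" in row:
            totals[row["chord"]] += row["length"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not taken from the corpus.
    sample = [
        "0.0\t1\t1.0\t1.0\t1.0\t['C4', 'E4', 'G4']\tC\tI\n",
        "1.0\t1\t2.0\t0.5\t1.0\t['C4', 'F4', 'A4']\tC\tIV\n",
        "2.0\t1\t3.0\t0.5\t2.0\t['C4', 'E4', 'G4']\tC\tI\n",
    ]
    print(total_length_by_chord(parse_slices(sample)))
```

To use this on a real file, pass an open file object in place of the sample list, after checking the actual column layout in the example folder.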
This corpus involves the combination of new analyses with conversions of those originating elsewhere.
Converted from other formats:
Analyses originally in the 'RomanText' format (no conversion needed), analysed by Dmitri Tymoczko and colleagues, and forming part of the supplementary to Tymoczko's forthcoming "TAOM", include:
Several corpora have full or partial coverage from more than one source. The most complex case is the Beethoven Piano Sonata collection, for which there are 3 external corpora, all of them incomplete:
There is not yet a single source for this collection. Are you tempted to attempt that? Do get in touch!
For developers, please see the individual code files for details of what they do and how.
Run code scripts from the repo's base directory (When-in-Rome) using the format:
>>> python3 -m Code.<name_of_file>
For example, this is the syntax for processing one score (feedback, slices, etc.):
>>> python3 -m Code.updates_and_checks --process_one_score OpenScore-LiederCorpus/Bonis,_Mel/_/Allons_prier!
Briefly, this repo includes:
Here are a couple of examples of what all that can lead to:
A histogram of augmented chord usage in the lieder corpus ...
... and a histogram of fifth progression types across corpora:
New content in this repository, including the new analyses, the code, and (specifically) the conversion of existing analyses, is available under the CC BY-SA licence (a free culture licence) except by arrangement. Please get in touch with requests for special permission.
For analyses that originated elsewhere and have been converted into the format used here,
please refer to the original source for licence.
Links are provided to those original sources throughout the repository including the
itemised list above and within every analysis.txt
file.
These external licences vary.
As far as we can tell, all the content here is either original to this repo,
or properly credited and fair to use in this way.
If you think you see an issue please let us know.
Again, if you are simply looking for scores under a maximally permissive licence, then head to the
OpenScore collections, which are notable for using CC0.
For research and other public-facing projects making use of this work, please cite or otherwise acknowledge one or more of the papers listed below as appropriate to your project.
Here's the best way to cite the code and/or corpus:
@article{gotham_when_2023,
title = {When in {Rome}: a meta-corpus of functional harmony},
shorttitle = {When in {Rome}},
journal = {Transactions of the International Society for Music Information Retrieval},
author = {Gotham, Mark and Micchi, Gianluca and Nápoles-López, Néstor and Sailor, Malcolm},
year = {2023},
}
Alternatively, depending on the specific context, it may be appropriate to cite one of the papers using this data and functionality:
As the papers attest, harmonic analysis is fundamentally, necessarily, and intentionally a reductive act that includes a good degree of subjective reading. As such, these analyses are not in any sense 'definitive', to the exclusion of other possibilities. Quite the opposite: part of the point of having a representation format like this is to enable the recording of variant readings. Please feel free to re-analyse these works by using the existing analysis as a template and changing the parts you disagree with.
var
) option that rntxt provides. E.g. m1 I b2 IV
followed by a new line with m1var1 I b2 ii6
Analyst: [Your name] after [their name]
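To make the variant example above concrete, here is a deliberately simplified Python sketch that reads a main measure line and its `var` counterpart and reports the beats where the two readings differ. It is not the music21 RomanText parser and handles only measure number, optional variant number, beat markers (`b2`), and chord figures; key changes, measure ranges, and other RomanText features are ignored. The function names are hypothetical.

```python
import re

# Matches "m1 ..." and "m1var1 ..."; other line types return None.
MEASURE_LINE = re.compile(r"^m(\d+)(?:var(\d+))?\s+(.*)$")
# A beat marker such as "b2" or "b2.5" (flat chords like "bVII" do not match).
BEAT_TOKEN = re.compile(r"b(\d+(?:\.\d+)?)")


def parse_measure_line(line):
    """Return measure number, variant number (or None), and (beat, figure) pairs."""
    match = MEASURE_LINE.match(line.strip())
    if match is None:
        return None
    number, variant, body = match.groups()
    beat = 1.0  # a figure with no beat marker falls on beat 1
    events = []
    for token in body.split():
        beat_match = BEAT_TOKEN.fullmatch(token)
        if beat_match:
            beat = float(beat_match.group(1))
        else:
            events.append((beat, token))
    return {
        "measure": int(number),
        "variant": int(variant) if variant else None,
        "events": events,
    }


def differing_beats(main, variant):
    """List (beat, main figure, variant figure) wherever the readings disagree."""
    main_by_beat = dict(main["events"])
    return [
        (beat, main_by_beat.get(beat), figure)
        for beat, figure in variant["events"]
        if main_by_beat.get(beat) != figure
    ]


if __name__ == "__main__":
    main = parse_measure_line("m1 I b2 IV")
    alternative = parse_measure_line("m1var1 I b2 ii6")
    print(differing_beats(main, alternative))
```

For real work with these files, the music21 RomanText tools referenced in this README are the appropriate route; the sketch only shows how a main line and a variant line relate.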
For more details of the RomanText format used to encode analyses here, see: