MarkGotham / When-in-Rome

meta-corpus of and code library for the functional harmonic analysis of music

Adding automatic analyses from AugmentedNet v1.5.1 #41

Closed napulen closed 2 years ago

napulen commented 2 years ago

This PR adds 1,296 automatic Roman numeral analyses for the score.mxl files located across the Corpus tree of the repository.

The analyses were generated by AugmentedNet v1.5.1, using the following script:

import os
from pprint import pprint
import shutil

from AugmentedNet.inference import predict
from AugmentedNet.utils import tensorflowGPUHack
from tensorflow import keras

if __name__ == "__main__":
    wirpath = "When-in-Rome/Corpus"
    tensorflowGPUHack()
    model = keras.models.load_model("AugmentedNet.hdf5")
    success = []
    fail = []
    # Walk the corpus tree and analyze every score.mxl found.
    for root, _, files in os.walk(wirpath):
        for f in files:
            if f != "score.mxl":
                continue
            filepath = os.path.join(root, f)
            print(filepath)
            # predict() is expected to leave score_annotated.xml next to the score.
            rn1 = f.replace(".mxl", "_annotated.xml")
            rn1path = os.path.join(root, rn1)
            rn2 = "analysis_automatic.txt"
            rn2path = os.path.join(root, rn2)
            # Skip scores that already have an automatic analysis.
            if os.path.exists(rn2path):
                success.append(rn2path)
                continue
            try:
                predict(model, filepath)
            except Exception as e:
                # Report the reason and carry on with the remaining scores.
                print(f"FAILED! {e!r}")
                fail.append(rn2path)
                continue
            # Move the annotated file to its final name.
            shutil.move(rn1path, rn2path)
            success.append(rn2path)
    pprint(success)
    pprint(fail)
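
A variation on the loop above, sketched under assumptions, records why each file failed so the reasons can be reviewed later. `run_batch` is a hypothetical helper, not part of AugmentedNet or When-in-Rome; it takes any callable, so it can be tried without a model loaded.

```python
def run_batch(paths, process):
    """Apply `process` to each path; collect successes and failure reasons."""
    succeeded = []
    failed = {}  # path -> "ExceptionType: message"
    for path in paths:
        try:
            process(path)
        except Exception as exc:
            # Keep going so that one bad score does not stop the whole batch.
            failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            succeeded.append(path)
    return succeeded, failed
```

With the names from the script, a call might look like `run_batch(score_paths, lambda p: predict(model, p))`, where `score_paths` is an assumed list of score.mxl paths.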

Only a few files could not be processed by the model, for reasons I haven't explored. The unprocessed files are:

MarkGotham commented 2 years ago

Hi @napulen, many thanks for this!

As you know, I think it would be great to have a set of AugmentedNet analyses in this corpus, so I definitely plan to accept and merge a version of this.

One edit we definitely need is for the file extension to match the file type. As it stands, this PR actually provides MusicXML (.musicxml ... 12 million lines of it), but the files have the extension .txt (presumably for RomanText).

Were you planning to provide RomanText, annotated MusicXML, or both?
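
One way to catch this kind of extension mismatch before merging is to look at the first bytes of each file. The sketch below is only an illustration: `sniff_format` and `mislabeled_analyses` are hypothetical helpers, not part of When-in-Rome or AugmentedNet, and treating any file whose content starts with `<` as XML is a simplifying assumption.

```python
import codecs
from pathlib import Path


def sniff_format(path):
    """Guess "xml" or "text" from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    # Drop a UTF-8 byte-order mark, if present, then leading whitespace.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return "xml" if head.lstrip().startswith(b"<") else "text"


def mislabeled_analyses(corpus_root, name="analysis_automatic.txt"):
    """Return the files called `name` whose content looks like XML."""
    return sorted(
        p for p in Path(corpus_root).rglob(name) if sniff_format(p) == "xml"
    )
```

Calling `mislabeled_analyses("When-in-Rome/Corpus")` would list every analysis_automatic.txt that actually holds XML.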

napulen commented 2 years ago

Hi @MarkGotham,

My bad, there was an error along the way. I intended RomanText.

That said, I released v1.6.0 yesterday, with twice the accuracy on chord segmentation, so I'll recompute these and share a new version of the RomanText files.

Thanks for noticing this!

MarkGotham commented 2 years ago

Sounds great. And bravo on segmentation ... I hear it's quite important (Gotham et al. 2021).

napulen commented 2 years ago

New batch coming up soon. Closing this PR.