Schema validation of input MEI files

napulen commented 3 years ago

XML inputs are usually validated against a schema file (e.g., XSD).

MEI is no exception: https://music-encoding.org/resources/schemas.html

If possible, given the current Python library used to parse the XML, a first step to assess an MEI file could be to validate it against the corresponding schema (e.g., Neume or CWMN).

Not sure if this functionality is provided in the xml library.

If not, what are the options?

kemalkongar commented 3 years ago

For Neume MEI Files:

These will be outputs of the OMR process. As far as I know, our assumption that syllable tags enclose the syl and note values holds for all outputs. Because of how the dictionary is built (mapping syllables to their neumes and neume components), that single condition is all that's needed to create a volpiano with at least one note.
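To make that single condition concrete, here is a minimal sketch of how it could be checked before conversion. It is not the project's code: the helper name `syllables_are_usable` and the trimmed sample fragment are hypothetical, and it assumes neume notes are encoded as `nc` elements in the standard MEI namespace.

```python
import xml.etree.ElementTree as ET

# The MEI namespace; the "mei" prefix is only a local alias for the queries below.
MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}


def syllables_are_usable(mei_text: str) -> bool:
    """Return True if there is at least one <syllable> and every <syllable>
    encloses a <syl> and at least one <nc> (assumed note element)."""
    root = ET.fromstring(mei_text)
    syllables = root.findall(".//mei:syllable", MEI_NS)
    if not syllables:
        return False
    for syllable in syllables:
        has_syl = syllable.find(".//mei:syl", MEI_NS) is not None
        has_note = syllable.find(".//mei:nc", MEI_NS) is not None
        if not (has_syl and has_note):
            return False
    return True


# A trimmed fragment for illustration only, not a complete MEI document.
SAMPLE = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <syllable><syl>Al</syl><neume><nc pname="g" oct="3"/></neume></syllable>
</mei>"""

print(syllables_are_usable(SAMPLE))  # True
```

This is a structural check only. It does not replace validation against the MEI schema.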

For CWMN Files:

The Andrew Hughes repository is set up so that notes precede syllables. I'm not sure whether this is the generally accepted way of writing MEI files, but if it is, the algorithm should continue to work.

Idiosyncrasies such as unknown pnames and syllable-independent notes are handled below the volpiano output.

In general:

I think it's best to have a lenient volpiano conversion that will be mostly correct even in the face of unknown inputs, given that it's the user's responsibility to input valid MEI files. Both automated cases, the conversion of the Andrew Hughes repository and the outputs from Rodan, involve known MEI formats that are being tested in pytest.

kemalkongar commented 3 years ago

The most recent version of dev has several methods for the "standardization" of volpiano strings. My aim was to create a stable way of doing database comparison -- even in the case of human error or a lack of MEI information for hyphens. Basically, except for the initial "---" after the clef, all multiple hyphens are reduced to singles. This allows a valid volpiano to be printed without the need for word information. Furthermore, it partially negates the difficulty of CWMN not having neume components for single hyphens.
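The hyphen rule can be sketched as follows. This is a minimal illustration, not the code on dev: the function name `collapse_hyphens` is hypothetical, and it assumes the clef is written as `1` followed by `---`.

```python
import re


def collapse_hyphens(volpiano: str) -> str:
    """Reduce every run of hyphens to one, except the "---" after the clef."""
    prefix = ""
    rest = volpiano
    # Assumption: the clef is the character "1" followed by at least three hyphens.
    if re.match(r"1-{3}", volpiano):
        prefix = "1---"
        # Drop the clef and every hyphen that directly follows it.
        rest = volpiano[1:].lstrip("-")
    # Any remaining run of two or more hyphens becomes a single hyphen.
    return prefix + re.sub(r"-{2,}", "-", rest)


print(collapse_hyphens("1---g--f---e-h"))  # 1---g-f-e-h
```

Two strings that differ only in how many hyphens separate their notes collapse to the same result, which is what makes the comparison stable.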

Basically, explicitly paired neume components, e.g. "gfeh", are kept together, while any sort of separation, whether it be a neume component or a syllable, inserts a single hyphen. I've added a method that compares a volpiano (say, one from Cantus) to an MEI file (CWMN or Neume) and returns a "standardized volpiano".
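As a rough sketch of the grouping rule, assuming the notes have already been collected into groups of explicitly paired components (the function name `groups_to_volpiano` and the list-of-lists input are hypothetical, not the method mentioned above):

```python
from typing import Sequence


def groups_to_volpiano(groups: Sequence[Sequence[str]]) -> str:
    """Join note groups with single hyphens after the clef and its "---"."""
    # Notes inside one group stay adjacent; empty groups are skipped so
    # they cannot produce a double hyphen.
    joined = "-".join("".join(group) for group in groups if group)
    return "1---" + joined


print(groups_to_volpiano([["g", "f", "e", "h"], ["g"], [], ["f", "e"]]))  # 1---gfeh-g-fe
```

Whatever caused a boundary between two groups, it is written as the same single hyphen, so the only information lost is the number of hyphens.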

I know we want to get everything to "work" as a first priority, so I thought this would be a flawed but efficient way of streamlining the Cantus volpiano conversion. It also addresses the point raised in this issue, which is why I wrote it here.

By standardizing volpiano around notes, we ensure that close neume components are printed next to each other and that spaces will exist wherever the components weren't together, for whatever reason. This allows even invalid XML files to be converted into valid volpiano, with the only loss of information during conversion being the number of hyphens. It's a Band-Aid style solution, but I think it will help us immensely with testing.

DDMAL / MEI2Volpiano

Schema validation of input MEI files #34