DDMAL / text_alignment

Aligns correct transcripts to text images using a "messy" OCR and Needleman-Wunsch sequence alignment
MIT License

Syllabification doesn't match Cantus DB #15

Open JoyfulGen opened 3 years ago

JoyfulGen commented 3 years ago

Words containing two consonants in a row ("laCTabat," "feSTinantes," "paSTores") are often split into syllables by the Rodan job differently from how they appear on the Cantus DB. Specifically, Cantus puts the two consonants in different syllables (which I think makes more sense), whereas in Neon both consonants are grouped in the second syllable. For example:

lactabat

Cantus: lac-ta-bat / Neon: la-cta-bat

festinantes

Cantus: fes-ti-nan-tes / Neon: fe-sti-na-ntes

pastores

Cantus: pas-to-res / Neon: pa-sto-res

If the two matched, it would save a fair amount of time.
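
To make the difference concrete, here is a minimal Python sketch of the two conventions described above. It is not the code used by the Rodan job or by Cantus DB; the function name `syllabify`, the `split_clusters` flag, and the vowel set are all hypothetical, chosen only for illustration. It treats every vowel as its own syllable nucleus and ignores diphthongs, digraphs such as "qu" or "ch", and any special handling of clusters of three or more consonants.

```python
# Hypothetical illustration only: not the actual Rodan or Cantus DB logic.
VOWELS = set("aeiouy")  # assumed vowel set for this sketch


def syllabify(word, split_clusters):
    """Split a word into syllables, one per vowel.

    split_clusters=True  -> a run of 2+ consonants between vowels gives its
                            first consonant to the preceding syllable
                            ("Cantus way").
    split_clusters=False -> every consonant between two vowels goes with
                            the following syllable ("Neon way").
    """
    word = word.lower()
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word]

    # Find the index at which each new syllable starts.
    boundaries = []
    for prev, nxt in zip(nuclei, nuclei[1:]):
        consonants_between = nxt - prev - 1
        if split_clusters and consonants_between >= 2:
            boundaries.append(prev + 2)  # keep one consonant on the left
        else:
            boundaries.append(prev + 1)  # break right after the vowel

    # Slice the word at the boundaries.
    starts = [0] + boundaries
    ends = boundaries + [len(word)]
    return [word[s:e] for s, e in zip(starts, ends)]


for w in ("lactabat", "festinantes", "pastores"):
    cantus = "-".join(syllabify(w, split_clusters=True))
    neon = "-".join(syllabify(w, split_clusters=False))
    print(f"{w}: Cantus: {cantus} / Neon: {neon}")
```

Under these assumptions, the only difference between the two outputs is where the break falls around a consonant pair, so a single switch like `split_clusters` would in principle be enough to move from one convention to the other for words like these.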

JoyfulGen commented 3 years ago

The plot thickens... There are places in the manuscript where the syllables are clearly separated the "Neon way," that is, with the two consonants grouped in the second syllable. Consider the following examples, from folios 046r and 038r respectively:

[Image: Syllabification example 1, folio 046r]

[Image: Syllabification example 2, folio 038r]

This syllabification does make more sense from a singer's point of view, but I don't think it corresponds to the modern rules of syllable separation that Cantus DB uses.