DDMAL / CantusDB

A new site for Cantus Database running under Django.
https://cantusdatabase.org
MIT License

500 error after syllabifying? #977

Closed: annamorphism closed this issue 1 year ago

annamorphism commented 1 year ago

From an email today:

"I tried to transcribe hymn Quia defecerunt sicut fumus dies from FIN-Hy F.m.I.85 to Cantus Database. The automatic syllabification didn’t spell all the syllables correctly, so I bravely tried the new “Edit syllabification” option below Volpiano. I made the changes, saved the chant, and got the “Chant updated successfully” -message. However, when I tried to edit the chant again, the software opened a new window displaying just a message “Serve error 500”. I could have just deleted the chant and transcribed it again, but I started to wonder what the reason behind the error might be. Do you have any ideas what might have caused it?"

I haven't yet reproduced the error, but the chant in question does indeed throw an error: https://cantusdatabase.org/chant/715252.

jacobdgm commented 1 year ago

I don't have time to properly investigate this right now, but in case the chant is changed/deleted, here are some current values (taken from the admin area):

full text std: Quia defecerunt sicut fumus dies mei et ossa mea sicut in frixo-[rio confrix]-a sunt
ms full text: Quia defecerunt sicut fumus dies mei et ossa mea sicut in fixo-{rio confrix}-a sunt
syllabized full text: Qui-a de-fe-ce-runt si-cut fu-mus di-es me-i et os-sa me-a si-cut in fix-o- {ri-o con-frix} -a sunt
volpiano: 1---d--da7---cde--d--dff-ghf-fe-gf-fe-ed-efef--ed---d--fdf---gfghgfe7--egfe-fed-ed---f--g---fgfdc-fgfdc-fed-gf-fed--dfd-dc7---d---ef--d---dcdfg--fe-fg-gf---c--df---f---f--ef-ghf777---6------6---d---dc-dff-ghgef-efhgf-fed---4

annamorphism commented 1 year ago

Thanks, that is helpful! I suspect it has to do with how the curly brackets interact with Volpiano. I created a dummy text with brackets, which worked fine as text until I tried to glue it to some notes; then it also gave me a 500 and refused to let me edit the chant. (Though it did eventually give me an interesting triplicate success message, which is fun: [screenshot])

annamorphism commented 1 year ago

Modifying one thing about my toy chant at a time and logging some results: here we get a successful update, but it's not quite right. The gap has moved, which is a little mysterious.

[screenshot]

annamorphism commented 1 year ago

I could add or delete any number of hyphens and spaces in the above without any issue (but with the same weird gap in the chant). Going in to see if it was the syllabification page (and if I could remove the gap there), I noticed that there are extra spaces before my chant.

[screenshot]

I deleted the extra spaces as well as changing the hyphenation and voila, 500. But I don't know if that's the culprit--in bracket-less chants I seem to be able to delete the spaces with no issue (though they always reappear...)

annamorphism commented 1 year ago
[screenshots]

Getting closer--it seems like the inserted hyphens on the left create a little space in front of them, as if they were their own word. I also tried removing the space by the bracket, so we could get bracket-hyphen as in the original attempt, but it kept inserting a space back into the syllabized text when I came back to it. And it only seems to be this weird when there is also a 6-----6 in the volpiano. But still no 500 error...hmmm...

annamorphism commented 1 year ago

So, the edit page is happy with this:

[screenshot]

and also this (more correct)

[screenshot]

But in both cases just going to edit syllabification breaks it--go in, change nothing, hit "save", get 500.

jacobdgm commented 1 year ago

So. Adding some observations here, mostly in an effort to get my own thoughts in order:

We seem to be hitting the 500 when we try to align a pre-syllabified text. When a user initially opens the Edit Syllabification page, what they see in the Syllabized Full Text field doesn't come from the database - it's generated on the fly, based on the MS or STD Full Text in the database, as a helpful feature for users. Once the user presses Save, the syllabified full text is saved into the database, and all future alignments are done with the Syllabified Full Text from the database.
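
As a rough illustration of that flow, here is a minimal Python sketch. The `Chant` dataclass, its field names, the `auto_syllabify` placeholder, and the preference for the MS spelling over the standardized one are all assumptions for illustration. They are not taken from the CantusDB codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chant:
    # Hypothetical stand-in for the chant model; field names are illustrative.
    manuscript_full_text: Optional[str] = None
    manuscript_full_text_std: Optional[str] = None
    manuscript_syllabized_full_text: Optional[str] = None


def auto_syllabify(text: str) -> str:
    # Placeholder for the rule-based syllabifier; this sketch returns the text unchanged.
    return text


def text_for_syllabification_page(chant: Chant) -> str:
    """Return the text shown in the Syllabified Full Text field (sketch)."""
    # Once the user has saved, the stored syllabified text always wins.
    if chant.manuscript_syllabized_full_text:
        return chant.manuscript_syllabized_full_text
    # Otherwise it is generated on the fly; preferring the MS spelling is an assumption.
    source = chant.manuscript_full_text or chant.manuscript_full_text_std or ""
    return auto_syllabify(source)


def save_syllabification(chant: Chant, edited_text: str) -> None:
    # Pressing Save persists the user's text; later alignments read it from here.
    chant.manuscript_syllabized_full_text = edited_text
```

The key point the sketch captures is the one-way switch: after the first save, alignment always runs on user-supplied, pre-syllabified text rather than on text the syllabifier produced.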

(Aside: on this page, and everywhere else, we need to replace "syllabized" with "syllabified" - pretty sure the latter is a recognized English word, while the former was made up by someone developing on CantusDB at some point)

annamorphism commented 1 year ago

> (Aside: on this page, and everywhere else, we need to replace "syllabized" with "syllabified" - pretty sure the latter is a recognized English word, while the former was made up by someone developing on CantusDB at some point)

Both are English words, actually: one comes via Medieval Latin "syllabizo" (using the Greek suffix), and the other is a back-formation from "syllabification". For a long time "syllabize" was the preferred verb form (you also have "syllabate" and "syllabicate"), but I think the -ify suffix is more common now...anyway not to turn this into a linguistics thread, but I think this could be changed or left as-is and be correct either way.

annamorphism commented 1 year ago

A summary of discoveries on this: The chant text "hello- world" is fine when manipulated in the chant-edit page. When you open it in "edit syllabification" (whether it is coming from the standardized or MS spelling is irrelevant) AND there is Volpiano present for it to align to, it causes the error after hitting save; if there is no Volpiano, things are fine until you add Volpiano on the chant-edit page and hit save. Note that "hello -world" doesn't do this--just the right-hanging hyphen. (It's interesting to me that this particular problem happens on pre-syllabized texts with this kind of hyphen, but treats identical texts coming from the ms-spelling field just fine...I'm guessing this is because the one context just ignores non-text characters, while the other is using them to work out the boundaries, or something.)

jacobdgm commented 1 year ago

I applied a hotfix to Production; it's fixed now.

jacobdgm commented 1 year ago

When we were splitting a pre-syllabized text, we separated words by splitting on spaces ("hel-lo world-" becomes ["hel-lo", "world-"]), then separated syllables by splitting on dashes, and finally added the dashes back onto every syllable but the last ("hel-lo" becomes ["hel", "lo"], which becomes ["hel-", "lo"]). But when we do this for "world-", we get ["world", ""] and then ["world-", ""]. Later, we check the last letter of the last syllable. When Python tried to check whether the last letter of an empty syllable was a particular character, it raised an IndexError, which resulted in our 500 status code.
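
The failure is easy to reproduce in isolation. The Python sketch below is a hypothetical reconstruction based only on the description above, not the actual CantusDB code. The function names are invented, and the "last letter of the last syllable" check is modelled simply as reading `syllables[-1][-1]` for each word. "hel-lo world" and "hello -world" split without error. A right-hanging hyphen, however, leaves an empty final syllable and raises `IndexError`. This happens with "hello- world" and also with "fix-o-" from the chant in this issue.

```python
from __future__ import annotations


def split_word_buggy(word: str) -> list[str]:
    """Split one pre-syllabified word on dashes, re-attaching each dash."""
    pieces = word.split("-")  # "world-" -> ["world", ""]
    # Add the dash back to every syllable except the final one.
    return [piece + "-" for piece in pieces[:-1]] + [pieces[-1]]


def split_presyllabified_buggy(text: str) -> tuple[list[list[str]], list[str]]:
    """Split a pre-syllabified text into words, then syllables (buggy sketch)."""
    words = [split_word_buggy(word) for word in text.split()]  # words on whitespace
    # Hypothetical stand-in for the later check on the last letter of each
    # word's last syllable: an empty final syllable makes [-1] raise IndexError.
    final_letters = [syllables[-1][-1] for syllables in words]
    return words, final_letters


if __name__ == "__main__":
    samples = ("hel-lo world", "hello -world", "hello- world", "in fix-o- {ri-o con-frix} -a sunt")
    for text in samples:
        try:
            print(repr(text), "->", split_presyllabified_buggy(text)[0])
        except IndexError as err:  # the right-hanging hyphen cases end up here
            print(repr(text), "-> IndexError:", err)
```

A left-hanging hyphen ("-world") also produces an empty piece, but that piece is never the final syllable. The dash re-attachment turns it into "-", so the check never sees an empty string, which matches the asymmetry observed earlier in the thread.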

My fix involves filtering to get rid of empty syllables before checking for specific characters.
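
A minimal sketch of that kind of fix follows, under the same assumptions as a simplified reconstruction: invented function names and a stand-in for the character check. Empty syllables are filtered out right after splitting, so the later indexing always has at least one character to look at. This is an illustration only; the actual change is in #981.

```python
from __future__ import annotations


def split_word(word: str) -> list[str]:
    """Split one pre-syllabified word on dashes, re-attaching each dash."""
    pieces = word.split("-")
    syllables = [piece + "-" for piece in pieces[:-1]] + [pieces[-1]]
    # The fix: drop empty syllables (e.g. the "" produced by "world-").
    return [syllable for syllable in syllables if syllable]


def split_presyllabified(text: str) -> tuple[list[list[str]], list[str]]:
    """Split a pre-syllabified text into words, then syllables (fixed sketch)."""
    words = [split_word(word) for word in text.split()]  # words on whitespace
    # str.split() never yields empty words, and each non-empty word keeps at
    # least one non-empty syllable, so indexing [-1] twice is now safe.
    final_letters = [syllables[-1][-1] for syllables in words]
    return words, final_letters


if __name__ == "__main__":
    for text in ("hello- world", "hel-lo world-", "in fix-o- {ri-o con-frix} -a sunt"):
        print(repr(text), "->", split_presyllabified(text)[0])
```

Filtering right after the split keeps the rest of the pipeline unchanged. The trade-off is that a trailing dash is silently absorbed into the preceding syllable ("world-" stays one syllable) instead of being reported to the user.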

I'll open a PR with this fix presently. In this place and others, our process is somewhat inelegant, but it's not worth trying to create a more elegant/robust solution at this point because of https://github.com/DDMAL/volpiano-display-utilities.

jacobdgm commented 1 year ago

> ...but I think the -ify suffix is more common now...anyway not to turn this into a linguistics thread, but I think this could be changed or left as-is and be correct either way.

I think we should choose one or the other and not use the alternate version.

jacobdgm commented 1 year ago

> (It's interesting to me that this particular problem happens on pre-syllabized texts with this kind of hyphen, but treats identical texts coming from the ms-spelling field just fine...I'm guessing this is because the one context just ignores non-text characters, while the other is using them to work out the boundaries, or something.)

This is basically accurate. The root of the problem: when we use pre-syllabized texts, we divide words into syllables based on dashes, which sometimes creates empty syllables that we weren't handling properly. When we use non-pre-syllabized texts, we divide words into syllables based on linguistic rules, and this process never created empty syllables.

jacobdgm commented 1 year ago

This is fixed by #981