alpheios-project / alpheios-core

Alpheios Core Javascript Packages and Libraries

treebank: option to show all inflections, highlighting the one from the tree #540

Closed balmas closed 3 years ago

balmas commented 3 years ago

What Alpheios does is disambiguate the lexemes identified by the parser against the lexeme in the treebank.

For the lexemes themselves (i.e. the distinct lemma entries, as far as we can determine them) it identifies which lexeme is the one chosen by the treebank with the little red triangle, but still shows the other lexemes.

However, for the inflections within the disambiguated lexeme, it filters the inflections to show only the one that was chosen by the treebank.

@vgorman1 requests that we have the possibility to show the various possible inflections and mark the one in the tree with a similar triangle.

balmas commented 3 years ago

this is fixed in Alpheios Components 3.3.1-qa.20210106574

For now, the default has been changed to always display all inflections, and to highlight the selected inflection with the same icons we use for the lemma:

[screenshot]

It's not perfect -- when more than one possible inflection is identified by Morpheus, those are displayed below the selected one from the tree, including (again) the one from the tree. Ideally, I would like to dedupe the inflection from the tree out of the list of inflections identified by Morpheus, but that is too big a change for me to make right now.
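A minimal sketch of the display logic described above, including the dedupe step that was deferred. It is written in Python for brevity and is not the Alpheios Components code; the `Inflection` class, its fields, and `build_display_list` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inflection:
    """One analysis of a word form (hypothetical structure)."""
    part_of_speech: str
    case: str = ""
    number: str = ""
    gender: str = ""
    source: str = "parser"  # "parser" or "treebank"

    def key(self):
        # Compare grammatical features only; the source is ignored.
        return (self.part_of_speech, self.case, self.number, self.gender)


def build_display_list(treebank_inflection, parser_inflections):
    """Return (inflection, is_selected) rows for the popup.

    The treebank inflection comes first and is flagged as selected.
    Parser inflections follow, skipping any with identical features.
    """
    rows = [(treebank_inflection, True)]
    for inflection in parser_inflections:
        if inflection.key() == treebank_inflection.key():
            continue  # already shown as the selected row
        rows.append((inflection, False))
    return rows


if __name__ == "__main__":
    tree = Inflection("adjective", "ablative", "plural", "feminine", "treebank")
    parsed = [
        Inflection("adjective", "dative", "plural", "feminine"),
        Inflection("adjective", "ablative", "plural", "feminine"),
    ]
    for inflection, selected in build_display_list(tree, parsed):
        print("*" if selected else " ", inflection.case, inflection.number)
```

Only exact feature matches are removed here, so an analysis that differs in any feature still appears as a separate row.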

Still to be determined is whether we want this changed default behavior to apply to the treebanked texts at texts.alpheios.net -- need to verify that with @abrasax - but it's ready to test otherwise.

You can use treebanked texts at texts-test.alpheios.net to test as well as the treebank-test page at https://alpheios-misc-dev.s3.us-east-2.amazonaws.com/treebank-test-page/test.html

monzug commented 3 years ago

I really like this enhancement!!!

monzug commented 3 years ago

In the case of nullis (first sentence of https://texts-test.alpheios.net/text/urn:cts:latinLit:phi0620.phi001.alpheios-text-lat1/passage/1.1), we are adding the inflection again; see the screenshots. As Bridget said, when more than one possible inflection is identified by Morpheus, the list displayed below the selected one includes (again) the one from the tree.

texts-test version vs. the live one:

[screenshot: nullis-cynthia]

[screenshot: Screen Shot 2021-01-11 at 3 41 38 PM]

monzug commented 3 years ago

In the case of fugiendo (line 9), we are adding too much, I believe.

[screenshot: fugiendo]

this is the line from the treebank xml file: <word id="3" form="fugiendo" lemma="fugio1" postag="v-spgpnb-" head="10" relation="ADV"/>

So, my question is: are we OK to add a pres. pass. inflection to a verb form that is only fut. pass.?
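To make the treebank line above easier to read, here is a small Python sketch that pulls the `postag` attribute out of the `<word>` element and decodes it. It assumes the common 9-position AGDT-style layout (part of speech, person, number, tense, mood, voice, gender, case, degree) and lists only a subset of codes; the lookup tables and `decode_postag` are illustrative, not part of Alpheios.

```python
import xml.etree.ElementTree as ET

# Assumed order of the nine postag positions (AGDT-style).
POSITIONS = ["pos", "person", "number", "tense", "mood",
             "voice", "gender", "case", "degree"]

# Partial code tables, enough for this example; unknown codes pass through.
VALUES = {
    "pos": {"v": "verb", "n": "noun", "a": "adjective"},
    "number": {"s": "singular", "p": "plural"},
    "tense": {"p": "present", "f": "future", "r": "perfect"},
    "mood": {"i": "indicative", "p": "participle",
             "d": "gerund", "g": "gerundive"},
    "voice": {"a": "active", "p": "passive"},
    "gender": {"m": "masculine", "f": "feminine", "n": "neuter"},
    "case": {"n": "nominative", "g": "genitive", "d": "dative",
             "a": "accusative", "b": "ablative", "v": "vocative"},
}


def decode_postag(postag):
    """Map each non-empty postag position to a readable value."""
    if len(postag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters: {postag!r}")
    decoded = {}
    for name, code in zip(POSITIONS, postag):
        if code == "-":
            continue  # position not applicable to this word
        decoded[name] = VALUES.get(name, {}).get(code, code)
    return decoded


WORD = ('<word id="3" form="fugiendo" lemma="fugio1" '
        'postag="v-spgpnb-" head="10" relation="ADV"/>')

if __name__ == "__main__":
    word = ET.fromstring(WORD)
    print(word.get("form"), decode_postag(word.get("postag")))
```

Under these assumed tables, `v-spgpnb-` decodes as a verb: singular, present, gerundive, passive, neuter, ablative. That is the present passive analysis the popup attributes to the tree.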

balmas commented 3 years ago

Well, our requirements right now call for us to consider the treebank data, which is manually annotated, more correct than the parser output. So the code is doing the expected thing here -- it's adding the present passive inflection provided by the treebank to the form, and saying that's the "correct" one.

monzug commented 3 years ago

Absolutely! The code is right, the treebank is wrong. However, would it be possible to add an extra layer that checks whether the form is in the morphology? If not, drop it or add a comment. I do not know, just thinking. I do not even know how frequently we would hit such a scenario.
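One possible reading of that extra layer, sketched in Python: keep the treebank analysis but attach a note when none of the parser analyses matches it. The function name, the tuple representation, and the note text are all hypothetical, and the sketch deliberately does not decide which source is right.

```python
def annotate_treebank_choice(treebank_analysis, parser_analyses):
    """Flag a treebank analysis that the parser did not also produce.

    Analyses are plain tuples of feature values here, purely for
    illustration. The treebank analysis is never dropped.
    """
    confirmed = treebank_analysis in parser_analyses
    return {
        "analysis": treebank_analysis,
        "selected": True,
        "note": None if confirmed else "not among the parser's analyses",
    }


if __name__ == "__main__":
    tree = ("verb", "present", "passive")
    parsed = [("verb", "future", "passive")]
    print(annotate_treebank_choice(tree, parsed))
```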

balmas commented 3 years ago

When you say "form is in morphology" do you mean, check that the form is one that was returned by the morphology parser (in this case Whitaker)?

If so, the problem is that we don't have a way to differentiate which source is right -- i.e. the parser or the treebank. Ultimately this is why we need active annotation support in Alpheios, which is the subject of our next major release. (see lengthy ongoing discussions of the design to support that at https://github.com/alpheios-project/documentation/issues/40 )

vgorman1 commented 3 years ago

And there are definitely times when the morphology yielded is not a possible annotation. My most frequent example is δεῖ, which is always analyzed as 3s imperfect ind act, when it is actually present tense and ἔδει is the imperfect [as per the LSJ]. Clearly it was just entered wrong at some point. There are also real doozies of misspellings in Morpheus now and then.

So, as we all agree, the ideal is to mark the tree annotation but include the other possibilities. You might also be able to use form and frequency data from various users somehow.

Vanessa B. Gorman Professor of Ancient History Department of History 619 Oldfather Hall University of Nebraska-Lincoln Lincoln, NE 68588-0327 https://vgorman1.github.io/

balmas commented 3 years ago

Another interesting point here, which @rgorman helped clarify: the treebank reports this as mood=gerundive. The Whitaker parser used by Alpheios reports all gerunds as verb participles (see alpheios-project/morphsvc#11).

I think perhaps we should make a change to the Alpheios treebank adapter to consider a Latin verb with mood=gerundive as being the same as a verb participle. That way we will at least be comparing apples to apples. We'd still have a disconnect here, because the tense in the treebank is present. But we would at least be matching the part of speech.
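A rough Python sketch of the normalization proposed above, assuming analyses are plain dictionaries and that the parser labels these forms with a part of speech of `"verb participle"`. The function name, the key names, and the `"lat"` language code are assumptions for illustration, not the treebank adapter's actual interface.

```python
def normalize_treebank_analysis(features, language):
    """Return a copy with Latin gerundive verbs recast as verb participles.

    Only the part of speech is changed; tense and the other features
    are left exactly as the treebank supplied them.
    """
    normalized = dict(features)  # never mutate the caller's data
    if (language == "lat"
            and normalized.get("pos") == "verb"
            and normalized.get("mood") == "gerundive"):
        normalized["pos"] = "verb participle"
    return normalized


if __name__ == "__main__":
    tree = {"pos": "verb", "mood": "gerundive",
            "tense": "present", "voice": "passive"}
    parser = {"pos": "verb participle",
              "tense": "future", "voice": "passive"}
    normalized = normalize_treebank_analysis(tree, "lat")
    print(normalized["pos"] == parser["pos"])      # part of speech matches
    print(normalized["tense"] == parser["tense"])  # tense still differs
```

As noted above, this only aligns the part of speech: the treebank's present tense still disagrees with the parser's future.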

monzug commented 3 years ago

Gerundive verbs have been haunting me forever! Issue #608 is very welcome.

balmas commented 3 years ago

With regard to https://github.com/alpheios-project/alpheios-core/issues/540#issuecomment-757999170, there are actually two issues here, one of which I didn't see at first:

(1) The morphology service says the lemma has inflection A and inflection B. The treebank says it is inflection B. We should show inflection B only once in the popup.

Originally I thought that was what was going on with nullis. That has to wait for the changes we're making to support annotations. However, looking more closely at the output, I realize that is actually a different scenario:

(2) The morphology service says inflection A m, f, n. The treebank says inflection A f. We should recognize that inflection A f is not different from inflection A m, f, n.
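Scenario (2) amounts to a subset test rather than an equality test. A minimal Python sketch, assuming each analysis is a dictionary mapping a feature name to the set of values it allows; `is_compatible` and the feature names are hypothetical.

```python
def is_compatible(treebank, parser):
    """True if the parser analysis covers the treebank analysis.

    Every value the treebank gives for a feature must be among the
    values the parser allows for that same feature.
    """
    return all(
        values <= parser.get(name, set())
        for name, values in treebank.items()
    )


if __name__ == "__main__":
    parser = {
        "case": {"ablative"},
        "number": {"plural"},
        "gender": {"masculine", "feminine", "neuter"},
    }
    tree = {
        "case": {"ablative"},
        "number": {"plural"},
        "gender": {"feminine"},
    }
    print(is_compatible(tree, parser))
```

With a check like this, "inflection A f" from the treebank would be treated as the same row as "inflection A m, f, n" from the morphology service instead of being listed twice.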

This scenario is more similar to #608 and #609. Will enter a new issue for it.

monzug commented 3 years ago

verified. All comments have been addressed in separate issues.