alpheios-project / inflection-tables

Inflection Table Library
ISC License

greek noun, adjective, pronoun paradigms #289

Closed balmas closed 4 years ago

balmas commented 5 years ago

We would like to add the remainder of the paradigm tables (nouns, adjectives, pronouns) from http://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html to Alpheios.

@vgorman1 has offered to help with the data entry/data conversion process.

this will require some changes to the inflection table code and we need to think about the best way to get the data.

irina060981 commented 4 years ago

@balmas, I thought about this task and I came to the following:

We need to create an input tool whose layout is as close as possible to the real-life tables (so that the data can easily be checked for correctness). According to the paradigms (https://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html), they are tables that can have rows/columns with labels and rows/columns with data.

So it would be useful

@balmas , what do you think?

irina060981 commented 4 years ago

I think I need more time to look through our code and paradigm tables variants. because for example

image

we have 3 values inside the dual (I believe): 'n.a.v' = nominative + accusative + vocative

I need to remember/understand how it should be placed in the paradigm and rendered afterwards.

irina060981 commented 4 years ago

oh, it seems to me that we don't have similar examples in the verb paradigms

balmas commented 4 years ago

I think I need more time to look through our code and paradigm tables variants. because for example

image

we have 3 values inside the dual (I believe): 'n.a.v' = nominative + accusative + vocative

I need to remember/understand how it should be placed in the paradigm and rendered afterwards.

we don't necessarily need to render it exactly as the original table. I.e. we could expand this out to separate rows for each case.
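
As a rough illustration of expanding a combined row into one row per case, here is a minimal sketch. The abbreviation map and the function name `expand_combined_row` are assumptions made for this example; they are not taken from the inflection-tables code or from the paradigm data files.

```python
# Hypothetical abbreviation map; the real paradigm data may label cases differently.
CASE_ABBREVIATIONS = {
    "n": "nominative",
    "g": "genitive",
    "d": "dative",
    "a": "accusative",
    "v": "vocative",
}


def expand_combined_row(label, forms):
    """Turn one combined row such as 'n.a.v' into one (case, forms) row per case."""
    parts = label.strip(".").split(".")
    cases = [CASE_ABBREVIATIONS[part] for part in parts]
    # Every expanded row gets its own copy of the shared forms.
    return [(case, list(forms)) for case in cases]


# "<dual form>" is a placeholder, not a real Greek form.
for case, forms in expand_combined_row("n.a.v", ["<dual form>"]):
    print(case, forms)
```

The trade-off is that the expanded table no longer mirrors the printed original row for row, but each cell then carries exactly one case, which should make cell matching simpler.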

balmas commented 4 years ago

It would be very interesting if we could add to this design an integration with the lexical query and inflection tables library that allows us to test the matching rules as we are creating or editing a table.

Right now it is very very difficult to figure out why a particular form does or doesn't produce a table and a match in that table. If we could expose the decision points at each step in a kind of debugging interface that would help us to see where we need to improve or change the logic.

In some ways this is a variation on the inflection games that we experimented with last year.

I don't have a complete idea in my head of all the requirements for this but maybe @monzug has some thoughts?

monzug commented 4 years ago

uhm, I remembered working on matching the verbs paradigms. will be happy to take a look. not sure what I need behind the scene yet.

balmas commented 4 years ago

ideally, I'd like to design a general purpose inflection table entry/test tool that worked with all of the types of tables we support (endings, full forms, paradigms)

irina060981 commented 4 years ago

@balmas, I started to work on the prototype and created a simple parser for the existing paradigm tables (based on vue-cli-4). I have published it on my server for now

http://alpheios-infl-check.irina-sklyarova.ru/

And I tried a simple check with it, for example with Future System (without contraction) (verb) image

I took a word from the table

βουλεύσοιτο

and searched for it using the texts.alpheios.net interface, and I got a different table image

I will need more time to remember how the table is chosen for the targetWord by the code. And I have a question: do we need such a check in the tool?

balmas commented 4 years ago

If I understand the question correctly, yes I would like to be able to test the table and cell matching logic.

It is a complex process that needs to be made clearer and easier to debug.

irina060981 commented 4 years ago

ok - I will re-examine how it matches and will add a simple UI for checking the match

monzug commented 4 years ago

Looking at the Future System (without contraction) table, Irina chose a good example of wrong matching. in morpheus, some of the verbs from the table have stem type reg_fut but matching is done on stem type ew_fut. is there a way to see the rules applied for matches into the greek paradigm tables?

irina060981 commented 4 years ago

ok, finally I have found how it chooses the paradigm table for βουλεύσοιτο

at the first checking step it found that two paradigms fit the inflection

by part of speech - verb and stemtype - reg_fut
verbpdgm3 Future System (without contraction) (verb)
verbpdgm4 Future System (Active and Middle) with contraction in -έω (verb)

at the second checking step it chooses only the one paradigm with the highest matchOrder

verbpdgm3 has matchOrder = 1
verbpdgm4 has matchOrder = 3

that's why we finally have verbpdgm4 Future System (Active and Middle) with contraction in -έω (verb)

@monzug , what step could be wrong here then? @balmas, what do you think?

monzug commented 4 years ago

so, the second checking is overriding the first one based on match order. where is the matchOrder coming from? I mean what's the logic of matchOrder

irina060981 commented 4 years ago

matchOrder is a property of each rule for the paradigm tables. We have the following rules.csv file (the part for these paradigms); the second value is the matchOrder

ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect

verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm3,1,verb,ew_fut,,,,,,
verbpdgm4,2,verb,ew_fut,,,,,contr,

balmas commented 4 years ago

I'm not sure if this has anything to do with why we are getting the wrong table here, but in looking at this this morning, I realized that although we have the morphflags rule (the contr in the above example), I'm not sure if we ever actually use it. I think we might need to work backwards from the rules and the examples in the paradigm tables to see if we get the right results. I suspect we are not using it properly right now, because I think these would be in the 'morph' feature of the inflection, but the rule has it not as a feature but as a separate field. That is, when we parse the rules data set, voice, mood, etc. get put into the features array for the rule but the morphflags get put into a separate field, and when we compare the features of the inflection we never compare the 'morph' feature that is in the inflection against the morphFlags property of the rule. Unfortunately I don't know right now

'ἀποθανοίμην' is an example of a greek word that has the value 'contr' in its morph field in the morphology service output that could be used to test that.

In the example above, though, βουλεύσοιτο, the morphology service does not tell us anything in the morph field but it does have euw in the derivtype field. We don't currently use the derivtype feature in the paradigm matching rules but perhaps we should be.

( Now if all that doesn't give you a headache on a monday I don't know what will :-) )
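
The suspected gap can be sketched as follows. This is an illustration only, not the Alpheios implementation: the `Rule` class, the `matches` function and the `check_morph_flags` switch are hypothetical names, and the inflection is a plain dictionary standing in for the real inflection object.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    paradigm_id: str
    match_order: int
    features: dict  # e.g. part of speech, stem type
    morph_flags: set = field(default_factory=set)  # stored apart from the features


def matches(rule, inflection, check_morph_flags):
    """Return True if every filled field of the rule agrees with the inflection."""
    for name, value in rule.features.items():
        if inflection.get(name) != value:
            return False
    if check_morph_flags:
        # Every flag the rule requires must appear in the inflection's morph feature.
        inflection_flags = set(inflection.get("morph", "").split())
        if not rule.morph_flags <= inflection_flags:
            return False
    return True


RULES = [
    Rule("verbpdgm3", 1, {"part of speech": "verb", "stemtype": "reg_fut"}),
    Rule("verbpdgm4", 3, {"part of speech": "verb", "stemtype": "reg_fut"}, {"contr"}),
]

# βουλεύσοιτο as described above: reg_fut, nothing in the morph field.
inflection = {"part of speech": "verb", "stemtype": "reg_fut"}

ignoring = [r.paradigm_id for r in RULES if matches(r, inflection, False)]
checking = [r.paradigm_id for r in RULES if matches(r, inflection, True)]
print(ignoring)  # ['verbpdgm3', 'verbpdgm4']
print(checking)  # ['verbpdgm3']
```

Under these assumptions, comparing the morph flags is enough to drop the contracted table for a form that carries no contr flag, without touching matchOrder.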

monzug commented 4 years ago

I don't think we are using the contr either.

monzug commented 4 years ago

I think stem type is the element we should look at mainly. see βουλεύσοιτο vs ἀποθανοῖτο

1) βουλεύσοιτο: stem type reg_fut, derivtype euw, does not have morph field

2) ἀποθανοῖτο: stem type ew_fut, morph contr, does not have derivtype field

monzug commented 4 years ago

look at one match for ἐλῴη
we show the Future System (Active and Middle) with contraction in -έω stemtype ew_fut derivtype allw

instead of the Future System (Active) with contraction in -άω stem type aw_fut derivtype a_stem morph contr

irina060981 commented 4 years ago

There are 3 matches for the following word: ἐλῴη

  1. First inflection

Future System (without contraction) (verbpdgm3) part of speech - verb stemtype - aw_fut matchOrder - 1

Future System (Active and Middle) with contraction in -έω (verbpdgm4) part of speech - verb stemtype - aw_fut matchOrder - 2

  2. Second inflection

Present System Active of Contract Verbs in -άω (verbpdgm22) part of speech - verb stemtype - aw_pr matchOrder - 1

The biggest matchOrder is for the Future System (Active and Middle) with contraction in -έω (verbpdgm4)

monzug commented 4 years ago

uhm, so it's the matchOrder that may be the culprit!

balmas commented 4 years ago

@monzug just want to be sure I understand. You are saying we should show verbpdgm3 and verbpdgm4 for ἐλῴη and not verbpdgm4 ?

If so, I think if we added use of the morph field we would get the right result. I think that might be the better solution than changing the matchOrder, because I think we may need that as a higher order for other verbs.

I think generally, the use of the matchOrder is flawed.

monzug commented 4 years ago

I am looking at it right now. βουλεύσω is stem type reg_fut (does not have the morph field) but we show the stem type ew_fut with contr which has matchOrder 2 (verbpdgm4 and not verbpdgm3 - I don't understand the verbpdgm numbers, though)

I did only look at one match for ἐλῴη (the future) and we show the wrong table. we need to show the Future System (Active) with contraction in -άω (verbpdgm5 and not the verbpdgm4 ) stem type aw_fut derivtype a_stem morph contr

maybe the wrong table is pulled out because of the dialect? in verbpdgm5, we have 2 dialects: doric or aeolic, but this verb ἐλῴη has dial Attic epic Ionic

monzug commented 4 years ago

I am trying to understand why we have all these rules and then matches don't work

verbpdgm3 1 verb reg_fut
verbpdgm4 3 verb reg_fut contr
verbpdgm3 1 verb aw_fut
verbpdgm4 2 verb aw_fut contr
verbpdgm3 1 verb ew_fut
verbpdgm4 2 verb ew_fut contr

I would simplify by deleting some entries such as

verbpdgm4 3 verb reg_fut contr
verbpdgm3 1 verb aw_fut
verbpdgm4 2 verb aw_fut contr
verbpdgm3 1 verb ew_fut

unless I got it all wrong...

I also would delete the dialect from rules.csv - we have it only in verbpdgm5

balmas commented 4 years ago

We aren't taking into account the "contr" morphflag , which might be part of the problem. It's also possible things were working better at one point and we broke them.

@monzug maybe we should step back and come up with a list of verbs and the tables they should (and shouldn't) match?
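
Such a list could be kept as data and checked automatically. The sketch below is only an assumption about how that might look: the two entries reflect expectations stated in the comments above, while `check_expectations` and the `lookup` callback are hypothetical and stand in for whatever returns the matched paradigm IDs for a form.

```python
# Expectations taken from the comments above; the structure itself is hypothetical.
EXPECTATIONS = [
    {"form": "βουλεύσοιτο", "should_match": {"verbpdgm3"}, "should_not_match": {"verbpdgm4"}},
    {"form": "ἐλῴη", "should_match": {"verbpdgm5"}, "should_not_match": {"verbpdgm4"}},
]


def check_expectations(expectations, lookup):
    """Compare the tables returned by lookup(form) with the expected ones.

    Returns a list of failure messages; the list is empty when everything passes.
    """
    failures = []
    for item in expectations:
        found = set(lookup(item["form"]))
        missing = item["should_match"] - found
        unwanted = item["should_not_match"] & found
        if missing:
            failures.append(f"{item['form']}: missing {sorted(missing)}")
        if unwanted:
            failures.append(f"{item['form']}: unexpected {sorted(unwanted)}")
    return failures


def reported_lookup(form):
    """Stand-in for the behaviour reported in this thread: both forms land in verbpdgm4."""
    return ["verbpdgm4"]


for message in check_expectations(EXPECTATIONS, reported_lookup):
    print(message)
```

Keeping "should not match" alongside "should match" makes it possible to catch a wrong table even when the right one is also returned.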

monzug commented 4 years ago

agree.

monzug commented 4 years ago

Irina, could you please help me to understand the logic of matchOrder? I tried a few matches that I thought would fail and they worked like a charm instead. I need to understand how the matchOrder numbers have been assigned and when they are applied in the code. Thanks

irina060981 commented 4 years ago

Monica, the logic is quite simple. It is assigned manually to each table. It is part of rules.csv, and the second value (the number) after the paradigm's identifier is the matchOrder

ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect

verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm3,1,verb,ew_fut,,,,,,
verbpdgm4,2,verb,ew_fut,,,,,contr,

So it parses the following way (first line)

paradigmID = verbpdgm3
matchOrder = 1
part of speech = verb
stem type = reg_fut

other parameters (voice, mood, tense, lemma, morph flags, dialect) are not filled
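
The parsing described above can be sketched with the standard `csv` module. The header and the two data rows are the ones quoted in this thread; the function name `parse_rules` and the choice to drop unfilled columns are assumptions made for illustration, not a description of how the library reads the file.

```python
import csv
import io

RULES_CSV = """ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
"""


def parse_rules(text):
    """Parse rules.csv text into dictionaries, dropping the unfilled columns."""
    rules = []
    for row in csv.DictReader(io.StringIO(text)):
        rule = {name: value for name, value in row.items() if value}
        rule["Match order"] = int(rule["Match order"])
        rules.append(rule)
    return rules


for rule in parse_rules(RULES_CSV):
    print(rule)
```

For the first line this yields only the ID, the match order, the part of speech and the stem type, which matches the breakdown given above.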

monzug commented 4 years ago

yes, I got that. but, I didn't get how the numbers 1,2,3,4,5 are assigned to each stem type/table and how the matchOrder is then applied.

irina060981 commented 4 years ago

yes, I got that. but, I didn't get how the numbers 1,2,3,4,5 are assigned to each stem type/table and how the matchOrder is then applied.

I am not sure if I understood your question correctly, Monica. I will try to explain the workflow in detail - hope the answer will be there too :)

On the input we have inflections. For each inflection we go through all the rules from rules.csv. Each rule already has a matchOrder value.

For example ἐλῴη

We compare all features from the rule with the features from the inflection and find which rules are suitable. For the word (1st inflection) we found verbpdgm3 and verbpdgm4:
Future System (without contraction) (verbpdgm3)
Future System (Active and Middle) with contraction in -έω (verbpdgm4)

both have part of speech - verb, stemtype - aw_fut, and the inflection has (from the morph adapter) part of speech - verb, stemtype - aw_fut

At the next step we compare and find the greatest matchOrder (from rules.csv); here is the source for them

ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,

as Bridget mentioned, we don't use the morph flags in the analysis - verbpdgm3 has no flag, verbpdgm4 has contr

so we choose verbpdgm4 (matchOrder = 2) finally we have for the first inflection Future System (Active and Middle) with contraction in -έω (verbpdgm4)
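
The two steps walked through above can be sketched as follows. This is a simplified model under stated assumptions: rules are reduced to tuples, only part of speech and stem type are compared (the morph flags are ignored, as described), and the function names are hypothetical rather than taken from the library.

```python
RULES = [
    # (paradigm ID, matchOrder, part of speech, stem type), from the rules.csv rows above
    ("verbpdgm3", 1, "verb", "aw_fut"),
    ("verbpdgm4", 2, "verb", "aw_fut"),
]


def candidate_rules(inflection, rules):
    """Step 1: keep the rules whose part of speech and stem type equal the inflection's."""
    return [
        rule for rule in rules
        if rule[2] == inflection["part of speech"] and rule[3] == inflection["stemtype"]
    ]


def choose_paradigm(inflection, rules):
    """Step 2: among the candidates, take the rule with the greatest matchOrder."""
    candidates = candidate_rules(inflection, rules)
    if not candidates:
        return None
    return max(candidates, key=lambda rule: rule[1])[0]


# First inflection of ἐλῴη as reported by the morph adapter.
inflection = {"part of speech": "verb", "stemtype": "aw_fut"}
print(choose_paradigm(inflection, RULES))  # verbpdgm4
```

Because the contr flag plays no part in step 1, both rules always survive it, and the matchOrder alone decides the outcome.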

monzug commented 4 years ago

Irina, we should have a conversation on the side so I can walk through what I don't understand (the matchOrder value in rules.csv). for ἐλῴη we pull out the wrong table. it should be verbpdgm5. see attachment

monzug commented 4 years ago

Screen Shot 2019-11-27 at 4 13 54 PM

irina060981 commented 4 years ago

@monzug and @balmas , I have updated my tool http://alpheios-infl-check.irina-sklyarova.ru/

Now it could take a word and return ParadigmTables (orange - chosen by matchOrder)

image

monzug commented 4 years ago

Irina, tool looks good but the results for ἐλῴη are wrong for Inflection 1. let me explain it, step by step:

1) I use the morpheus tool at http://morph.alpheios.net/api/v1/analysis/word?word=%E1%BC%90%CE%BB%E1%BF%B4%CE%B7&lang=grc&engine=morpheusgrc it returns the following:

ἐλαύνω verb

ἐλ ῴη verb optative singular 3rd future active Attic aw_fut a_stem contr
ἐλ ῴη verb optative singular 3rd present active epic aw_pr a_stem contr poetic rare

2) check the table at https://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html and it's verbpdgm5; see the screenshot that I added in my previous comment.

3) enter ἐλῴη in lookup and it returns the following

Screen Shot 2019-11-29 at 1 26 43 PM

all good here. But, when I click on inflection table, it shows the Future System (Active and Middle) with contraction in -έω which is the wrong table. it should be verbpdgm5, Future System (Active) with contraction in -άω

as I commented before, I believe rules.csv should be updated (see my comment above). still, the problem is why the verbpdgm3 and verbpdgm4 tables are considered for matching and not verbpdgm5. could it be because of the dialect? verbpdgm5 is the only one that has a dialect in rules.csv

verbpdgm5 3 verb aw_fut contr doric
verbpdgm5 3 verb aw_fut contr aeolic

does it make sense now?
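
For repeating step 1 with other words, the analysis URL can be built rather than pasted. The endpoint and the `word`, `lang` and `engine` parameters are the ones visible in the URL above; the helper name `analysis_url` is an assumption, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://morph.alpheios.net/api/v1/analysis/word"


def analysis_url(word, lang="grc", engine="morpheusgrc"):
    """Build the analysis URL; urlencode percent-encodes the UTF-8 bytes of the word."""
    return BASE_URL + "?" + urlencode({"word": word, "lang": lang, "engine": engine})


# The word from step 1, written with escapes so the precomposed characters are unambiguous.
word = "\u1f10\u03bb\u1ff4\u03b7"
print(analysis_url(word))
```

Writing the word with precomposed code points matters here: a decomposed spelling of the same letters would produce a different percent-encoded string.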

irina060981 commented 4 years ago

Oh, I understood the question. There is only one difference here

verbpdgm5,3,verb,aw_fut,,,,,contr,doric
verbpdgm5,3,verb,aw_fut,,,,,contr,aeolic

verbpdgm5 (Future System (Active) with contraction in -άω) has an obligatory match - dialect - doric or aeolic. But it doesn't see this dialect in the received homonym for ἐλῴη.

I will check why

irina060981 commented 4 years ago

Ok, I found the problem - it is not in the rule, it is in the way we compare the dialect feature. For example, for the word ἐλᾶ

verbpdgm5 has dialect value - doric or aeolic; the inflections have - 'Attic Doric Aeolic', 'epic Doric Aeolic'. So when the compare workflow checks if they are equal - they are not equal.
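
The difference between the two kinds of comparison can be sketched like this. It assumes the inflection's dialects are available as a list of names and that case should be ignored; both points, and the function names, are assumptions for illustration rather than a description of the actual comparison code.

```python
def normalise(values):
    """Lower-case dialect names so 'Doric' and 'doric' compare equal."""
    return {value.lower() for value in values}


def dialect_equal(rule_dialect, inflection_dialects):
    """Strict comparison: fails whenever the inflection lists more than one dialect."""
    return normalise([rule_dialect]) == normalise(inflection_dialects)


def dialect_overlaps(rule_dialect, inflection_dialects):
    """Multi-valued comparison: the rule's dialect only has to be one of those listed."""
    return rule_dialect.lower() in normalise(inflection_dialects)


# One of the value sets reported above for ἐλᾶ, treated as a list.
ela = ["Attic", "Doric", "Aeolic"]
print(dialect_equal("doric", ela))           # False
print(dialect_overlaps("doric", ela))        # True
print(dialect_overlaps("doric", ["Attic"]))  # False
```

With the membership test, a rule that requires doric matches an inflection listing several dialects, while an inflection that lists only other dialects still produces no match.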

irina060981 commented 4 years ago

I have updated my code for the dialect comparison and it works well for ἐλᾶ image

But ἐλῴη doesn't have a match for verbpdgm5, because it has the following dialect comparison:

ἐλῴη homonym inflections have the following dialect values - Attic and epic but verbpdgm5 has dialect value - doric or aeolic

So there are no matches here.

@monzug , what do you think, where is the error here - in rules or in morph results?

@balmas, should I create a PR for the dialect comparison fix (as described in my previous comment)?

monzug commented 4 years ago

Irina, glad we are on the same page now. definitely dialect is the issue for ἐλῴη. next we have to look at rules.csv and clean up the mess with verbpdgm3 and verbpdgm4. I need more time for this. as for your question about the dialects: there are so many different flavors of dialects in the Greek language that I would just ignore them. as said previously, I would remove the dialects from rules.csv

balmas commented 4 years ago

on the topic of the dialects: (1) if we are going to include dialects in the rules, then I think we do need to fix the check so that it handles the multi-valued feature properly (2) but if we decide to remove dialect from the rules, then it is a moot point

I am sure we had a reason at the time we made the rules to include the doric/aeolic dialect as a criterion, although it could well have been a hack and not necessarily valid reasoning.

We have the following rules for the aw_fut stemtype:

verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm5,3,verb,aw_fut,,,,,contr,doric
verbpdgm5,3,verb,aw_fut,,,,,contr,aeolic

I interpret these to mean that we had found certain words with stemtype aw_fut and a morph flag of contr that we wanted to match into table 5 (Future System (Active) with contraction in -άω) rather than table 4 (Future System (Active and Middle) with contraction in -έω) and that the only distinguishing factor we could use from the morphology service output was the dialect.

I think before we throw out the dialect from the rules, we should probably fix the code to correctly use the morph flags, because I suspect that is at least one source of problems right now.

monzug commented 4 years ago

we have dialects all over the place in Greek verbs, but verbpdgm5 is the only one that has the dialect in rules.csv and it does not work. So, I would first try removing the dialect and see if it works. I am suspicious as it's the only place where we have it. for verbpdgm3 and 4, I need some time to review it; I strongly believe there should NOT be

verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,

I would do it like this

verbpdgm3 1 verb reg_fut
verbpdgm4 2 verb ew_fut
verbpdgm5 3 verb aw_fut

flag contr is not used anyway.
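
Read as data, this proposal amounts to one paradigm per stem type. The sketch below only illustrates that reading; the dictionary and the function `paradigm_for` are hypothetical, and it says nothing about whether other stem types or features would need further rules.

```python
# One rule per stem type, as proposed above; the dictionary form is only an illustration.
PARADIGM_BY_STEMTYPE = {
    "reg_fut": "verbpdgm3",
    "ew_fut": "verbpdgm4",
    "aw_fut": "verbpdgm5",
}


def paradigm_for(stemtype):
    """Return the single paradigm for a stem type, or None if no rule covers it."""
    return PARADIGM_BY_STEMTYPE.get(stemtype)


print(paradigm_for("aw_fut"))  # verbpdgm5
```

With at most one candidate per stem type, the matchOrder would no longer have anything to decide between for these three tables.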

monzug commented 4 years ago

another error with verbpdgm3 and verbpdgm4. a lookup for βουλεύσω involves 3 tables:

Weak (1st) Aorist System Active
Weak (1st) Aorist System Middle
Future System (Active and Middle) with contraction in -έω

the last one should be voice active and reg_fut stemtype, so it should open the Future System (without contraction)

balmas commented 4 years ago

I think we should probably separate this issue from the ever-growing list of problems with the current rules. Actually we probably need at least 4 issues:

1) create a tool for creating/editing and validating paradigm data files
2) add the greek noun, adjective and pronoun paradigms (using the new tool)
3) add support for matching paradigm tables for greek nouns, adjectives and pronouns (because we already have ending tables for those forms, we may have some additional work to do here to be able to display both, or prefer paradigm tables when they exist. this might also ideally result in a plugin module for paradigm matching which can be used with both the editing tool and the runtime code)
4) one or more issues for all of the current verb paradigm table misses/incorrect matches.

This issue is the 2nd, I have created issues for the 1st and 3rd. @monzug can you create one or more issues for the misses/incorrect matches?

balmas commented 4 years ago

we have dialects all over the place in Greek verbs, but verbpdgm5 is the only one that has the dialect in rules.csv and it does not work. So, I would first try removing the dialect and see if it works. I am suspicious as it's the only place where we have it. for verbpdgm3 and 4, I need some time to review it; I strongly believe there should NOT be

verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,

I would do it like this

verbpdgm3 1 verb reg_fut
verbpdgm4 2 verb ew_fut
verbpdgm5 3 verb aw_fut

flag contr is not used anyway.

ok, @irina060981 let's hold off on the PR for the dialect change then until we have a full list of the issues.

irina060981 commented 4 years ago

may be we should do it too (will point it here) https://github.com/alpheios-project/components/issues/271

balmas commented 4 years ago

may be we should do it too (will point it here) alpheios-project/components#271

yes agree

irina060981 commented 4 years ago

Step 1 - The first bunch of tests are created - done

irina060981 commented 4 years ago

Step 2 - Create a separate GreekParadigmDataset, place all paradigm stuff in a separate folder - done

irina060981 commented 4 years ago

Step 3 - Move Paradigm Full Match check to the Inflections repo - done

irina060981 commented 4 years ago

Step 4 - Add Greek Noun Paradigms

irina060981 commented 4 years ago

While adding noun paradigms I ran into a small problem with Nouns with Contraction: O-Declension (nounpdgm15)

I have checked all the words from the second column and found that most of them have stemtype = oos_oon lemma = περίπλους

but all of them are adjectives and have declension = 1st & 2nd

So I have no examples for this column. For now I have put in the following rules, and I have examples for the first and the third rows

nounpdgm15,1,noun,oos_oou,2nd,,νόος,,
nounpdgm15,1,noun,oos_oon,2nd,,περίπλους,,
nounpdgm15,1,noun,eos_eou,2nd,,κάνεον,,

@monzug and @balmas, could you help me with this table - the second column?

monzug commented 4 years ago

I found ὀστέον (stem_type eos_eou) + 3 others, but I haven't looked them up yet in my Smyth. περίπλους is listed in the irregularities; let me check the online Smyth