Closed balmas closed 4 years ago
@balmas, I thought about this task and I came to the following:
We need to create an input tool that resembles the real-life paradigm tables as closely as possible, so that the entered data is easy to check. According to the paradigms (https://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html), they are tables that can have rows/columns with labels and rows/columns with data.
So it would be useful
or to input it other way and see it rendered near input data.
Input data consists of the following data types:
So I see the following variant:
@balmas , what do you think?
I think I need more time to look through our code and the paradigm table variants, because, for example,
we have 3 values inside the dual (I believe): 'n.a.v' = nominative + accusative + vocative.
I need to remember/understand how it should be placed in the paradigm and rendered afterwards.
Oh, it seems to me that we don't have similar examples in the verb paradigms.
we don't necessarily need to render it exactly as the original table. I.e. we could expand this out to separate rows for each case.
It would be very interesting if we could add to this design an integration with the lexical query and inflection tables library that allows us to test the matching rules as we are creating or editing a table.
Right now it is very very difficult to figure out why a particular form does or doesn't produce a table and a match in that table. If we could expose the decision points at each step in a kind of debugging interface that would help us to see where we need to improve or change the logic.
In some ways this is a variation on the inflection games that we experimented with last year.
I don't have a complete idea in my head of all the requirements for this but maybe @monzug has some thoughts?
uhm, I remembered working on matching the verbs paradigms. will be happy to take a look. not sure what I need behind the scene yet.
ideally, I'd like to design a general purpose inflection table entry/test tool that worked with all of the types of tables we support (endings, full forms, paradigms)
@balmas, I started to work on the prototype and created a simple parser for the existing paradigm tables (based on vue-cli-4). I have published it on my server for now:
http://alpheios-infl-check.irina-sklyarova.ru/
I tried a simple check, for example with Future System (without contraction) (verb).
I took a word from the table
βουλεύσοιτο
and searched for it using the texts.alpheios.net interface, and I got a different table.
I will need more time to remember how the table is chosen for the targetWord by the code. And I have a question: do we need such a check in the tool?
If I understand the question correctly, yes I would like to be able to test the table and cell matching logic.
It is a complex process that needs to be made clearer and easier to debug.
ok - I will remember (examine) how it matches and will add a simple ui - for checking match
Looking at the Future System (without contraction) table, Irina chose a good example of wrong matching. in morpheus, some of the verbs from the table have stem type
OK, I finally found how it chooses the paradigm table for βουλεύσοιτο.
At the first checking step it finds that two paradigms fit the inflections by part of speech (verb) and stemtype (reg_fut):
verbpdgm3 Future System (without contraction) (verb)
verbpdgm4 Future System (Active and Middle) with contraction in -έω (verb)
At the second checking step it keeps only the paradigm with the highest matchOrder:
verbpdgm3 has matchOrder = 1
verbpdgm4 has matchOrder = 3
That's why we finally get verbpdgm4 Future System (Active and Middle) with contraction in -έω (verb).
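The two checking steps described here can be sketched as follows. This is only an illustration in Python, not the project's actual code: the `Rule` class and the `candidates` and `choose` helpers are hypothetical names, and only the two reg_fut rules quoted in this thread are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified shape of one rules.csv row.
    paradigm_id: str
    match_order: int
    part_of_speech: str
    stem_type: str


RULES = [
    Rule("verbpdgm3", 1, "verb", "reg_fut"),
    Rule("verbpdgm4", 3, "verb", "reg_fut"),
]


def candidates(rules, part_of_speech, stem_type):
    """Step 1: keep every rule whose part of speech and stem type fit the inflection."""
    return [
        r for r in rules
        if r.part_of_speech == part_of_speech and r.stem_type == stem_type
    ]


def choose(rules, part_of_speech, stem_type):
    """Step 2: among the candidates, keep the one with the highest matchOrder."""
    found = candidates(rules, part_of_speech, stem_type)
    return max(found, key=lambda r: r.match_order) if found else None


# Both rules fit a verb with stem type reg_fut; matchOrder then picks verbpdgm4.
print(choose(RULES, "verb", "reg_fut").paradigm_id)
```

Because step 2 looks only at matchOrder, the contracted table wins whenever both rules pass step 1.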
@monzug , what step could be wrong here then? @balmas, what do you think?
so, the second checking is overriding the first one based on match order. where is the matchOrder coming from? I mean what's the logic of matchOrder
matchOrder is a property of each rule. For the paradigm tables we have the following rules.csv file (only the part for these paradigms is shown); the second value is the matchOrder.
ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm3,1,verb,ew_fut,,,,,,
verbpdgm4,2,verb,ew_fut,,,,,contr,
I'm not sure if this has anything to do with why we are getting the wrong table here, but looking at this this morning, I realized that although we have the morph flags rule (the contr in the above example), I'm not sure we ever actually use it. I think we might need to work backwards from the rules and the examples in the paradigm tables to see if we get the right results. I suspect we are not using it properly right now: I think these values would be in the 'morph' feature of the inflection, but the rule has it not as a feature but as a separate field. That is, when we parse the rules data set, voice, mood, etc. get put into the features array for the rule but the morph flags get put into a separate morphFlags field, and when we compare the features of the inflection we never compare the 'morph' feature of the inflection against the morphFlags property of the rule. Unfortunately I don't know right now
'ἀποθανοίμην' is an example of a Greek word that has the value 'contr' in its morph field in the morphology service output and could be used to test that.
In the example above, though, βουλεύσοιτο, the morphology service does not tell us anything in the morph field, but it does have euw in the derivtype field. We don't currently use the derivtype feature in the paradigm matching rules, but perhaps we should.
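To make the suspected gap concrete, here is a minimal Python sketch of matching with and without the morph flags check. It is an assumption about how such a check could look, not the library's implementation: `Rule`, `best_match`, and the `use_morph_flags` switch are hypothetical, and only the two reg_fut rules from rules.csv are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified rule: morph flags are kept apart from the
    # other features, as described in the comment above.
    paradigm_id: str
    match_order: int
    stem_type: str
    morph_flags: frozenset = frozenset()


RULES = [
    Rule("verbpdgm3", 1, "reg_fut"),
    Rule("verbpdgm4", 3, "reg_fut", frozenset({"contr"})),
]


def best_match(rules, stem_type, morph, use_morph_flags):
    """Return the id of the fitting rule with the highest matchOrder, or None."""
    fitting = [
        r for r in rules
        if r.stem_type == stem_type
        # Assumed fix: every flag the rule requires must appear in the
        # inflection's morph values.
        and (not use_morph_flags or r.morph_flags <= morph)
    ]
    return max(fitting, key=lambda r: r.match_order).paradigm_id if fitting else None


# βουλεύσοιτο: stem type reg_fut, nothing in the morph field.
print(best_match(RULES, "reg_fut", set(), use_morph_flags=False))
print(best_match(RULES, "reg_fut", set(), use_morph_flags=True))
```

With the flags ignored, the contr rule also fits and wins on matchOrder (verbpdgm4). With the flags compared, only the rule without flags fits (verbpdgm3).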
( Now if all that doesn't give you a headache on a monday I don't know what will :-) )
I don't think we are using the contr either.
I think stem type is the element we should mainly look at. See βουλεύσοιτο vs ἀποθανοῖτο:
1) βουλεύσοιτο: stem type reg_fut, derivtype euw, does not have a morph field
2) ἀποθανοῖτο: stem type ew_fut, morph contr, does not have a derivtype field
look at one match for ἐλῴη
we show the Future System (Active and Middle) with contraction in -έω
stemtype ew_fut
derivtype allw
instead of the Future System (Active) with contraction in -άω stem type aw_fut derivtype a_stem morph contr
There are 3 matches for the following word: ἐλῴη
Future System (without contraction) (verbpdgm3): part of speech - verb, stemtype - aw_fut, matchOrder - 1
Future System (Active and Middle) with contraction in -έω (verbpdgm4): part of speech - verb, stemtype - aw_fut, matchOrder - 2
Present System Active of Contract Verbs in -άω (verbpdgm22): part of speech - verb, stemtype - aw_pr, matchOrder - 1
The highest matchOrder is for the Future System (Active and Middle) with contraction in -έω (verbpdgm4)
uhm, so it's the matchOrder that may be the culprit!
@monzug just want to be sure I understand. You are saying we should show verbpdgm3 and verbpdgm4 for ἐλῴη and not verbpdgm4 ?
If so, I think if we added use of the morph field we would get the right result. I think that might be the better solution than changing the matchOrder, because I think we may need that as a higher order for other verbs.
I think generally, the use of the matchOrder is flawed.
I am looking at it right now. βουλεύσω is stem type reg_fut (it does not have the morph field), but we show the table for stem type ew_fut with contr, which has matchOrder 2 (verbpdgm4 and not verbpdgm3; I don't understand the verbpdgm numbers, though).
I only looked at one match for ἐλῴη (the future), and we show the wrong table. We need to show the Future System (Active) with contraction in -άω (verbpdgm5 and not verbpdgm4): stem type aw_fut, derivtype a_stem, morph contr.
Maybe the wrong table is pulled out because of the dialect? In verbpdgm5 we have 2 dialects, doric or aeolic, but this verb ἐλῴη has dial Attic epic Ionic.
I am trying to understand why we have all these rules and the matches still don't work:
verbpdgm3 1 verb reg_fut
verbpdgm4 3 verb reg_fut contr
verbpdgm3 1 verb aw_fut
verbpdgm4 2 verb aw_fut contr
verbpdgm3 1 verb ew_fut
verbpdgm4 2 verb ew_fut contr
I would simplify by deleting some entries such as
verbpdgm4 3 verb reg_fut contr
verbpdgm3 1 verb aw_fut
verbpdgm4 2 verb aw_fut contr
verbpdgm3 1 verb ew_fut
unless I got it all wrong...
I would also delete the dialect from rules.csv; we only have it in verbpdgm5.
We aren't taking into account the "contr" morphflag , which might be part of the problem. It's also possible things were working better at one point and we broke them.
@monzug maybe we should step back and come up with a list of verbs and the tables they should (and shouldn't) match?
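One possible shape for such a list is a small table of words with the table each should match, plus a checker that reports the differences. The Python sketch below is hypothetical: `check_expectations` and the stand-in lookup are illustrative names, and the expected and current values are only the ones already reported in this thread.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (word, paradigm id the thread says the word should match)
EXPECTED: List[Tuple[str, str]] = [
    ("βουλεύσοιτο", "verbpdgm3"),
    ("ἐλῴη", "verbpdgm5"),
]


def check_expectations(
    cases: Iterable[Tuple[str, str]],
    lookup: Callable[[str], Optional[str]],
) -> List[str]:
    """Return one message per word whose chosen table differs from the expected one."""
    failures = []
    for word, expected in cases:
        actual = lookup(word)
        if actual != expected:
            failures.append(f"{word}: expected {expected}, got {actual}")
    return failures


# Stand-in for the real lookup: it replays the results reported in this thread.
CURRENT_RESULTS = {"βουλεύσοιτο": "verbpdgm4", "ἐλῴη": "verbpdgm4"}

for message in check_expectations(EXPECTED, CURRENT_RESULTS.get):
    print(message)
```

The same list could later drive automated tests once the real matching function is plugged in as `lookup`.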
agree.
Irina, could you please help me understand the logic of matchOrder? I tried a few matches that I thought would fail, and they worked like a charm instead. I need to understand how the matchOrder numbers have been assigned and when they are applied in the code. Thanks
Monica, the logic is quite simple. It is assigned manually to each table.
It is part of rules.csv.
The second value (the number) after the paradigm's identifier is the matchOrder:
ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm3,1,verb,ew_fut,,,,,,
verbpdgm4,2,verb,ew_fut,,,,,contr,
So it is parsed the following way (first line):
paradigmID = verbpdgm3, matchOrder = 1, part of speech = verb, stem type = reg_fut
The other parameters (voice, mood, tense, lemma, morph flags, dialect) are not filled.
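As an illustration of this parsing, here is a minimal Python sketch using the standard `csv` module. It is not the project's parser: the `parse_rules` function and the `constraints` key are hypothetical, and only two rows of rules.csv are included.

```python
import csv
import io

RULES_CSV = """ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,reg_fut,,,,,,
verbpdgm4,3,verb,reg_fut,,,,,contr,
"""


def parse_rules(text):
    """Parse rules.csv text into dicts, dropping the columns a rule leaves empty."""
    rules = []
    for row in csv.DictReader(io.StringIO(text)):
        rule = {
            "paradigmID": row.pop("ID ref"),
            "matchOrder": int(row.pop("Match order")),
        }
        # Only the filled columns become constraints of the rule.
        rule["constraints"] = {name: value for name, value in row.items() if value}
        rules.append(rule)
    return rules


for rule in parse_rules(RULES_CSV):
    print(rule)
```

The first row yields only the part of speech and stem type constraints; the second row additionally carries the contr morph flag.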
yes, I got that. but, I didn't get how the numbers 1,2,3,4,5 are assigned to each stem type/table and how the matchOrder is then applied.
I am not sure I understood your question correctly, Monica. I will try to explain the workflow in detail; I hope the answer is in there too :)
On input we have inflections.
For each inflection we go through all the rules from rules.csv. Each rule already has a matchOrder value.
For example, ἐλῴη.
We compare all features from the rule with the features from the inflection and find which rules are suitable. For this word (1 inflection) we found verbpdgm3 and verbpdgm4:
Future System (without contraction) (verbpdgm3)
Future System (Active and Middle) with contraction in -έω (verbpdgm4)
Both have part of speech - verb, stemtype - aw_fut, and the inflection has (from the morph adapter) part of speech - verb, stemtype - aw_fut.
At the next step we compare and find the greatest matchOrder (from rules.csv). Here is the source for them:
ID ref,Match order,Part of speech,Stem type,Voice,Mood,Tense,Lemma,Morph flags,Dialect
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
As Bridget mentioned, we don't use morph flags in the analysis: verbpdgm3 has no flag, verbpdgm4 has contr.
So we choose verbpdgm4 (matchOrder = 2), and finally for the first inflection we have Future System (Active and Middle) with contraction in -έω (verbpdgm4).
Irina, we should have a conversation on the side so I can walk through what I don't understand (the matchOrder value in rules.csv). For ἐλῴη we pull out the wrong table. It should be verbpdgm5. See attachment.
@monzug and @balmas , I have updated my tool http://alpheios-infl-check.irina-sklyarova.ru/
Now it could take a word and return ParadigmTables (orange - chosen by matchOrder)
Irina, the tool looks good, but the results for ἐλῴη are wrong for Inflection 1. Let me explain it step by step:
1) I use the morpheus tool at http://morph.alpheios.net/api/v1/analysis/word?word=%E1%BC%90%CE%BB%E1%BF%B4%CE%B7&lang=grc&engine=morpheusgrc and it returns the following:
2) Check the table at https://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html: it's verbpdgm5. See the screenshot that I added in my previous comment.
3) Enter ἐλῴη in lookup and it returns the following
All good here. But when I click on the inflection table, it shows the Future System (Active and Middle) with contraction in -έω, which is the wrong table. It should be verbpdgm5, Future System (Active) with contraction in -άω.
As I commented before, I believe rules.csv should be updated (see my comment above). Still, the problem is why the verbpdgm3 and verbpdgm4 tables are considered for matching and not verbpdgm5. Could it be because of the dialect? verbpdgm5 is the only one that has a dialect in rules.csv:
verbpdgm5 3 verb aw_fut contr doric
verbpdgm5 3 verb aw_fut contr aeolic
does it make sense now?
Oh, now I understand the question. There is only one difference here:
verbpdgm5,3,verb,aw_fut,,,,,contr,doric
verbpdgm5,3,verb,aw_fut,,,,,contr,aeolic
verbpdgm5 (Future System (Active) with contraction in -άω) has an obligatory match on dialect: doric or aeolic.
But it doesn't see this dialect in the received homonym for ἐλῴη.
I will check why.
OK, I found the problem. It is not in the rule, it is in the way we compare the dialect feature. Take the word ἐλᾶ as an example:
verbpdgm5 has dialect value doric or aeolic
the inflections have 'Attic Doric Aeolic' and 'epic Doric Aeolic'
So when the compare workflow checks whether they are equal, they are not equal.
I have updated my code for the dialect comparison and it works well for ἐλᾶ.
But ἐλῴη still doesn't match verbpdgm5, because of the following dialect comparison:
the ἐλῴη homonym inflections have the dialect values Attic and epic
but verbpdgm5 has dialect value doric or aeolic
So there are no matches here.
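The difference between the two comparisons can be sketched in Python as below. This is an assumption about the kind of fix described, not the actual change: the function names are hypothetical, and the dialect strings are the ones quoted in this comment.

```python
def dialects(value):
    """Split a dialect string such as 'Attic Doric Aeolic' into lowercase names."""
    return {name.lower() for name in value.split()}


def dialect_matches_strict(rule_dialect, inflection_dialect):
    """Plain equality, which fails for multi-valued dialect strings."""
    return rule_dialect == inflection_dialect


def dialect_matches_any(rule_dialect, inflection_dialect):
    """Assumed fix: match when the rule's dialect is one of the inflection's dialects."""
    return rule_dialect.lower() in dialects(inflection_dialect)


# ἐλᾶ: the inflection lists several dialects, the rule asks for one of them.
print(dialect_matches_strict("doric", "Attic Doric Aeolic"))  # False
print(dialect_matches_any("doric", "Attic Doric Aeolic"))     # True

# ἐλῴη: neither Attic nor epic is doric or aeolic, so verbpdgm5 still does not match.
print(dialect_matches_any("doric", "Attic"))   # False
print(dialect_matches_any("aeolic", "epic"))   # False
```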
@monzug , what do you think, where is the error here - in rules or in morph results?
@balmas, should I create a PR for the dialect comparison fix (as described in my previous comment)?
Irina, glad we are on the same page now. Dialect is definitely the issue for ἐλῴη. Next we have to look at rules.csv and clean up the mess with verbpdgm3 and verbpdgm4; I need more time for this. As for your question about the dialects: there are so many different flavors of dialects in the Greek language that I would just ignore them. As said previously, I would remove the dialects from rules.csv.
on the topic of the dialects: (1) if we are going to include dialects in the rules, then I think we do need to fix the check so that it handles the multi-valued feature properly (2) but if we decide to remove dialect from the rules, then it is a moot point
I am sure we had a reason at the time we made the rules to include the doric/aeolic dialect as a criteria, although it could well have been a hack and not necessarily valid reasoning.
We have the following rules for the aw_fut stemtype:
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
verbpdgm5,3,verb,aw_fut,,,,,contr,doric
verbpdgm5,3,verb,aw_fut,,,,,contr,aeolic
I interpret these to mean that we had found certain words with stemtype aw_fut and a morph flag of contr that we wanted to match into table 5 (Future System (Active) with contraction in -άω) rather than table 4 (Future System (Active and Middle) with contraction in -έω) and that the only distinguishing factor we could use from the morphology service output was the dialect.
I think before we throw out the dialect from the rules, we should probably fix the code to correctly use the morph flags, because I suspect that is at least one source of problems right now.
We have dialects all over the place in Greek verbs, but verbpdgm5 is the only one that has the dialect in rules.csv, and it does not work. So I would first try removing the dialect and see if it works. I am suspicious, as it's the only place where we have it. For verbpdgm3 and 4 I need some time to review; I strongly believe there should NOT be
verbpdgm3,1,verb,aw_fut,,,,,,
verbpdgm4,2,verb,aw_fut,,,,,contr,
I would do it like this:
verbpdgm3 1 verb reg_fut
verbpdgm4 2 verb ew_fut
verbpdgm5 3 verb aw_fut
The contr flag is not used anyway.
Another error with verbpdgm3 and verbpdgm4: look up βουλεύσω. There are 3 tables involved:
Weak (1st) Aorist System Active
Weak (1st) Aorist System Middle
Future System (Active and Middle) with contraction in -έω
The last one should be voice active and reg_fut stemtype, so it should open the Future System (without contraction).
I think we should probably separate this issue from the ever-growing list of problems with the current rules. Actually we probably need at least 4 issues:
1) create a tool for creating/editing and validating paradigm data files
2) add the Greek noun, adjective and pronoun paradigms (using the new tool)
3) add support for matching paradigm tables for Greek nouns, adjectives and pronouns (because we already have ending tables for those forms, we may have some additional work to do here to be able to display both, or to prefer paradigm tables when they exist; this might also ideally result in a plugin module for paradigm matching which can be used with both the editing tool and the runtime code)
4) one or more issues for all of the current verb paradigm table misses/incorrect matches
This issue is the 2nd, I have created issues for the 1st and 3rd. @monzug can you create one or more issues for the misses/incorrect matches?
ok, @irina060981 let's hold off on the PR for the dialect change then until we have a full list of the issues.
may be we should do it too (will point it here) https://github.com/alpheios-project/components/issues/271
yes agree
Step 1 - The first bunch of tests are created - done
Step 2 - Create a separate GreekParadigmDataset, place all paradigm stuff in a separate folder - done
Step 3 - Move Paradigm Full Match check to the Inflections repo - done
Step 4 - Add Greek Noun Paradigms
While adding noun paradigms I ran into a small problem with Nouns with Contraction: O-Declension (nounpdgm15).
I have checked all the words from the second column and found
that most of them have
stemtype = oos_oon
lemma = περίπλους
but all of them are adjectives and have declension = 1st & 2nd.
So I have no examples for this column. For now I have put in the following rules, and I have examples for the first and the third rows:
nounpdgm15,1,noun,oos_oou,2nd,,νόος,,
nounpdgm15,1,noun,oos_oon,2nd,,περίπλους,,
nounpdgm15,1,noun,eos_eou,2nd,,κάνεον,,
@monzug and @balmas, could you help me with this table - the second column?
I found ὀστέον (stem_type eos_eou) + 3 others, but I haven't looked them up yet in my Smyth. περίπλους is listed in the irregularities; let me check the online Smyth.
We would like to add the remainder of the paradigm tables (nouns, adjectives, pronouns) from http://ucbclassics.dreamhosters.com/ancgreek/paradigmsU/paradigms_U.html to Alpheios.
@vgorman1 has offered to help with the data entry/data conversion process.
this will require some changes to the inflection table code and we need to think about the best way to get the data.