PietroLiuzzo commented 7 years ago

Given a string input, analyze it and return possible morphological matches and lemmas.

PietroLiuzzo commented 7 years ago

@vitagrazia81 has lists of possible constructions of words for names and verbs, which might need to be further developed

PietroLiuzzo commented 7 years ago

make the distinction for suffix pronouns clear at the analysis level

vitagrazia81 commented 7 years ago

Yes, it needs further development and has to be completed. For the moment the list does not cover all the nominal and verbal forms. For the nouns, there are only the forms with three "normal" radicals; for the verbs, the forms of verbs II and III W and of verbs with four or more radicals are still missing.

vitagrazia81 commented 7 years ago

No, sorry: for the nouns we also have the forms with I, II, and III laryngeal.

PietroLiuzzo commented 7 years ago

use case: the user sends a request with a string query parameter; the response offers a list of possible morphological matches of the pattern, organized by root, as in the Greek word study tool http://www.perseus.tufts.edu/hopper/morph?l=gignomai&la=greek

The string can be given either in fidel or in transcription (#5). It is then analyzed by the lexicon, which returns its root and the associated pattern. The pattern is matched against the tables provided to find the possible morphological definitions, and the root components are matched to the possible roots relevant for the matched morphological patterns to provide results.

PietroLiuzzo commented 7 years ago

Goal

retrieve morphological information on single tokens for a string request

Main actor(s)

all users, both humans and applications

Short description

the user sends a request with a string query parameter in fidel or transcription (#5); the response offers a list of possible morphological matches of the pattern, organized by root. One main user will be the dillmann app #3

Examples

the greek word study tool http://www.perseus.tufts.edu/hopper/morph?l=gignomai&la=greek

7 further specification

the string can be given either in fidel or in transcription (#5).
the lexicon should first look for the root and store the possible root matches. The root components are matched to the possible roots relevant for the matched morphological patterns, to filter out matches that are not useful. For each possible root, the lexicon should evaluate the associated patterns to find the one matching the query.

Preconditions

There is already the root tool done by @cvertan, which can be used as a starting point for this. There are also forms already compiled with this root tool which could be used for the intelligence in the lexicon app.
The lexicon needs to know schemas and patterns. Tables for these will be provided by @vitagrazia81.
The dillmann dictionary api should respond to a request for a lemma with the id of the lemma, as is already possible with a query like http://betamasaheft.aai.uni-hamburg.de/api/Dillmann/search/form?q=ቄጵርስስ which returns the id of the lemma
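
A minimal sketch of how a client might build the Dillmann form query quoted above. Only the base URL comes from this thread; the function name is hypothetical, and the shape of the API response is not described here, so the sketch stops at constructing the URL and makes no request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL copied from the example query in this thread.
DILLMANN_FORM_SEARCH = "http://betamasaheft.aai.uni-hamburg.de/api/Dillmann/search/form"


def dillmann_form_query(form: str) -> str:
    """Return the search URL for a form (hypothetical helper).

    urlencode percent-encodes the fidel characters as UTF-8.
    """
    return DILLMANN_FORM_SEARCH + "?" + urlencode({"q": form})


url = dillmann_form_query("ቄጵርስስ")
# Round trip: decoding the q parameter gives back the original form.
assert parse_qs(urlsplit(url).query)["q"] == ["ቄጵርስስ"]
print(url)
```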

Example Basic flow

the lexicon receives a request which contains the string ሕዝብ. If the string is made of more than one word, the response should contain the following for each word.
the lexicon looks at it and detects schema and pattern by matching them against the tables provided.
based on the pattern, the lexicon looks at possible roots, excluding those which are not relevant to the pattern
the lexicon returns a response in json containing
1. the initial query string
2. for each possible root: the root, the schema, the pattern, the meaning of that pattern in string form, the meaning of that schema in string form
3. for each possible root, the link to a definition in the Dillmann app, obtained by retrieving the id from the Dillmann API.
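
One possible shape for the json response listed above, as a sketch only. All field and class names are assumptions, since the thread does not fix a schema, and the values are placeholders rather than real analysis results.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RootMatch:
    # Hypothetical field names covering items 2 and 3 of the list above.
    root: str
    schema: str
    pattern: str
    pattern_meaning: str
    schema_meaning: str
    dillmann_link: str


@dataclass
class LexiconResponse:
    query: str  # item 1: the initial query string
    matches: list = field(default_factory=list)


def to_json(response: LexiconResponse) -> str:
    # ensure_ascii=False keeps the fidel readable in the output.
    return json.dumps(asdict(response), ensure_ascii=False, indent=2)


example = LexiconResponse(
    query="ሕዝብ",
    matches=[
        RootMatch(
            root="<root>",
            schema="<schema>",
            pattern="<pattern>",
            pattern_meaning="<meaning of the pattern>",
            schema_meaning="<meaning of the schema>",
            dillmann_link="<link built from the lemma id>",
        )
    ],
)
print(to_json(example))
```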

use case: dillmann app

the dictionary app will use the lexicon to provide, for any searched string, the morphological matches as well: it first sends the queried string to the lexicon and then offers to search the dictionary for one or another of the results. #7

Alicias Analizador

http://elvira.lllf.uam.es/jabalin/analizarForma.php

Alternate flow

no match is found for the entered string. the lexicon returns an error with the problem encountered (e.g. cannot parse word, schema not found, pattern not found)

Postconditions

A list of matching roots, schemas and patterns is returned.

PietroLiuzzo commented 7 years ago

where does the tokenization happen? rules for the tokenization

PietroLiuzzo commented 7 years ago

User story 1 (Magda)

1. Query:

ነገርክናኒ / nagarkǝnāni

2. Input processing:

Automatic transliteration.
It should figure out that this is a Perfective second person plural feminine with an object suffix pronoun in the first person singular.
It will look in the table of Perfective with Object Suffixes and find out that ni can be an object suffix.
Warnings: the table provided contains the possibilities, not the patterns, which will have to be elaborated in a table by the engine. This is a further requirement. A table with

nagarkǝnā - ni will have to be completed with a list of patterns like

1a2a304ǝ5ā6i where each pair represents a syllable: the first element of a pair is the consecutive number (starting with 1) and the second is the vowel in the schema/pattern (0 where there is no vowel)

the input will be converted to this pattern as well and matched one to one. In most cases you will be lucky enough to have the pattern possibility already expressed. A list of unique patterns should probably also be compiled for quick lookup.
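
The numbered pattern described above can be derived from a transliteration roughly as follows. This is a sketch under stated assumptions: the vowel inventory is an assumed set, every other character is treated as one consonant (so labialized consonants written with a modifier letter would need extra handling), and the function name is hypothetical.

```python
import unicodedata

# Assumed vowel inventory of the transliteration; not taken from the tables.
VOWELS = set(unicodedata.normalize("NFC", "aāuieǝo"))


def pattern_of(transliteration: str) -> str:
    """Encode a single transliterated word as numbered consonant/vowel pairs."""
    text = unicodedata.normalize("NFC", transliteration)
    text = text.replace("-", "").replace(" ", "")
    pairs = []
    number = 0
    i = 0
    while i < len(text):
        if text[i] in VOWELS:
            raise ValueError(f"vowel without a preceding consonant at {i}")
        number += 1
        vowel = "0"  # 0 marks a consonant with no vowel
        if i + 1 < len(text) and text[i + 1] in VOWELS:
            vowel = text[i + 1]
            i += 1
        pairs.append(f"{number}{vowel}")
        i += 1
    return "".join(pairs)


print(pattern_of("nagarkǝnāni"))
```

Applied to the two queries in this thread, nagarkǝnāni and tǝnaggǝrāni, the sketch yields the patterns quoted in the user stories; note that it counts a geminated consonant as two consonants, which is what the pattern given for tǝnaggǝrāni implies.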

3. Results:

will return the definition as above with a possible english translation

Perfective second person plural feminine with an object suffix pronoun in the first person singular. "You (F, plural) told me"

matching the pattern of the Perfective in the verbs tables will return that this is a Perfective of the root and stem corresponding to the first three syllables, thus in this example will provide the following

root	stem
nagara	ነገር

then it will provide a split view of the term in transliteration as in the perfective with object suffixes table, thus

a. nagarkǝnā -ni (note the bold)

and one with a view splitting all the elements, so also the subject pronoun

b. nagar - kǝnā - ni

where for each part there is a link to the information related to that pattern:
nagara - perfect = table of the verbs
kǝnā - subject pronoun = table of subject pronouns (conjugation of the perfect)
ni - object pronoun = table of the object pronouns

4. User interaction:

the user can click on the root and go to the Dillmann dictionary entry. this can be done via API querying for the lemma id and then building a direct link to the entry.
the user can click and generate all other forms of the perfective with that subject pronoun but different object pronouns
the user can click and generate all other forms of the perfective with a different subject pronoun and all different object pronouns related
for each returned value the user will be able to see attestation of that exact form from the corpus, distinguished between those coming from BM and those coming from the annotated corpus. the search on BM can be done via search or kwicsearch API.

User story 2 (Magda)

1. Query:

ትነግራኒ/ tǝnaggǝrāni 1ǝ2a304ǝ5ā6i

2. Input processing:

Automatic transliteration.
It should figure out that this is the Imperfective second person plural feminine with an object suffix in the first person singular.

3. Results:

will return the definition as above with a possible english translation

Imperfective second person plural feminine with an object suffix in the first person singular. "You (F, plural) will tell me" or "You (F, plural) tell me"

root	stem
nagara	ነግር

then it will provide a split view of the term in transliteration as in the imperfective with object suffixes table, thus

a. tǝnaggǝrā - ni

and one with a view splitting all the elements, so also the subject pronoun

b. tǝ- naggǝr- ā - ni

where for each part there is a link to the information related to that pattern:
naggǝr = imperfective = table of the verbs
tǝ- + - ā = subject pronoun = table of subject pronouns (Imperfective in the Indicative and Jussive Moods)
ni = object pronoun = table of the object pronouns

4. User interaction:

the user can click on the root and go to the Dillmann dictionary entry. this can be done via API querying for the lemma id and then building a direct link to the entry.
the user can click and generate all other forms of the imperfective with that subject pronoun but different object pronouns
the user can click and generate all other forms of the imperfective with a different subject pronoun and all different object pronouns related
for each returned value the user will be able to see attestation of that exact form from the corpus, distinguished between those coming from BM and those coming from the annotated corpus. the search on BM can be done via search or kwicsearch API.

MagdaKrzyz commented 6 years ago

nagarkǝnā - ni: please note that the correct version is as follows: nagarkǝn-āni
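
With the corrected segmentation nagarkǝn-āni, the split view from User story 1 could be produced by stripping known suffixes from the right. A minimal sketch: the two one-entry tables are illustrative stand-ins for the real tables of object and subject pronouns, and the function name is hypothetical.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# Illustrative one-entry tables; the real ones come from the tables provided.
OBJECT_SUFFIXES = {nfc("āni"): "object pronoun, first person singular"}
SUBJECT_SUFFIXES = {nfc("kǝn"): "subject pronoun (perfect), second person plural feminine"}


def split_perfect(form: str) -> list:
    """Return every (stem, subject suffix, object suffix) split the tables allow."""
    form = nfc(form)
    splits = []
    for obj in OBJECT_SUFFIXES:
        if not form.endswith(obj):
            continue
        rest = form[: -len(obj)]
        for subj in SUBJECT_SUFFIXES:
            # Require a non-empty stem in front of the subject suffix.
            if rest.endswith(subj) and len(rest) > len(subj):
                splits.append((rest[: -len(subj)], subj, obj))
    return splits


print(split_perfect("nagarkǝnāni"))
```

Returning all splits rather than the first one keeps ambiguous forms visible, so that the later root lookup can filter them.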

MagdaKrzyz commented 6 years ago

Corrections to User Story 1:

"it will look in the table of Perfective with Object Suffixes and find out that -āni can be an object suffix"

nagara - perfect = table of the verbs
kǝn - subject pronoun = table of subject pronouns (conjugation of the perfect)
-āni - object pronoun = table of the object pronouns

a. nagarkǝn-āni (note the bold)
b. nagar - kǝn-āni

PietroLiuzzo commented 6 years ago

a further tool to look at: https://ethiopic-tool.firebaseapp.com/ by Garry Jost. A very interesting email exchange with instructions is available if needed. Once the text is loaded, the tool:

looks up the word in the database associated with the website
if it doesn't find it, strips off the prefixes ወ- በ- ለ- ዘ- እም- (and combinations of ወ- and በ- ለ- ዘ- እም-) and tries again
if it finds the word, then it has from the database the gloss, parse info, and lemma
then looks up the lemma in the database, for the info in the window to the right, or to look up in the Dillmann database
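
The lookup steps above can be sketched as follows. This is not the tool's actual code: the database is a toy dictionary, the record fields are assumptions, and "combinations" is read here as ወ- followed by one of the other prefixes.

```python
SIMPLE_PREFIXES = ["በ", "ለ", "ዘ", "እም"]
# ወ- alone, each simple prefix, and ወ- combined with each simple prefix.
PREFIXES = ["ወ"] + SIMPLE_PREFIXES + ["ወ" + p for p in SIMPLE_PREFIXES]


def lookup(word: str, database: dict):
    """Return (matched form, record), or None if nothing is found."""
    if word in database:
        return word, database[word]
    for prefix in PREFIXES:
        stripped = word[len(prefix):]
        if word.startswith(prefix) and stripped in database:
            return stripped, database[stripped]
    return None


# Toy database with a single entry, for illustration only.
toy_db = {"ሕዝብ": {"gloss": "people", "parse": "noun", "lemma": "ሕዝብ"}}

print(lookup("ወለሕዝብ", toy_db))
```

The lemma in the returned record is what would then be looked up again, or passed on to the Dillmann database.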

sdruskat commented 5 years ago

@MagdaKrzyz I'm having some trouble reproducing correct transliterations for User story 2 (ትነግራኒ/ tǝnaggǝrāni) in the testing phase with respect to ግ = ggǝ, which isn't easily computable from the Fidäl alone. The candidates I get for ትነግራኒ are [tnagrāni, tǝnagrāni, tnagǝrāni, tǝnagǝrāni], of which the last one seems the best candidate.

Do you have a systematic overview of transliteration rules somewhere?
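
For illustration, the four candidates above arise because each sixth-order character can be read with or without ǝ. A sketch with a toy readings table covering only the characters of this example (the table and the function name are assumptions, not the project's transliteration code):

```python
from itertools import product

# Toy table: possible readings of each fidel character in ትነግራኒ.
# Sixth-order characters are ambiguous between a bare consonant and consonant + ǝ.
READINGS = {
    "ት": ("t", "tǝ"),
    "ነ": ("na",),
    "ግ": ("g", "gǝ"),
    "ራ": ("rā",),
    "ኒ": ("ni",),
}


def candidates(fidel: str) -> list:
    """Enumerate every transliteration the readings table allows."""
    options = [READINGS[ch] for ch in fidel]
    return ["".join(choice) for choice in product(*options)]


print(candidates("ትነግራኒ"))
```

The geminated form tǝnaggǝrāni is not among the results, because gemination is not written in the fidel and cannot be enumerated from the characters alone.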

PietroLiuzzo commented 5 years ago

As replied in the email, it is not possible to produce the transliteration without already knowing the pattern and having separated the affixes.

TraCES-Lexicon / lexicon

get morphological matches and definitions #2
