Open nevenjovanovic opened 7 years ago
Thank you, Neven. Very helpful, indeed.
The forms you propose to include in the DB are mostly "exceptional forms" (in LEMLAT's terminology) of lemmas that are already recorded. See the documentation for details on such forms. Basically, LEMLAT does not segment them; their analysis is fully hard-coded in a dedicated table of the DB (called "forme_ecc").
I will check each of these forms against the lexicographic sources of LEMLAT (Georges, OLD, Laterculi + Onomasticon of Forcellini). If they are there, they will be included in the DB (this might be the case for "nosse"). If not, we will have to decide how to handle them, because in the "lessario" table we want to keep forms not reported by LEMLAT's sources separate (a dedicated column, src, records this information).
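To make the idea of a hard-coded exceptional form concrete, here is a minimal sketch in Python with an in-memory SQLite table. Only the table name `forme_ecc` comes from this thread; the column names (`form`, `lemma`, `analysis`), the function names, and the single example row are hypothetical and do not reflect the actual LEMLAT schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the exceptional-forms table.

    The column layout is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forme_ecc ("
        "form TEXT PRIMARY KEY, lemma TEXT, analysis TEXT)"
    )
    # "nosse" is the syncopated perfect infinitive of "nosco" (for "novisse").
    conn.execute(
        "INSERT INTO forme_ecc VALUES (?, ?, ?)",
        ("nosse", "nosco", "perfect active infinitive"),
    )
    conn.commit()
    return conn


def lookup_exceptional(conn, form):
    """Return (lemma, analysis) for a stored form, or None if it is absent."""
    return conn.execute(
        "SELECT lemma, analysis FROM forme_ecc WHERE form = ?", (form,)
    ).fetchone()


conn = build_demo_db()
print(lookup_exceptional(conn, "nosse"))
print(lookup_exceptional(conn, "emta"))
```

The point of the sketch is only that such a form is matched as a whole string and its analysis is read straight from the row, with no segmentation step involved.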
Thank you again!
Marco
On 21 Aug 2017, at 11:49, Neven Jovanović notifications@github.com wrote:
We have tested LEMLAT on a corpus of classical Latin texts from a reading list, some 23,700 words and 8,538 different word forms: Terence's Adelphoe, Horace's Odes Bk. 1, Tibullus Bk. 1, Seneca's Letters Bk. 1 (all editions from the PerseusDL collection). Besides various forms of personal names (and some typos in our sources), there were 40 word forms not recognized by LEMLAT. That is a tiny percentage (under 0.5% of the distinct forms), but the list is below. Some forms seem to go unrecognized for orthographical reasons (ë; omitted -p- in emta, demsi; oe in foeneraret; words joined instead of separated, as in illiusmodi). Some have to do with meter in comedy: the elided -n', from -ne, is regularly not recognized by LEMLAT. Some missing forms are fairly common: norimus, nosse.
I propose that the forms from the list below be added to the LEMLAT database.
adteruisse audistin coëmisse demseris demsi egon emta emtae emtam foeneraret haecine hancine hocine hoscine illan illiusmodi ipsus lucu men norimus nosse nossem nostin numquidnam poëta poëtae posthaec propediem quamobrem quamprimum quandoquidem quorundam quotannis refrixerit sumtuosa tamdiu tantummodo tercentenas tetigin tun
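As a rough illustration of the causes named in the report, the following Python sketch groups the 40 forms by surface pattern and proposes a respelling for three of the patterns. The rules are heuristics that happen to fit this particular list (for instance, "ends in -n" is not a safe test for elided -ne in general), the function and category names are made up for this example, and nothing here describes how LEMLAT itself works or whether it would accept the respelled forms.

```python
import re
from collections import defaultdict

# The 40 unrecognized forms, copied from the list above.
UNRECOGNIZED = (
    "adteruisse audistin coëmisse demseris demsi egon emta emtae emtam "
    "foeneraret haecine hancine hocine hoscine illan illiusmodi ipsus lucu "
    "men norimus nosse nossem nostin numquidnam poëta poëtae posthaec "
    "propediem quamobrem quamprimum quandoquidem quorundam quotannis "
    "refrixerit sumtuosa tamdiu tantummodo tercentenas tetigin tun"
).split()

# Cases the report names individually; not derived from a general rule.
NAMED_CASES = {
    "foeneraret": "oe spelling",
    "illiusmodi": "joined words",
}


def suspected_cause(form):
    """Return a heuristic label for why a form might go unrecognized."""
    if form in NAMED_CASES:
        return NAMED_CASES[form]
    if "ë" in form:
        return "diaeresis"
    if re.search(r"m[st]", form):  # -ms-/-mt- where -mps-/-mpt- is usual
        return "omitted -p-"
    if form.endswith("n"):  # assumption: final -n here is elided -ne
        return "elided -ne"
    return "other"


def candidate_respelling(form):
    """Suggest a spelling to retry, or None if no rule applies."""
    cause = suspected_cause(form)
    if cause == "diaeresis":
        return form.replace("ë", "e")
    if cause == "omitted -p-":
        return re.sub(r"m([st])", r"mp\1", form)
    if cause == "elided -ne":
        return form + "e"
    return None


groups = defaultdict(list)
for form in UNRECOGNIZED:
    groups[suspected_cause(form)].append(form)

for cause, forms in groups.items():
    print(f"{cause} ({len(forms)}): {' '.join(forms)}")
```

Forms that land in "other" (norimus, nosse, the -cine pronouns, and so on) have no obvious surface cue and would need to be checked one by one.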
Before making this change, bear in mind that it affects LEMLAT and only potentially the LiLa lemma bank as well.
Neven's list is a list of forms not recognized by LEMLAT. They are not lemmas and do not affect the LiLa lemma bank, except in the two cases described below. Any change meant to resolve them has to be made in LEMLAT, and several kinds of change are possible, e.g. new exceptional forms, new les with inflectional codles, additions of a_gra, etc.
Impact on the LiLa lemma bank:
Flavio is the best person to make changes to LEMLAT, because he has a clear overall picture of the tables in lemlat_db. Together with him I will set aside a time (hopefully once the COVID-19 emergency is over) for a campaign of: (a) updating LEMLAT with the new lemmas added to lila_db (identified by a src code that finds them unambiguously); (b) updating LEMLAT with the changes needed to address Neven's list.
As a reminder, new lemmas/wr are added to LiLa only if one of the following conditions is met: