Closed by jtauber 6 years ago
In GVT, you can see it here: https://gu658.us1.eldarioncloud.com/word-list/urn:cts:greekLit:tlg0011.tlg003.perseus-grc2:55/
The `bag_of_words_01.txt` file has
66:55|28758 25128 34616 25425 45271 67506 89943
but the `morphology.txt` file has
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|1|ἔνθʼ|dd--------/dr--------|ἔνθα/ἔνθα
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|2|εἰσπεσὼν|v--sapamn-|εἰσπίπτω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|3|ἔκειρε|v-3siia---|κείρω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|4|πολύκερων|a--s---ma-|πολύκερως
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|5|φόνον|n--s---ma-|φόνος
Not yet sure if this is just #69, but it doesn't seem so.
Will need some more investigation as to what happened between `morphology.txt` and `bag_of_words_01.txt`.
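One way to make the mismatch concrete is to count both sides. The following is a minimal sketch, not the actual import code: it assumes the `morphology.txt` rows are pipe-delimited with slash-separated alternatives (as the rows quoted above appear to be) and that the numbers in the `bag_of_words_01.txt` line are lemma ids. The names `MorphEntry`, `parse_morphology_line` and `distinct_lemmas` are hypothetical.

```python
from typing import List, NamedTuple


class MorphEntry(NamedTuple):
    urn: str
    line: str
    position: int
    form: str
    tags: List[str]
    lemmas: List[str]


def parse_morphology_line(row: str) -> MorphEntry:
    # Assumed layout: urn|line|position|form|tag[/tag...]|lemma[/lemma...]
    urn, line, position, form, tags, lemmas = row.rstrip("\n").split("|")
    return MorphEntry(urn, line, int(position), form,
                      tags.split("/"), lemmas.split("/"))


def distinct_lemmas(rows: List[str]) -> List[str]:
    # Collect lemmas in first-seen order, dropping repeats.
    seen: List[str] = []
    for row in rows:
        for lemma in parse_morphology_line(row).lemmas:
            if lemma not in seen:
                seen.append(lemma)
    return seen


# Rows copied from the morphology.txt excerpt quoted in this issue.
URN = "urn:cts:greekLit:tlg0011.tlg003.perseus-grc2"
ROWS = [
    URN + "|55|1|ἔνθʼ|dd--------/dr--------|ἔνθα/ἔνθα",
    URN + "|55|2|εἰσπεσὼν|v--sapamn-|εἰσπίπτω",
    URN + "|55|3|ἔκειρε|v-3siia---|κείρω",
    URN + "|55|4|πολύκερων|a--s---ma-|πολύκερως",
    URN + "|55|5|φόνον|n--s---ma-|φόνος",
]

# Line copied from the bag_of_words_01.txt excerpt; ids assumed to be lemma ids.
BAG_LINE = "66:55|28758 25128 34616 25425 45271 67506 89943"
bag_ids = BAG_LINE.split("|")[1].split()

print(len(distinct_lemmas(ROWS)), "lemmas vs", len(bag_ids), "bag-of-words ids")
```

On the data quoted here this reports 5 distinct lemmas against 7 ids, so two entries in the bag of words have no counterpart in `morphology.txt`.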
Okay, got to the bottom of this.
Morpheus gives this:
http://www.perseus.tufts.edu/hopper/morph?l=e%29%2Fnq&la=greek
and so Logeion has this
209814|ἔνθʼ|dd--------|ἔνθα|||1|1
209815|ἔνθʼ|v-2sama---|ἔρχομαι|||1|
209816|ἔνθʼ|v-3saia---|ἔρχομαι|||1|
209817|ἔνθʼ|vc3ppia---|εἰμί|||1|
209818|ἔνθʼ|vc3spia---|εἰμί|||1|
and my import code is using every possible lemmatisation, not JUST those suggested by Giuseppe.
I could change my code to not add extra lemmatisations from Logeion / Morpheus. That might be worth doing for now although it may mean missing lemmatisation possibilities. I think that will be covered by the morphology tool, rather than the vocabulary tool, though.
Turned out I just had a bug in my code and was too aggressively falling back to extra lemmatisations (I meant to only use them if Giuseppe didn't provide any). Fixed now.
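For reference, the intended rule (use the extra Logeion / Morpheus lemmatisations only when Giuseppe's data provides none) can be sketched as below. This is an illustration of the rule as described, not the actual fix; `choose_lemmas` and its parameter names are hypothetical.

```python
from typing import Iterable, List


def choose_lemmas(curated: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Prefer curated lemmas; use the extra ones only as a fallback.

    `curated` stands for the lemmatisations supplied for the token,
    `extra` for everything Logeion / Morpheus offers for the same form.
    """
    def dedupe(items: Iterable[str]) -> List[str]:
        out: List[str] = []
        for item in items:
            if item not in out:
                out.append(item)
        return out

    curated_list = dedupe(curated)
    if curated_list:
        # Curated data exists, so the extra candidates are ignored.
        return curated_list
    return dedupe(extra)


# With the ἔνθʼ example: the curated lemma wins over the extra candidates.
print(choose_lemmas(["ἔνθα", "ἔνθα"], ["ἔνθα", "ἔρχομαι", "εἰμί"]))
# With no curated lemma, every extra candidate is kept.
print(choose_lemmas([], ["ἔνθα", "ἔρχομαι", "εἰμί"]))
```

The trade-off is the one noted above: tokens with curated data never pick up alternative lemmatisations from Logeion / Morpheus.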
From @gregorycrane: