deep-philology / DeepVocabulary

vocabulary server (mostly for Perseus but also standalone)
https://gu658.us1.eldarioncloud.com
MIT License

extra words shown in word list for some passages #84

Closed (jtauber closed this 6 years ago)

jtauber commented 6 years ago

From @gregorycrane:

https://lk353.eu1.eldarioncloud.com/reader/urn:cts:greekLit:tlg0011.tlg003.perseus-grc2:55/

the vocab does not line up -- i was searching for erxomai but I don't see it in this line.

jtauber commented 6 years ago

In GVT, you can see it here: https://gu658.us1.eldarioncloud.com/word-list/urn:cts:greekLit:tlg0011.tlg003.perseus-grc2:55/

The bag_of_words_01.txt file has

66:55|28758 25128 34616 25425 45271 67506 89943

but the morphology.txt file has

urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|1|ἔνθʼ|dd--------/dr--------|ἔνθα/ἔνθα
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|2|εἰσπεσὼν|v--sapamn-|εἰσπίπτω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|3|ἔκειρε|v-3siia---|κείρω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|4|πολύκερων|a--s---ma-|πολύκερως
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|5|φόνον|n--s---ma-|φόνος

Not yet sure if this is just #69 but it doesn't seem so.

Will need some more investigation as to what happened between morphology.txt and bag_of_words_01.txt.
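One quick way to see the mismatch is to count what each file says about line 55. The sketch below is only an illustration: the field layout (URN, line reference, token position, form, tag alternatives, lemma alternatives, with "/" separating alternatives) is inferred from the sample rows above, and the helper names are hypothetical, not taken from the DeepVocabulary code.

```python
# Sample rows copied from morphology.txt and bag_of_words_01.txt above.
MORPHOLOGY_LINES = """\
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|1|ἔνθʼ|dd--------/dr--------|ἔνθα/ἔνθα
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|2|εἰσπεσὼν|v--sapamn-|εἰσπίπτω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|3|ἔκειρε|v-3siia---|κείρω
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|4|πολύκερων|a--s---ma-|πολύκερως
urn:cts:greekLit:tlg0011.tlg003.perseus-grc2|55|5|φόνον|n--s---ma-|φόνος
""".splitlines()

BAG_OF_WORDS_LINE = "66:55|28758 25128 34616 25425 45271 67506 89943"


def lemmas_for_passage(lines, ref):
    """Collect the distinct lemmas listed for one line reference (hypothetical helper)."""
    lemmas = []
    for line in lines:
        # Assumed layout: urn | ref | position | form | tags | lemmas
        _urn, line_ref, _position, _form, _tags, lemma_field = line.split("|")
        if line_ref != ref:
            continue
        for lemma in lemma_field.split("/"):
            if lemma not in lemmas:
                lemmas.append(lemma)
    return lemmas


def bag_ids(line):
    """Return the lemma IDs from one bag-of-words row (assumed 'key|id id id ...')."""
    _key, ids = line.split("|")
    return ids.split()


if __name__ == "__main__":
    print(len(lemmas_for_passage(MORPHOLOGY_LINES, "55")), "distinct lemmas in morphology.txt")
    print(len(bag_ids(BAG_OF_WORDS_LINE)), "IDs in bag_of_words_01.txt")
```

On the sample data this gives five distinct lemmas against seven IDs, so two entries in the bag of words have no counterpart in morphology.txt.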

jtauber commented 6 years ago

Okay, got to the bottom of this.

Morpheus gives this:

http://www.perseus.tufts.edu/hopper/morph?l=e%29%2Fnq&la=greek

and so Logeion has this

209814|ἔνθʼ|dd--------|ἔνθα|||1|1
209815|ἔνθʼ|v-2sama---|ἔρχομαι|||1|
209816|ἔνθʼ|v-3saia---|ἔρχομαι|||1|
209817|ἔνθʼ|vc3ppia---|εἰμί|||1|
209818|ἔνθʼ|vc3spia---|εἰμί|||1|

and my import code is using every possible lemmatisation, not JUST those suggested by Giuseppe.

I could change my code to not add extra lemmatisations from Logeion / Morpheus. That might be worth doing for now although it may mean missing lemmatisation possibilities. I think that will be covered by the morphology tool, rather than the vocabulary tool, though.

jtauber commented 6 years ago

It turned out I just had a bug in my code: it was falling back to the extra lemmatisations too aggressively (I meant to use them only if Giuseppe didn't provide any). Fixed now.
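For readers following along, the difference between the buggy and the intended behaviour can be sketched roughly as below. This is not the actual import code: the function names are hypothetical, and the assumption that the second and fourth pipe-separated fields of a Logeion row hold the form and the lemma is inferred from the rows quoted earlier.

```python
# Rows copied from the Logeion data quoted above.
LOGEION_ROWS = """\
209814|ἔνθʼ|dd--------|ἔνθα|||1|1
209815|ἔνθʼ|v-2sama---|ἔρχομαι|||1|
209816|ἔνθʼ|v-3saia---|ἔρχομαι|||1|
209817|ἔνθʼ|vc3ppia---|εἰμί|||1|
209818|ἔνθʼ|vc3spia---|εἰμί|||1|
""".splitlines()


def logeion_lemmas(rows, form):
    """Distinct lemmas Logeion lists for a surface form (assumed field layout)."""
    lemmas = []
    for row in rows:
        fields = row.split("|")
        row_form, lemma = fields[1], fields[3]
        if row_form == form and lemma not in lemmas:
            lemmas.append(lemma)
    return lemmas


def union_lemmas(primary, fallback):
    """Over-eager variant: always adds every fallback lemma to the primary ones."""
    return list(primary) + [lemma for lemma in fallback if lemma not in primary]


def choose_lemmas(primary, fallback):
    """Intended variant: use the fallback only when there is no primary lemmatisation."""
    return list(primary) if primary else list(fallback)


if __name__ == "__main__":
    form = LOGEION_ROWS[0].split("|")[1]
    fallback = logeion_lemmas(LOGEION_ROWS, form)
    primary = [fallback[0]]  # stands in for the single lemma given in morphology.txt
    print("over-eager:", union_lemmas(primary, fallback))
    print("intended:  ", choose_lemmas(primary, fallback))
```

With the primary lemmatisation present, the over-eager variant still pulls in ἔρχομαι and εἰμί, which matches the extra words reported in the word list; the intended variant keeps only the primary lemma and consults Logeion / Morpheus only when the primary list is empty.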