alpheios-project / morpheus


τἄμπροσθεν #49

Open vgorman1 opened 3 years ago

vgorman1 commented 3 years ago

I have come across the contraction τἄμπροσθεν in a sentence. In Arethusa, it is correctly broken into τ- and ἄμπροσθεν, but there is no dictionary tie-in. ἄμπροσθεν should link to ἔμπροσθεν.

balmas commented 3 years ago

This one is an interesting challenge.

Morpheus, it seems, will happily parse τἄμπροσθεν as ἔμπροσθεν, but it doesn't like ἄμπροσθεν on its own.

The llt tokenizer used by the Perseids/Arethusa setup splits contractions because it makes sense for syntactic annotation.

@vgorman1 does ἄμπροσθεν have any meaning if it isn't used this way in a contraction with "τ"? Or is it only a valid form of ἔμπροσθεν when used in a contraction with "τ"? Are there other such contractions for which Morpheus does handle the split form properly?

vgorman1 commented 3 years ago

As far as I know, ἄμπροσθεν has no meaning, except in contraction with "τ". τὰ + word starting in short, unaspirated ἀ- may contract to τἀ-. Other examples? I am not certain, but I will keep an eye out and let you know.

balmas commented 3 years ago

OK. In this case, if you are using the treebank data with Alpheios and you supply the lemma as ἔμπροσθεν in the treebank data, Alpheios will read and use it.
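
For anyone who wants to script this workaround, here is a minimal sketch of filling in the lemma for split crasis forms in a treebank file. It assumes the treebank is XML with <word> elements carrying "form" and "lemma" attributes; that layout, the LEMMA_OVERRIDES table, and the apply_lemma_overrides helper are illustrative assumptions, not something confirmed in this thread or taken from Alpheios or Morpheus.

```python
import unicodedata
import xml.etree.ElementTree as ET


def nfc(text: str) -> str:
    """Normalize to NFC so precomposed and combining accents compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical, hand-maintained table: split crasis form -> dictionary lemma.
LEMMA_OVERRIDES = {nfc("ἄμπροσθεν"): nfc("ἔμπροσθεν")}


def apply_lemma_overrides(treebank_xml: str, overrides=LEMMA_OVERRIDES) -> str:
    """Set the lemma attribute on every <word> whose form is in the table.

    Assumes <word form="..." lemma="..."/> elements; adjust the element and
    attribute names if your treebank uses a different layout.
    """
    root = ET.fromstring(treebank_xml)
    for word in root.iter("word"):
        lemma = overrides.get(nfc(word.get("form", "")))
        if lemma is not None:
            word.set("lemma", lemma)
    return ET.tostring(root, encoding="unicode")


# Made-up sample sentence fragment, for illustration only.
SAMPLE = """<treebank><sentence id="1">
  <word id="1" form="τ" lemma="ὁ"/>
  <word id="2" form="ἄμπροσθεν" lemma=""/>
</sentence></treebank>"""

patched = apply_lemma_overrides(SAMPLE)
```

Words that are not in the table are left untouched, so the table can grow as other contractions of this kind turn up.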