PerseusDL / morpheus

Morpheus parser

Morpheus parsing llt-tokenized texts #7

Open LFDM opened 10 years ago

LFDM commented 10 years ago

People can request that enclitics be marked up (the Pisa guys, who'd like to annotate some Seneca, already did). In most cases this comes down to a hyphen, e.g. arma virumque becomes arma virum -que. I think this makes a lot of sense, especially for cases such as an enclitic ne: without marking it as split off, there would be no way to disambiguate an enclitic -ne from the 'real' ne (used for negation etc.).

However, Morpheus doesn't really know what to do with the hyphen: -c and -que remain entirely unidentified, and -ne is reported as a form of neo1, etc.

gregorycrane commented 10 years ago

Morpheus needs to be fed individual tokens, so you need to pre-process things and feed it only "virum". It won't recognize the enclitics, so handle them as exceptions.
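A minimal sketch of that pre-processing step, assuming the tokenizer always marks a split-off enclitic with a leading hyphen as in the example above. The function name `split_for_morpheus` and the `ENCLITICS` set are hypothetical, not part of Morpheus or the tokenizer; the set holds only the forms mentioned in this thread.

```python
# Enclitic tokens mentioned in this thread; extend as needed.
# Assumption: a split-off enclitic always carries a leading hyphen.
ENCLITICS = {"-que", "-ne", "-c"}


def split_for_morpheus(tokens):
    """Separate tokens to send to Morpheus from enclitics handled as exceptions.

    Returns (words, enclitics). Each item is an (index, token) pair, so the
    two result sets can be merged back in the original token order.
    """
    words, enclitics = [], []
    for index, token in enumerate(tokens):
        if token.lower() in ENCLITICS:
            enclitics.append((index, token))  # never sent to Morpheus
        else:
            words.append((index, token))      # sent to Morpheus as-is
    return words, enclitics


words, enclitics = split_for_morpheus("arma virum -que".split())
print(words)      # [(0, 'arma'), (1, 'virum')]
print(enclitics)  # [(2, '-que')]
```

Because only hyphen-prefixed tokens are filtered, a free-standing ne still goes to Morpheus, while -ne is kept aside for a hand-written analysis.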

On 2/11/14, 9:11 AM, Gernot Höflechner wrote: [the opening comment above, quoted in full]