KELLIA / dictionary

The dictionary, comprising the Coptic lexicon created by the BBAW and the interface by Coptic SCRIPTORIUM. Currently deployed at https://coptic-dictionary.org

Phrases in dictionary #37

Open ctschroeder opened 6 years ago

ctschroeder commented 6 years ago

I'm not sure what to do with this, so I'm making an issue. ⲕⲱⲕ ⲁϩⲏⲩ (also ⲕⲁϩⲏⲩ) appears in the dictionary as a phrase, as its own entry. A couple of questions:

amir-zeldes commented 6 years ago

The direction dictionary -> ANNIS is not so problematic and actually works at the moment, since the spaces get stripped and the search is generated in AQL as:

norm_group=/.*ⲕⲱⲕⲁϩⲏⲩ.*/

https://corpling.uis.georgetown.edu/annis/scriptorium#_q=bm9ybV9ncm91cD0vLirispXisrHispXisoHPqeKyj-KyqS4qLw&_c=YmVzYS5sZXR0ZXJzLHNoZW5vdXRlLmEyMixqb2hhbm5lcy5jYW5vbnMsc2hlbm91dGUuZWFnZXJuZXNzLHNoZW5vdXRlLmRpcnQsc2FoaWRpYy5vdCxzaGVub3V0ZS5hYnJhaGFtLm91ci5mYXRoZXIsYXBvcGh0aGVnbWF0YS5wYXRydW0sc2FoaWRpY2EubnQsc2FoaWRpY2EuMWNvcmludGhpYW5zLHBzZXVkby50aGVvcGhpbHVzLHNoZW5vdXRlLmZveCxzYWhpZGljYS5tYXJrLGRvYy5wYXB5cmksbWFydHlyZG9tLnZpY3Rvcg&cl=5&cr=5&s=0&l=10&_bt=bm9ybV9ncm91cA
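For reference, here is a minimal sketch of how such a link can be put together. It is not the dictionary's actual code: the function names are hypothetical, and the encoding (URL-safe base64 of the UTF-8 query, without `=` padding) is only inferred from the `_q` value in the link above, which decodes to a `norm_group` regex search with `.*` on both sides of the form.

```python
import base64


def dictionary_to_aql(headword: str) -> str:
    """Strip spaces from a dictionary headword and wrap it in a norm_group regex.

    Hypothetical helper: mirrors the behaviour described in this comment,
    not the interface's real implementation.
    """
    bound_form = "".join(headword.split())
    return f"norm_group=/.*{bound_form}.*/"


def encode_fragment_value(text: str) -> str:
    """URL-safe base64 of the UTF-8 text, with trailing '=' padding removed.

    Assumption: this matches how the _q and _bt values in the link are encoded.
    """
    encoded = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


if __name__ == "__main__":
    query = dictionary_to_aql("ⲕⲱⲕ ⲁϩⲏⲩ")
    print(query)
    print(encode_fragment_value(query))
```

With the headword ⲕⲱⲕ ⲁϩⲏⲩ, the second line printed should match the `_q` value in the link, and encoding the plain string `norm_group` gives the `_bt` value.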

However, the search is on norm_group, not lemma, since this entry is not attested in the lemma table (and also, no frequencies appear as a result). I think this is a compound stative. The active lemma ⲕⲱⲕⲁϩⲏⲩ is actually attested, so the lemma in corpora should be ⲕⲱⲕⲁϩⲏⲩ (it isn't at the moment).

This error aside, I think there are many inconsistencies between the dictionary and the corpora, and it should be on our agenda to track these down via scripts and reconcile them (maybe after completing the lexicon paper, since the due date is slowly approaching).
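A rough sketch of what such a reconciliation script could look like, assuming we can export the dictionary headwords and the corpus lemmas as two plain lists. The function names and the toy lists are made up for illustration and are not real exports.

```python
def strip_spaces(form: str) -> str:
    """Remove all whitespace so multi-word headwords compare as bound forms."""
    return "".join(form.split())


def unmatched(dictionary_headwords, corpus_lemmas):
    """Return (headwords with no corpus lemma, lemmas with no dictionary entry).

    Both sides are compared with spaces stripped, so a multi-word entry is
    matched against its bound form.
    """
    dict_keys = {strip_spaces(h): h for h in dictionary_headwords}
    lemma_keys = {strip_spaces(l): l for l in corpus_lemmas}
    only_dict = sorted(dict_keys[k] for k in dict_keys.keys() - lemma_keys.keys())
    only_corpus = sorted(lemma_keys[k] for k in lemma_keys.keys() - dict_keys.keys())
    return only_dict, only_corpus


if __name__ == "__main__":
    # Toy data for illustration only, not the real lemma table.
    headwords = ["ⲕⲱⲕ ⲁϩⲏⲩ", "ⲕⲱ"]
    lemmas = ["ⲕⲱ", "ⲕⲁϩⲏⲩ"]
    only_dict, only_corpus = unmatched(headwords, lemmas)
    print("In dictionary only:", only_dict)
    print("In corpora only:", only_corpus)
```

Anything reported on either side would still need a manual decision, since a mismatch can be a missing entry, a variant spelling, or a lemmatization error.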

ctschroeder commented 6 years ago

The instance I'm annotating is this: ⲉ̄ⲃⲟⲗ ϫⲉⲁⲩⲕⲁⲁϥ ⲕⲁϩⲏⲩ ⲛ̄ⲛⲉϥϩⲟⲓ̈ⲧⲉ. (appears as a phrase twice in pseudo-Theophilus's On the Cross)

So I need an annotation that's not a compound. That's why I was commenting here about multi-word phrases as entries in the dictionary. ⲕⲱⲕ ⲁϩⲏⲩ (not bound) is an entry in the dictionary. That's not what I have here, though it's the closest thing.

I'm not really asking for annotation help though (since my example has two lemmas -- ⲕⲱ and ⲕⲁϩⲏⲩ separated by a pronoun). Rather this situation led me to notice something about the dictionary structure: there is at least one multi-word phrase in the dictionary, which makes linking to/from the dictionary difficult and also searching within the dictionary tricky.

For this entry, I would recommend separating out ⲁϩⲏⲩ/ⲕⲁϩⲏⲩ as a separate entry. I don't know what is happening with search, though, and why it's hard to find ⲕⲱⲕ ⲁϩⲏⲩ when searching for ⲁϩⲏⲩ. I imagine there are more multi-word entries than just this one.

phoenix-mossimo commented 6 years ago

Hi guys, sorry for the late reply.

ⲃⲁϣⲁϩⲏⲩ shows us that the source compound is ⲃⲱϣ ("loosen", "strip", st. nom. ⲃⲁϣ) + ⲁϩⲏⲩ (a rare adverb, probably originally "r-HAw" / "r-Hw" with the meaning "completely", see e.g. Erichsen Glossar p. 294 r-Hw "zu viel"). In context this compound is translated as an adjective, although KoptHWb 29 and CD 47a are not sure about this (both have "?").

ⲕⲱⲕ ⲁϩⲏⲩ (ⲕⲱⲕ "to peel off" + adverb ⲁϩⲏⲩ) should have grammaticalized in the same way as above, to ⲕⲁⲕⲁϩⲏⲩ (see, for example, ⲕⲁⲕⲃⲁⲗ "with bare eyelids" CD 101a), but was from the start erroneously segmented as ⲕⲱ ⲕⲁϩⲏⲩ, the stat. pron. form of which is ⲕⲁⲁ⸗ ⲕⲁϩⲏⲩ (Till, Grammar §277). Thus ⲕⲁⲁ⸗, which Carrie has in the text, is basically a result of this "folk etymology".

As ⲕⲱⲕ ⲁϩⲏⲩ is not completely grammaticalized (i.e. ⲕⲱⲕ preserves the status absolutus), I think it is proper to write the compound as two separate words. I can add a cross-reference to ⲁϩⲏⲩ, just like ⲃⲁϣⲁϩⲏⲩ has (<ref target="#ⲁϩⲏⲩ">nackt</ref>). It will lead nowhere (the word ⲁϩⲏⲩ is attested in these two compounds only, see KoptHWb 18), but at least it will appear in the "Entries related to ..." section of the search results for both ⲕⲱⲕ ⲁϩⲏⲩ and ⲃⲁϣⲁϩⲏⲩ.
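To illustrate why a cross-reference to a target with no entry of its own can still be useful, here is a small sketch that groups entries by the targets of their `<ref>` elements. Only the `<ref target="#...">` element comes from the actual data. The surrounding `<entries>`/`<entry id="...">` structure and the function name are simplified assumptions, not the lexicon's real schema or the interface's code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, hypothetical entry markup; the second <ref> is the proposed
# cross-reference and its text is a placeholder.
SAMPLE = """
<entries>
  <entry id="ⲃⲁϣⲁϩⲏⲩ"><sense><ref target="#ⲁϩⲏⲩ">nackt</ref></sense></entry>
  <entry id="ⲕⲱⲕ ⲁϩⲏⲩ"><sense><ref target="#ⲁϩⲏⲩ">nackt</ref></sense></entry>
</entries>
"""


def related_entries(xml_text: str) -> dict:
    """Map each cross-reference target to the ids of the entries that cite it."""
    root = ET.fromstring(xml_text)
    related = defaultdict(list)
    for entry in root.iter("entry"):
        for ref in entry.iter("ref"):
            target = ref.get("target", "").lstrip("#")
            if target:
                related[target].append(entry.get("id"))
    return dict(related)


if __name__ == "__main__":
    for target, entry_ids in related_entries(SAMPLE).items():
        print(target, "->", entry_ids)
```

Even though ⲁϩⲏⲩ itself has no entry, both compounds end up grouped under the same target, which is the kind of link a "related entries" list needs.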

ctschroeder commented 6 years ago

Ok thanks. No need to apologize. This is a weird case. I mostly wanted to be sure there wasn't a problem with compounds and related entries beyond this particular case. Thanks.

ctschroeder commented 6 years ago

(FYI Amir, I am lemmatizing ⲕⲁϩⲏⲩ to ⲁϩⲏⲩ, but ⲕⲁⲁ to ⲕⲱ, not ⲕⲱⲕ.) Many thanks to both of you. We can close once the cross-ref is added?

amir-zeldes commented 6 years ago

I'm not 100% sure about the lemmatization practice, since we also have etymologically 'correct' cases of ⲕⲱⲕⲁϩⲏⲩ and ⲕⲏⲕⲁϩⲏⲩ. Shouldn't we normalize ⲕⲁⲕⲁϩⲏⲩ to ⲕⲏⲕⲁϩⲏⲩ and then lemmatize normally? I guess this is the difference between calling it a variant and a legitimate new word, distinct from ⲕⲏⲕⲁϩⲏⲩ. They mean exactly the same, right?

ctschroeder commented 6 years ago

(At least that's my understanding. I may be misreading what you're suggesting though. The object pronoun appears between the verb and the adverb. So it's not one lemma in CS's model. Right?)

amir-zeldes commented 6 years ago

In the ⲕⲏⲕ version there is no pronoun, that is, if we interpret it as the stative of ⲕⲱⲕ. If that's the 'preferred' version, we could normalize the whole thing away and treat the variant as a corruption at the orig level (similar to, say, ⲕⲟⲗⲗⲩϭⲉ for ⲕⲟⲗⲗⲏⲅⲓⲟⲛ). The main question for me is what users will look for: will they be upset to miss these cases when they search for ⲕⲏⲕⲁϩⲏⲩ, or will they be more upset when looking for norm ⲕⲁϩⲏⲩ, not realizing this type of form should be searched for as orig...

ctschroeder commented 6 years ago

I agree with your solution for ⲕⲏⲕⲁϩⲏⲩ, but that is not the instance that prompted my thread. See my earlier comment: the attestation is ϫⲉⲁⲩⲕⲁⲁϥ ⲕⲁϩⲏⲩ, with a pronoun.

amir-zeldes commented 6 years ago

Oh, I'm sorry, my bad. I understand the construction now. Yeah, ⲕⲱ makes sense here, and yes, it must all be separate norms.