Open ctschroeder opened 6 years ago
The direction dictionary -> ANNIS is not so problematic and actually works at the moment, since the spaces get stripped and the search is generated in AQL as:
However, the search is on norm_group, not lemma, since this entry is not attested in the lemma table (and also, no frequencies appear as a result). I think this is a compound stative. The active lemma ⲕⲱⲕⲁϩⲏⲩ is actually attested, so the lemma in corpora should be ⲕⲱⲕⲁϩⲏⲩ (it isn't at the moment).
This error aside, I think there are many inconsistencies between the dictionary and the corpora, and it should be on our agenda to track these down via scripts and reconcile them (maybe after completing the lexicon paper, since the due date is slowly approaching).
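A minimal sketch of what such a reconciliation script might look like, assuming the dictionary headwords and the corpus lemma table can each be exported as a plain list of strings. The function names and the sample lists are hypothetical illustrations, not real exports from either resource. Whitespace is stripped before comparing, mirroring the space-stripping described above for the dictionary -> ANNIS link.

```python
def strip_spaces(form: str) -> str:
    """Remove all whitespace so 'ⲕⲱⲕ ⲁϩⲏⲩ' and 'ⲕⲱⲕⲁϩⲏⲩ' compare as equal."""
    return "".join(form.split())


def reconcile(dictionary_entries, corpus_lemmas):
    """Return forms found on only one side, after whitespace stripping."""
    dict_keys = {strip_spaces(entry) for entry in dictionary_entries}
    corpus_keys = {strip_spaces(lemma) for lemma in corpus_lemmas}
    return {
        "only_in_dictionary": sorted(dict_keys - corpus_keys),
        "only_in_corpora": sorted(corpus_keys - dict_keys),
    }


# Hypothetical sample data, for illustration only (not the real tables).
dictionary_entries = ["ⲕⲱⲕ ⲁϩⲏⲩ", "ⲃⲁϣⲁϩⲏⲩ", "ⲕⲱ"]
corpus_lemmas = ["ⲕⲱ", "ⲕⲱⲕⲁϩⲏⲩ", "ⲁϩⲏⲩ"]

report = reconcile(dictionary_entries, corpus_lemmas)
for side, forms in report.items():
    print(side, forms)
```

The two lists in the report would then be candidates for manual review rather than automatic correction, since some mismatches (like the one in this issue) reflect genuine analytical choices.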
The instance I'm annotating is this: ⲉ̄ⲃⲟⲗ ϫⲉⲁⲩⲕⲁⲁϥ ⲕⲁϩⲏⲩ ⲛ̄ⲛⲉϥϩⲟⲓ̈ⲧⲉ. (appears as a phrase twice in pseudo-Theophilus's On the Cross)
So I need an annotation that's not a compound. That's why I was commenting here about multi-word phrases as entries in the dictionary. ⲕⲱⲕ ⲁϩⲏⲩ (not bound) is an entry in the dictionary. That's not what I have here, though it's the closest thing.
I'm not really asking for annotation help though (since my example has two lemmas -- ⲕⲱ and ⲕⲁϩⲏⲩ separated by a pronoun). Rather this situation led me to notice something about the dictionary structure: there is at least one multi-word phrase in the dictionary, which makes linking to/from the dictionary difficult and also searching within the dictionary tricky.
For this entry, I would recommend separating out ⲁϩⲏⲩ/ⲕⲁϩⲏⲩ as a separate entry. I don't know what is happening with search, though, and why it's hard to find ⲕⲱⲕ ⲁϩⲏⲩ when searching for ⲁϩⲏⲩ. I imagine there are more multi-word entries than just this one.
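To get a sense of how many multi-word entries there are, something like the following could be run over an exported headword list. This is only a sketch under assumptions: `headwords` is a hypothetical sample, and the real dictionary data may need to be parsed out of its XML first. It lists headwords containing internal whitespace and builds an index from each component word to the multi-word entries containing it, which is the kind of lookup that would let a search for ⲁϩⲏⲩ surface ⲕⲱⲕ ⲁϩⲏⲩ.

```python
from collections import defaultdict


def multiword_entries(headwords):
    """Return headwords made of more than one whitespace-separated word."""
    return [h for h in headwords if len(h.split()) > 1]


def component_index(headwords):
    """Map each component word to the multi-word headwords that contain it."""
    index = defaultdict(set)
    for headword in multiword_entries(headwords):
        for part in headword.split():
            index[part].add(headword)
    return index


# Hypothetical sample headwords, for illustration only.
headwords = ["ⲕⲱⲕ ⲁϩⲏⲩ", "ⲃⲁϣⲁϩⲏⲩ", "ⲕⲱ"]

print(multiword_entries(headwords))
print(sorted(component_index(headwords)["ⲁϩⲏⲩ"]))
```

Note that this only catches components separated by spaces; a bound compound such as ⲃⲁϣⲁϩⲏⲩ would still need an explicit cross-reference.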
Hi guys, sorry for the late reply.
ⲃⲁϣⲁϩⲏⲩ shows us that the source compound is ⲃⲱϣ ("loosen", "strip", st. nom. ⲃⲁϣ) + ⲁϩⲏⲩ (a rare adverb, probably originally "r-HAw" / "r-Hw" with the meaning "completely", see e.g. Erichsen Glossar p. 294 r-Hw "zu viel"). In context this compound is translated as an adjective, although KoptHWb 29 and CD 47a are not sure about this (both have "?").
ⲕⲱⲕ ⲁϩⲏⲩ (ⲕⲱⲕ "to peel off" + adverb ⲁϩⲏⲩ) should have grammaticalized in the same way as above, to ⲕⲁⲕⲁϩⲏⲩ (see, for example, ⲕⲁⲕⲃⲁⲗ "with bare eyelids" CD 101a), but was from the start erroneously segmented as ⲕⲱ ⲕⲁϩⲏⲩ, the stat. pron. form of which is ⲕⲁⲁ⸗ ⲕⲁϩⲏⲩ (Till, Grammar §277). Thus ⲕⲁⲁ⸗, which Carrie has in the text, is basically a result of this "folk etymology".
As ⲕⲱⲕ ⲁϩⲏⲩ is not completely grammaticalized (i.e. ⲕⲱⲕ preserves the status absolutus), I think it is proper to write the compound as separate words. I can add a cross-reference to ⲁϩⲏⲩ, just as ⲃⲁϣⲁϩⲏⲩ has (<ref target="#ⲁϩⲏⲩ">nackt</ref>). It will lead nowhere (the word ⲁϩⲏⲩ is attested in these two compounds only! See KoptHWb 18), but at least it will appear in the "Entries related to ..." section of the search results for both ⲕⲱⲕ ⲁϩⲏⲩ and ⲃⲁϣⲁϩⲏⲩ.
Ok thanks. No need to apologize. This is a weird case. I mostly wanted to be sure there wasn't a problem with compounds and related entries beyond this particular case. Thanks.
(FYI Amir I am lemmatizing ⲕⲁϩⲏⲩ to ⲁϩⲏⲩ but ⲕⲁⲁ to ⲕⲱ not ⲕⲱⲕ). Many thanks to both of you. We can close once the cross-ref is added?
I'm not 100% sure about the lemmatization practice, since we also have etymologically 'correct' cases of ⲕⲱⲕⲁϩⲏⲩ and ⲕⲏⲕⲁϩⲏⲩ. Shouldn't we normalize ⲕⲁⲕⲁϩⲏⲩ to ⲕⲏⲕⲁϩⲏⲩ and then lemmatize normally? I guess this is the difference between calling it a variant and a legitimate new word, distinct from ⲕⲏⲕⲁϩⲏⲩ. They mean exactly the same thing, right?
(At least that’s my understanding. I may be misreading what you’re suggesting though. The object pronoun appears between the verb and the adverb, so it’s not one lemma in CS’s model. Right?)
In the ⲕⲏⲕ version there is no pronoun, that is, if we interpret it as the stative of ⲕⲱⲕ. If that's the 'preferred' version, we could normalize the whole thing away and treat the variant as a corruption at the orig level (similar to, say, ⲕⲟⲗⲗⲩϭⲉ for ⲕⲟⲗⲗⲏⲅⲓⲟⲛ). The main question for me is what users will look for: will they be upset to miss these cases when they search for ⲕⲏⲕⲁϩⲏⲩ, or will they be more upset when looking for norm ⲕⲁϩⲏⲩ, not realizing this type of form should be searched for as orig...
I agree with your solution for ⲕⲏⲕⲁϩⲏⲩ, but that is not the instance that prompted my thread. See my earlier comment, the attestation is ϫⲉⲁⲩⲕⲁⲁϥ ⲕⲁϩⲏⲩ with a pronoun.
Oh, I'm sorry, my bad. I understand the construction now. Yeah, ⲕⲱ makes sense here, and yes, it must all be separate norms.
I'm not sure what to do with this, so I'm making an issue. ⲕⲱⲕ ⲁϩⲏⲩ (also ⲕⲁϩⲏⲩ) appears in the dictionary as a phrase, as its own entry. A couple of questions: