Open phoenix-mossimo opened 7 years ago
Are you sure all of those are bound groups (e.g. §3)? They surely are when used as nouns, but as verbs? I tried "norm_group=/.ϫⲓⲛϭⲟⲛⲥ./" vs. lemma="ϭⲟⲛⲥ" and many "ϫⲓ ⲛϭⲟⲛⲥ" are not found, partly because of inconsistent encoding. Let me check the conditions first.
No, you're right, they are not necessarily bound groups; the solution to search in bound groups is really just a 'band aid' - it may work sometimes and is better than nothing, but is not an absolute solution. The real solution is to specify the sequence of norms in oRef - that's what @mjabrams is working on. Then it won't matter if they're in the same bound group, because ANNIS will search for a sequence of norms regardless of bound group borders.
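A minimal sketch of what such a norm-sequence query could look like, assuming the annotation layer is called `norm` and that AQL's direct-precedence operator `.` is used to chain the parts. The helper name `norm_sequence_query` and the segmentation of the example compound are hypothetical, not taken from the oRef data:

```python
def norm_sequence_query(norms):
    """Build an AQL query matching the given norm forms in direct succession.

    Assumption: each part is searched as norm="..." and consecutive parts are
    linked with the AQL precedence operator (#1 . #2), so bound group borders
    play no role. Quotes inside a form are not escaped in this sketch.
    """
    if not norms:
        raise ValueError("at least one norm is required")
    # One search term per part of the compound.
    terms = [f'norm="{n}"' for n in norms]
    # Link term i to term i+1: "#i . #(i+1)" means i directly precedes i+1.
    links = [f"#{i} . #{i + 1}" for i in range(1, len(norms))]
    return " & ".join(terms + links)


# Hypothetical segmentation of "ϫⲓ ⲛϭⲟⲛⲥ" into three norms.
query = norm_sequence_query(["ϫⲓ", "ⲛ", "ϭⲟⲛⲥ"])
```

Whether the parts of a given compound are really segmented this way would have to come from the oRef elements themselves.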
By "specifying the sequence of norms in oRef" do you mean a) extending the XML files (for all compounds) or b) doing it on the fly while generating the query for a given compound?
We would be very much interested in a); that is actually part of the plan.
I'm not sure I understand the difference, but I think a). Isn't this already what the oRef tags do?
"oRef" was applied only to "multiwords", the type of compound defined in #27, but not to all compounds.
I see... Yes, it would be better to either apply it to all complex entries, or have another similar tag for non "multiword" compounds. Basically, if there is some clear way for us to figure out what to search for in ANNIS, that would be best.
But in the meantime, if something doesn't have oRef, but does contain spaces, and is in the same bound group, the fallback of norm_group=/.XYZ./ will catch it, so we will have fairly good coverage already (this is not a permanent solution, but not terrible for now IMO)
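A rough sketch of that fallback, under a few assumptions: the function name `fallback_query` and the `has_oref` flag are hypothetical, the pattern written above as /.XYZ./ is taken to mean "XYZ anywhere inside the bound group" (written here as `.*XYZ.*`), and the entry is assumed to contain no regex metacharacters:

```python
def fallback_query(entry, has_oref):
    """Return a norm_group regex query for a spaced entry without oRef.

    Returns None when the fallback does not apply (the entry has oRef parts,
    or it contains no spaces). Assumption: the wildcards are ".*" on both
    sides, and the entry needs no regex escaping.
    """
    if has_oref or " " not in entry.strip():
        return None
    # Remove all whitespace so the parts are searched as one bound group.
    joined = "".join(entry.split())
    return f"norm_group=/.*{joined}.*/"


query = fallback_query("ϫⲓ ⲛϭⲟⲛⲥ", has_oref=False)
```

This only finds hits where all parts sit in the same bound group, which is exactly the limitation discussed above.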
What about the state of this issue? (Milestone 2.1.0?)
Yes. We need to manually tag the parts of all compounds within the
That sounds good, we could then update our mwe tagger to be based on the new list of oRef elements
The dictionary contains a substantial number of compounds that are written separately. Searching for these in ANNIS (button "Search in Annis") returns nothing. I guess such cases need special treatment by a search query script.
The issue is also relevant to "multiwords", which, although written together, contain lexical items and have additional tagging in the dictionary. For the list of multiword types see #27.
Types of compounds that are written separately:
1) Verb (st. abs.) + ⲛ/ⲙ/ⲉ/ⲉⲛ + article + noun (non-possessed): e.g.
ϯ ⲙⲡⲓⲙⲱⲓⲧ “give way” ϯ ⲛⲟⲩϭⲓⲙϣⲓϣ “take vengeance” ϯ ⲉⲡⲟⲩϣⲁⲡ “lend” ϩⲉ ⲉⲡⲟⲩⲟⲉⲓϣ “find time” ϯ ⲉⲡⲥⲱⲧⲉ “pay ransom” etc.
2) Verb (st. abs.) + ⲛ/ⲙ/ⲉ/ⲉⲛ + Ø + noun (non-possessed): e.g.
ϯ ⲛⲉⲩⲱ “give as pledge” ϫⲓ ⲛϭⲟⲛⲥ “use violence, do evil” ⲉⲣ ⲛⲁⲧⲑⲱⲧ ⲛϩⲏⲧ “disagree” ϫⲓ ⲉⲃⲉⲕⲉ “hire” ϭⲓ ⲛⲥⲕⲉⲛϩⲟ “look good”
3) Verb (st. abs.) + preposition / adverb: e.g.
ϯ ⲉϩⲟⲩⲛ (ⲉϩⲣⲛ-) “oppose” ϥⲓ ⲙⲛ “agree with” ⲟⲩⲱⲧⲉⲃ ⲥⲁⲃⲟⲗ “step over” ⲛⲟⲩ ϩⲛⲧⲟⲩⲱ “sit (to eat)”
4) Verb (st. abs.) + Ø + Ø + noun (non-possessed): e.g.
ⲃⲱⲗ ϣⲧⲱⲣⲉ ⲉⲃⲟⲗ “dissolve a guarantee”
5) Verb (st. abs.) + Ø + possessive pronoun + noun (non-possessed): e.g.
ⲥⲓⲧⲉ ⲛⲉϥⲟⲩⲉⲗⲗⲉ “recite one's poetry”
6) Verb (st. abs.) + ⲛ/ⲙ/ⲉ/ⲉⲛ + Ø + noun (possessed) (+ suffix): e.g.
ϭⲛⲟⲛ ⲛϫⲱ⸗ “obey” ϯ ⲛⲓⲁⲧ⸗ “observe” ⲱϩⲉ ⲉⲣⲁⲧ⸗ “stand on foot” ϯ ⲉⲣⲁⲧ⸗ “put on foot” ϯ ⲛⲧⲟⲟⲧ⸗ “give a hand, help”