nschneid opened this issue 11 months ago
It should be IN/ADP + VBN/VERB + IN/ADP, since POS is assigned token by token (so it breaks down the same way as "like contrasted with", IMO). I'll fix this in GUM.
Since "as" and "like" can in general be either ADP or SCONJ, could you clarify why you think ADP is better here?
And also, should it depend on whether the fixed expression is functioning as case vs. mark?
Same as for all English prepositions: ADP+case for adnominal, SCONJ+mark for clausal, no?
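The context-based convention stated here (ADP with case, SCONJ with mark) amounts to a small lookup on the relation. A minimal sketch, assuming the relation label is the only context consulted; the name upos_from_deprel is hypothetical and not part of any UD tool.

```python
from typing import Optional

# Context-based convention for preposition-like words (illustrative):
# adnominal attachment (case) -> ADP, clausal attachment (mark) -> SCONJ.
CONTEXT_UPOS = {"case": "ADP", "mark": "SCONJ"}


def upos_from_deprel(deprel: str) -> Optional[str]:
    """Return the UPOS implied by the attachment, or None if the rule is silent."""
    # Drop a relation subtype if present, e.g. "obl:tmod" -> "obl".
    base = deprel.split(":", 1)[0]
    return CONTEXT_UPOS.get(base)


for rel in ("case", "mark", "advmod"):
    print(rel, upos_from_deprel(rel))
# case ADP
# mark SCONJ
# advmod None
```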
(After some offline discussion with @amir-zeldes) To be clear, the issue is the word-level UPOS of "as", given that we annotate the whole expression as fixed, and the expression as a whole can function either as case or as mark. Plain "as" can also function as case/ADP or as mark/SCONJ. The PTB tagset, which we use for XPOS, doesn't distinguish these (IN = ADP + SCONJ), so the question is how context should be taken into account for the first word of a fixed expression.
@dan-zeman any thoughts?
If a node has a fixed dependent, its UPOS does not (necessarily) reflect the word's position in the sentence. The UPOS that would correspond to the fixed expression as a whole may be different, and it is not annotated in UD (except for the optional MWEPOS or ExtPOS attributes in MISC). The validator knows about this anomaly and skips most compatibility tests between the UPOS and the incoming relation if it sees fixed among the outgoing relations. So I think you should not modify the UPOS of the first node of a fixed expression based on its DEPREL.
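The skipping behavior described here can be sketched as follows. This is a simplified, hypothetical re-creation, not the official UD validator's code: the compatibility table EXPECTED_UPOS is deliberately strict and invented for the example (the real validator's rules differ), and tokens are plain tuples.

```python
from typing import Dict, List, Set, Tuple

# (ID, FORM, UPOS, HEAD, DEPREL); a deliberately minimal token shape.
Token = Tuple[int, str, str, int, str]

# Hypothetical, strict compatibility table for this sketch only;
# NOT the table used by the official UD validator.
EXPECTED_UPOS: Dict[str, Set[str]] = {
    "case": {"ADP"},
    "mark": {"SCONJ"},
}


def upos_deprel_warnings(tokens: List[Token]) -> List[str]:
    """Flag UPOS/DEPREL mismatches, skipping heads of fixed expressions."""
    fixed_heads = {head for (_, _, _, head, deprel) in tokens if deprel == "fixed"}
    warnings = []
    for tid, form, upos, _head, deprel in tokens:
        expected = EXPECTED_UPOS.get(deprel)
        if expected is None or upos in expected:
            continue
        if tid in fixed_heads:
            # The head's UPOS describes the word, not the whole expression,
            # so the incoming relation is not checked against it.
            continue
        warnings.append(f"{tid} {form}: UPOS {upos} unexpected with {deprel}")
    return warnings


# "as" tagged SCONJ but attached as case: tolerated inside a fixed expression...
print(upos_deprel_warnings([(4, "as", "SCONJ", 7, "case"),
                            (5, "opposed", "VERB", 4, "fixed"),
                            (6, "to", "ADP", 4, "fixed")]))  # []
# ...but flagged when "as" stands alone.
print(upos_deprel_warnings([(4, "as", "SCONJ", 5, "case")]))
# ['4 as: UPOS SCONJ unexpected with case']
```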
Independently of the above, I also think that a word that is prototypically an adposition can keep the ADP tag even if it occurs as a mark dependent. The validator should digest the opposite situation, too: "as" is perhaps prototypically SCONJ (at least for me), but it should be possible to attach it as case if needed. So I would probably choose only one UPOS category for "as", even outside fixed expressions.
So, there's a good reason that, for English, PTB merges prepositions and subordinators under one tag, IN: there is heavy lexical overlap between the more traditional ADP and SCONJ categories. Thus far we have been choosing UPOS based on context. We could go in a different direction, for example with the goal of minimizing UPOS ambiguity per word, and allow ADP/mark (perhaps also SCONJ/case). Not sure this is a high priority, though.
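Minimizing UPOS ambiguity per word first requires measuring it. A minimal sketch, assuming the treebank has already been read into (FORM, UPOS) pairs; the toy pairs below are invented for illustration and say nothing about the actual distribution in GUM or EWT, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def upos_by_form(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count UPOS tags per lowercased word form."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for form, upos in pairs:
        table[form.lower()][upos] += 1
    return table


def ambiguous_forms(table: Dict[str, Counter]) -> Dict[str, Dict[str, int]]:
    """Keep only forms that received more than one UPOS."""
    return {form: dict(counts) for form, counts in table.items() if len(counts) > 1}


def majority_upos(table: Dict[str, Counter]) -> Dict[str, str]:
    """One tag per form: the most frequent (ties go to the tag seen first)."""
    return {form: counts.most_common(1)[0][0] for form, counts in table.items()}


# Toy data, invented for illustration only.
toy = [("as", "ADP"), ("As", "SCONJ"), ("as", "ADP"), ("like", "ADP"), ("tea", "NOUN")]
table = upos_by_form(toy)
print(ambiguous_forms(table))  # {'as': {'ADP': 2, 'SCONJ': 1}}
print(majority_upos(table))    # {'as': 'ADP', 'like': 'ADP', 'tea': 'NOUN'}
```

Collapsing each form to its majority tag is only one possible policy; under it, relations such as ADP/mark would simply have to be tolerated wherever the majority tag and the attachment disagree.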
If in general we resolve UPOS based on context, that leaves the UPOS of the first word of a fixed expression underspecified. We could just default to ADP for words that can be prepositions.
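The proposed default could look like this: heads of fixed expressions get ADP if the form can be a preposition, while other nodes keep the context-based choice. A minimal sketch; the preposition set and the function name resolve_upos are hypothetical placeholders, not an existing GUM or UD resource.

```python
# Hypothetical, non-exhaustive set of forms that can be prepositions.
PREPOSITION_FORMS = {"as", "like", "in", "of", "on"}

# Context-based choice used outside fixed expressions.
CONTEXT_UPOS = {"case": "ADP", "mark": "SCONJ"}


def resolve_upos(form: str, current_upos: str, deprel: str, heads_fixed: bool) -> str:
    """Pick a UPOS for a preposition-like word (illustrative policy only)."""
    if heads_fixed:
        # The DEPREL belongs to the whole fixed expression, so it is ignored;
        # default to ADP for forms that can be prepositions.
        return "ADP" if form.lower() in PREPOSITION_FORMS else current_upos
    return CONTEXT_UPOS.get(deprel.split(":", 1)[0], current_upos)


print(resolve_upos("as", "SCONJ", "mark", heads_fixed=True))   # ADP
print(resolve_upos("as", "ADP", "mark", heads_fixed=False))    # SCONJ
print(resolve_upos("so", "SCONJ", "mark", heads_fixed=True))   # SCONJ
```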
This is documented as fixed: https://universaldependencies.org/en/dep/fixed.html
What should the UPOS of "as" be? The data are inconsistent, mixing ADP, ADV, and SCONJ: