Closed nschneid closed 2 weeks ago
list item numbers should be NUM
This is definitely not right, because LS is also the tag for graphical bullets, which are in no way numbers. I'm also not sure that "A1.iii)" is a number, I'd say it's much more of an X. I see some mention of using either PUNCT/punct or SYM/dep for these. In GUM xpos=LS is always attached as dep, and nummod is only used for counting things.
> This is definitely not right, because LS is also the tag for graphical bullets, which are in no way numbers.
https://universaldependencies.org/u/pos/SYM.html says bullets are PUNCT. It seems to be distinguishing them from list item markers with a (quasi)numerical component (i.e., they reflect a position in a sequential ordering of some kind).
I could also imagine thinking of lists as a type of coordination, and these as helping to mark how a list item relates to other items in the list, so CCONJ. But that may be unpopular. :)
I'm not so convinced. I think syntactically there is no difference between numerical, graphical, alphabetical and mixed list item markers. It's all the same kind of orthographic device, and I would like them to have the same analysis. I wouldn't feel too bad about punct, but then we would not be allowed to treat them as kinds of numbers morphologically, and in any case it would create an uncomfortable situation where punctuation becomes an open-ended class.
Tagging them all as SYM, or even splitting them into SYM for non-numerical and NUM for numerical would be OK for me too, but I think they should have the same deprel regardless of what kind of list item marker they are.
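One way to check how consistently list item markers are currently analyzed is to tally the UPOS and deprel of every token with xpos=LS. Below is a minimal sketch in plain Python, assuming standard 10-column CoNLL-U input. The function name `ls_analyses` and the sample rows are made up for illustration and are not taken from GUM or EWT.

```python
from collections import Counter

def ls_analyses(conllu_text):
    """Count (UPOS, DEPREL) pairs for tokens whose XPOS is LS."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, upos, xpos, deprel = cols[0], cols[3], cols[4], cols[7]
        if not tok_id.isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if xpos == "LS":
            counts[(upos, deprel)] += 1
    return counts

# Hypothetical sample: one numeric marker and one graphical bullet.
SAMPLE = "\n".join([
    "# text = 1. Open the file",
    "\t".join(["1", "1.", "1.", "X", "LS", "_", "2", "dep", "_", "_"]),
    "\t".join(["2", "Open", "open", "VERB", "VB", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "the", "the", "DET", "DT", "_", "4", "det", "_", "_"]),
    "\t".join(["4", "file", "file", "NOUN", "NN", "_", "2", "obj", "_", "_"]),
    "",
    "# text = • Save it",
    "\t".join(["1", "•", "•", "PUNCT", "LS", "_", "2", "punct", "_", "_"]),
    "\t".join(["2", "Save", "save", "VERB", "VB", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "it", "it", "PRON", "PRP", "_", "2", "obj", "_", "_"]),
])

for (upos, deprel), n in sorted(ls_analyses(SAMPLE).items()):
    print(upos, deprel, n)
```

If the result has more than one (UPOS, deprel) pair, the treebank is treating different kinds of list item markers differently, which is the situation argued against above.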
LS issue --> #465
AFX issue --> #152
So I think we're done here.
The guidelines at https://universaldependencies.org/u/pos/X.html say it should be used very restrictively.
Setting aside the usage with goeswith dependents, we have:
FW or LS: https://universal.grew.fr/?custom=65184da9dbe01. There are also a few FW lexemes that are not X, mainly borrowed Latin abbreviations: https://universal.grew.fr/?custom=65184e2eded21
LS, https://universaldependencies.org/u/dep/list.html says that list item numbers should be NUM
FW, ADD - URLs and email addresses (would PROPN work for these?), GW (mainly space-separated parts of filenames), NN and NNP within filenames, and AFX affixes like "ex". Some GW parts of filenames have substantive UPOS, as do some FW and AFX words: https://universal.grew.fr/?custom=651850b455da1
GUM XPOS doesn't use ADD or AFX (these are more recent additions to the PTB tagset). But I see internet addresses under PROPN in GUM, which makes sense linguistically.
I think steps here are:
LS list markers
ADD to PROPN instead of X, and move guidelines examples from SYM (UniversalDependencies/docs#973)
flat or goeswith, and what to do about transparent syntax within parts of filenames) (UniversalDependencies/docs#666)
X and there should be an ExtPos)
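The survey of X tokens above can be roughly approximated offline, without the grew queries. The sketch below assumes standard 10-column CoNLL-U input and tallies the XPOS of every UPOS=X token, skipping tokens attached with the goeswith relation (my reading of "setting aside the usage with goeswith dependents"). The function name `x_by_xpos` and the sample rows are hypothetical and not drawn from EWT or GUM.

```python
from collections import Counter

def x_by_xpos(conllu_text):
    """Count XPOS values of UPOS=X tokens, ignoring goeswith dependents."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges, empty nodes
        upos, xpos, deprel = cols[3], cols[4], cols[7]
        if upos == "X" and deprel != "goeswith":
            counts[xpos] += 1
    return counts

# Hypothetical sample: a URL (ADD), a split word (goeswith), a foreign word (FW).
SAMPLE = "\n".join([
    "\t".join(["1", "Visit", "visit", "VERB", "VB", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "www.example.com", "www.example.com", "X", "ADD", "_", "1", "obj", "_", "_"]),
    "\t".join(["3", "with", "without", "ADP", "IN", "_", "5", "case", "_", "_"]),
    "\t".join(["4", "out", "out", "X", "IN", "_", "3", "goeswith", "_", "_"]),
    "\t".join(["5", "delay", "delay", "NOUN", "NN", "_", "1", "obl", "_", "_"]),
    "\t".join(["6", "voila", "voila", "X", "FW", "_", "1", "discourse", "_", "_"]),
])

for xpos, n in x_by_xpos(SAMPLE).most_common():
    print(xpos, n)
```

Running this before and after a retagging pass (for example, moving ADD tokens to PROPN) would show which XPOS categories still end up under X.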