UniversalDependencies / UD_Portuguese-Bosque

This is the Universal Dependencies (UD) Portuguese treebank.

symbol again? #64

Closed vcvpaiva closed 7 years ago

vcvpaiva commented 7 years ago

I am sorry, but I'm not clear on how we're supposed to deal with currencies. In Dan's version of the Bosque he has tag name="SYM" with 435 occurrences, e.g. %, US$, R$, CR$, / and U$; the new version only has %.
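To compare the two versions, the SYM tokens in each file can be counted by surface form. The following is only a minimal sketch: the function name `count_upos_forms` and the sample rows are hypothetical, and it assumes standard 10-column, tab-separated CoNLL-U input.

```python
from collections import Counter


def count_upos_forms(conllu_text, upos="SYM"):
    """Count the surface forms of tokens that carry the given UPOS tag."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A CoNLL-U token line has exactly 10 tab-separated columns.
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        # Column 4 (index 3) is UPOS, column 2 (index 1) is FORM.
        if cols[3] == upos:
            counts[cols[1]] += 1
    return counts


# Hypothetical two-token sample, not taken from the treebank.
sample = "\n".join([
    "1\tR$\tR$\tSYM\t_\t_\t2\tnmod\t_\t_",
    "2\t10\t10\tNUM\t_\t_\t0\troot\t_\t_",
])
print(count_upos_forms(sample).most_common())
```

Running this over both versions of the file and comparing the two counters would show which symbols lost the SYM tag.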

related to SYM and relations #59

livyreal commented 7 years ago

in the EN "gold" bank, we have

12 U$ U$ PROPN NNP Number=Sing 13 compound _

in our version

9 U$ U$ NOUN <np-idf>|N|M|P|@P< Gender=Masc|Number=Plur 0 root _ _

I don't mind whether they are tagged as NOUN, PROPN or SYM. Maybe SYM is good, because R$ sometimes stands for 'real' and sometimes for 'reais'. What do you think, @claudiafreitas?

fcbr commented 7 years ago

According to the documentation for SYM: "Many symbols are or contain special non-alphanumeric characters, similarly to punctuation. What makes them different from punctuation is that they can be substituted by normal words. This involves all currency symbols, e.g. $ 75 is identical to seventy-five dollars."

So the lemmas US$, R$ and CR$ should be tagged as SYM.
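A rough sketch of how that retagging could be scripted, assuming 10-column, tab-separated CoNLL-U input. The names `CURRENCY_LEMMAS` and `retag_currency` are hypothetical, and the sketch changes only the UPOS column, so features and relations would still need to be reviewed separately.

```python
# Lemmas listed above; other spellings such as U$ can be passed in by the caller.
CURRENCY_LEMMAS = frozenset({"US$", "R$", "CR$"})


def retag_currency(conllu_text, lemmas=CURRENCY_LEMMAS):
    """Return the CoNLL-U text with UPOS set to SYM for the given lemmas."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Only touch ordinary token lines: 10 columns and a plain integer ID.
        if len(cols) == 10 and cols[0].isdigit() and cols[2] in lemmas:
            cols[3] = "SYM"  # column 4 (index 3) is UPOS
            line = "\t".join(cols)
        out.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(out)


# Hypothetical token line, not taken from the treebank.
before = "3\tR$\tR$\tNOUN\t_\t_\t1\tnmod\t_\t_"
print(retag_currency(before))
```

Comment lines, blank lines, and multiword-token ranges pass through unchanged, so the output can be diffed directly against the input.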