Closed TomazErjavec closed 1 year ago
For Danish, an old 2.5 model is used. This relation was used in Universal Dependencies 2.5 – Danish – DDT: http://hdl.handle.net/11346/PMLTQ-0RGX (612 occurrences). But the taxonomy corresponds to the current version of UD (the current documentation, to be precise).
note for @matyaskopp: this generates a query with all ids with obl:loc that can be run in different versions of UD to see changes: http://hdl.handle.net/11346/PMLTQ-COKG
In 2.12 they have been replaced with: http://hdl.handle.net/11346/PMLTQ-8FW4

relation | occurrences |
---|---|
obl:lmod | 48 |
obl | 1 |
case | 3 |
advmod:lmod | 560 |
So, the question is whether we want to support old undocumented language-specific relations. It appears in some old statistics: https://github.com/UniversalDependencies/docs/blob/97694404898cc696842234a1ebabb888c448f09b/_includes/stats/da/dep/obl-loc.md But in fact it has never been documented. Danish has never documented any language-specific relations in the whole history:
git clone git@github.com:UniversalDependencies/docs.git Scripts/UD-docs
git -C Scripts/UD-docs checkout pages-source
git -C Scripts/UD-docs log --all --full-history -- "_da/dep/*"
Returns an empty result.
Wow, a very detailed analysis!
So, the question is whether we want to support old undocumented language-specific relations.
I would say not.
Do I understand correctly that the most sensible substitution would be obl:mod? Currently I have set it to obl.
The most sensible substitution is advmod:lmod if ADV, obl:lmod otherwise. (http://hdl.handle.net/11346/PMLTQ-ORIW)
relation | pos | occurrences |
---|---|---|
advmod:lmod | ADV | 560 |
case | ADP | 3 |
obl | NOUN | 1 |
obl:lmod | NOUN | 27 |
obl:lmod | ADP | 15 |
obl:lmod | VERB | 4 |
obl:lmod | ADJ | 1 |
obl:lmod | X | 1 |
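For reference, the PoS-dependent rule above could be sketched as follows. This is only an illustration operating on plain CoNLL-U token lines, not the XSLT used in ParlaMint; the function name `map_obl_loc` and the example tokens are hypothetical, and the sketch assumes the standard 10-column CoNLL-U layout (UPOS in column 4, DEPREL in column 8) and leaves the DEPS column untouched.

```python
def map_obl_loc(line: str) -> str:
    """Rewrite obl:loc in one CoNLL-U token line, choosing the target by UPOS.

    Hypothetical helper: advmod:lmod if the token is ADV, obl:lmod otherwise.
    """
    if not line.strip() or line.startswith("#"):
        return line  # sentence breaks and comment lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10 or cols[7] != "obl:loc":
        return line  # not a token line we need to touch
    cols[7] = "advmod:lmod" if cols[3] == "ADV" else "obl:lmod"
    return "\t".join(cols) + newline


# Illustrative token lines (made up, not taken from the corpus)
adv = "3\ther\ther\tADV\t_\t_\t2\tobl:loc\t_\t_"
noun = "5\tbyen\tby\tNOUN\t_\t_\t2\tobl:loc\t_\t_"
print(map_obl_loc(adv).split("\t")[7])
print(map_obl_loc(noun).split("\t")[7])
```

The point of the sketch is only that the rule needs the UPOS column next to the DEPREL column when the relation is rewritten.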
The most sensible substitution is advmod:lmod if ADV, obl:lmod otherwise
Hm, maybe most correct, not sure about sensible, because the code now does not have access to the PoS of the word: https://github.com/clarin-eric/ParlaMint/blob/d02bd049213da4a3d1e50bea07df01215883fc0b/Scripts/parlamint2release.xsl#L562-L573.
Trying to implement a PoS-dependent relation would be difficult. I would just set it to advmod:lmod, as this seems to mean only about 10% of errors, which is about on par with the error rate parsers make anyway...
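The "about 10%" estimate can be checked against the relation/PoS counts posted earlier in the thread. A small sketch, assuming those counts cover all 612 former obl:loc tokens and counting every token that UD 2.12 does not label advmod:lmod as a mismatch:

```python
# Counts copied from the relation | pos | occurrences table in this thread
counts = {
    ("advmod:lmod", "ADV"): 560,
    ("case", "ADP"): 3,
    ("obl", "NOUN"): 1,
    ("obl:lmod", "NOUN"): 27,
    ("obl:lmod", "ADP"): 15,
    ("obl:lmod", "VERB"): 4,
    ("obl:lmod", "ADJ"): 1,
    ("obl:lmod", "X"): 1,
}

total = sum(counts.values())
# Tokens that would be mislabelled if everything became advmod:lmod
mismatches = total - counts[("advmod:lmod", "ADV")]
rate = mismatches / total

print(f"{mismatches}/{total} = {rate:.1%}")  # prints "52/612 = 8.5%"
```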
Trying to implement a PoS-dependent relation would be difficult. I would just set it to advmod:lmod, as this seems to mean only about 10% of errors, which is about on par with the error rate parsers make anyway...
ok, but it will probably produce an L2 validation error - I think advmod should be related to ADV
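To see in advance how many tokens such a check would touch, one could list every token whose relation is advmod (or a subtype) but whose UPOS is not ADV. This is a rough sketch only: the function name `advmod_on_non_adv` and the sample tokens are hypothetical, and the rule is simply the assumption stated above, not the actual logic of the UD validator.

```python
def advmod_on_non_adv(conllu_text: str) -> list[tuple[str, str, str, str]]:
    """Return (id, form, upos, deprel) for advmod* tokens whose UPOS is not ADV."""
    hits = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        base = cols[7].split(":")[0]  # strip the subtype, e.g. advmod:lmod -> advmod
        if base == "advmod" and cols[3] != "ADV":
            hits.append((cols[0], cols[1], cols[3], cols[7]))
    return hits


# Illustrative input (made up, not taken from the corpus)
sample = "\n".join([
    "# sent_id = example",
    "1\ther\ther\tADV\t_\t_\t2\tadvmod:lmod\t_\t_",
    "2\tbyen\tby\tNOUN\t_\t_\t0\tadvmod:lmod\t_\t_",
])
print(advmod_on_non_adv(sample))
```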
ok, but it will probably produce an L2 validation error - I think advmod should be related to ADV
Ah. But given that we are patching things, might as well have some errors...
Surprisingly, no CoNLL-U errors were produced. So, closing.
In preparation for 3.1 we are now using the common UD-SYN taxonomy. But using it with the DK corpus gives many errors like:
obl:loc does not seem to be a legal UD syntactic relation, so what to do with this now? We probably can't leave it as it is, as it doesn't mean anything. Change it to simple obl? Ideas welcome!