dan-zeman closed 5 years ago
P.S. If you believe that a validation rule is too strict (i.e., requires something that does not follow from the guidelines), please raise an issue in the issue tracker of the docs repository.
Thanks for picking up this thread, Dan. I think there is something wrong with the link to the full proposal. I just get an empty page.
Thank you, Dan. I appended my opinion at advmod but not UPOS=ADV. I think the validation rule for advmod is too strict.
Thanks for picking up this thread, Dan. I think there is something wrong with the link to the full proposal. I just get an empty page.
Oops. Thanks for the heads-up. It looks like the name I picked for the page was already taken by the old (and obsolete) validation machinery, which generates an empty page each time a corpus is modified, as happened in https://github.com/UniversalDependencies/docs/commit/76ad6d5c06176f8532577b865e280fbde47d9432 :-) (@fginter)
I have now renamed the page to validation-rules.
I appended my opinion at advmod but not UPOS=ADV.
Thanks, Koichi. See my answer there.
Regarding the negation of aux, from the old issue for UD 2.4:
One of the possible exceptions is negation. So you can actually attach the first 不 directly to the auxiliary, and the validator should accept it if 不 has the feature Polarity=Neg.
But now the validator for UD 2.5 does not accept the negation of aux. We've already added Polarity=Neg for all 不, so what should we do about the new validator?
See my answer there. It should help if the negative particle is tagged PART.
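To check annotations against the two conditions mentioned in this exchange (the UPOS tag PART and the feature Polarity=Neg), something like the following might help. It is only a sketch: the function names are made up for illustration, it relies on the standard 10-column CoNLL-U layout, and it does not reproduce the validator's actual logic, which is not shown in this thread.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Polarity=Neg' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def is_negative_particle(conllu_line):
    """Hypothetical helper: True if the token is tagged PART and has Polarity=Neg.

    CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    """
    cols = conllu_line.rstrip("\n").split("\t")
    upos = cols[3]
    feats = parse_feats(cols[5])
    return upos == "PART" and feats.get("Polarity") == "Neg"


# Illustrative token lines (not taken from any treebank).
tagged_part = "1\t不\t不\tPART\t_\tPolarity=Neg\t2\tadvmod\t_\t_"
tagged_adv = "1\t不\t不\tADV\t_\tPolarity=Neg\t2\tadvmod\t_\t_"
```

Running is_negative_particle over the 不 tokens of a treebank would show which ones still carry a tag other than PART.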
Following the discussion at the end of UDW 2019 in Paris, I tried to put together a proposal of the validation vs. release policy for the upcoming releases. The goal is to be able to add new tests and find more guideline violations, but without having to kick out older treebanks that do not pass the stricter tests (some of them are no longer maintained and there is no one who could fix the bugs soon; others have too many bugs and fixing them will take a lot of time).
The full proposal is currently available here and comments are welcome. In a nutshell: if a treebank was valid and released in UD 2.3, it can stay in the upcoming releases without passing tests that were added after UD 2.3. Newer treebanks must pass all tests that exist when the treebank is released for the first time.
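To make the nutshell rule concrete, here is a rough sketch of how the set of applicable tests could be computed. The names (RELEASES, LEGACY_CUTOFF, Test, required_tests) are invented for this illustration and are not part of the validator; the sketch also assumes that any treebank first released in UD 2.3 or earlier was valid in UD 2.3.

```python
from dataclasses import dataclass

# Assumed release order, for illustration only.
RELEASES = ["2.2", "2.3", "2.4", "2.5"]
LEGACY_CUTOFF = "2.3"


@dataclass(frozen=True)
class Test:
    name: str
    added_in: str  # release in which the test was introduced


def required_tests(first_release, tests):
    """Return the tests a treebank must pass under the proposed policy.

    Treebanks first released in UD 2.3 or earlier are held to the tests that
    existed in UD 2.3; newer treebanks are held to the tests that existed at
    their first release.
    """
    limit = max(RELEASES.index(first_release), RELEASES.index(LEGACY_CUTOFF))
    return [t for t in tests if RELEASES.index(t.added_in) <= limit]


# Hypothetical tests, named only for the example.
tests = [Test("old-test", "2.2"), Test("test-2.4", "2.4"), Test("test-2.5", "2.5")]
legacy = [t.name for t in required_tests("2.2", tests)]
first_in_24 = [t.name for t in required_tests("2.4", tests)]
first_in_25 = [t.name for t in required_tests("2.5", tests)]
```

In this sketch, a legacy treebank is exempt from the tests added in 2.4 and 2.5, while a treebank first released in 2.4 is still bound by the 2.4 tests.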
I have modified the online validation page to reflect the proposal and identify treebanks with legacy status. There are 6 old treebanks that contain errors which were not tolerated even in UD 2.3 (that is, these errors were introduced in UD 2.4 and escaped the attention of the release team). Errors of this type must be fixed before UD 2.5. The treebanks are Croatian-SET (@nljubesi), English-EWT (@manning @sebschu), French-Spoken (@sylvainkahane), Norwegian-Bokmaal, Norwegian-NynorskLIA (@liljao), Serbian-SET (@tsamardzic).
4 treebanks were released in UD 2.4 for the first time but contained errors that were already checked at that time. Hence I think they are not really legacy treebanks (the only reason why they made it into the release was that we ignored some error messages in order to save older treebanks). (Disclaimer: I’m actually looking at the current report, so it is possible that the errors were not there at release time and were introduced later.) The treebanks are Classical_Chinese-Kyoto (@KoichiYasuoka), German-HDT (@akoehn @EmanuelUHH), German-LIT (@a-salomoni), Old_Russian-RNC (@olesar).
Finally, issues are also reported for 4 new treebanks: Bhojpuri-BHTB (@shashwatup9k), Chinese-GSDSimp (@qipeng), Skolt_Sami-Giellagas (@rueter), Swiss_German-UZH (@noe-eva).
What do people think about this?