UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Validation Error: Train/Dev/Test Splits in UD_English-ESLSpok #929

Closed kristopherkyle closed 1 year ago

kristopherkyle commented 1 year ago

Hello all,

Our team is working on getting the initial release of UD_English-ESLSpok validated. The treebank is currently rather small (20k tokens), but we are annotating more data: another 50k tokens already have manual XPOS tags and will receive UD annotations in the near future. So far, we have used the treebank together with other English UD treebanks (e.g., UD_English-ESL, UD_English-EWT, UD_English-GUM). For our purposes, it has been helpful to have predetermined train/dev/test sections, which we add to the corresponding sections of the other corpora when training and testing models. While we can certainly resample the data to pass the validation checks, it would be nice to keep the splits consistent between UD and our project homepage (which includes other data).

Is an exemption reasonable in this case?

Best,

Kris

dan-zeman commented 1 year ago

Yes. Exemption added, the treebank is now valid.

dseddah commented 1 year ago

Hi Dan, we didn't know that it was possible to be granted an exemption for the split. As our NArabizi treebank has already been used for evaluation with a canonical split, would it be possible to benefit from the same exemption?

Thanks, Djamé