UniversalDependencies / UD_German-GSD


Correcting Tiger source texts with improved feats (#12, #14) #17

Closed adrianeboyd closed 6 years ago

adrianeboyd commented 6 years ago

Similar to #16, except that missing tokens from the Tiger source sentences have also been inserted where necessary to correct the source/raw texts. The inserted tokens will require revisions to the dependency annotation.

For now, the inserted tokens are attached as dep to the following word unless the token is purely punctuation, in which case it is attached as punct. All instances are marked with the comment FixTigerDep=Yes on an inserted token, although the word(s) that need to be reannotated may not be limited to the marked tokens.
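A minimal sketch of the placeholder attachment rule described above, for readers who want to see it spelled out. It assumes tokens are held as dicts with a subset of the CoNLL-U columns, that IDs have already been renumbered after insertion, and that the `FixTigerDep=Yes` flag lives in the MISC column. The function names and the fallback for a sentence-final inserted token (attaching to the preceding word) are hypothetical, not taken from the PR.

```python
import unicodedata


def is_punct(form: str) -> bool:
    # True if every character is Unicode punctuation (general category P*)
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def attach_inserted(tokens: list, i: int) -> dict:
    """Attach the inserted token at list index i with a placeholder relation.

    tokens: list of dicts with keys id, form, head, deprel, misc (CoNLL-U subset).
    """
    tok = tokens[i]
    if i + 1 < len(tokens):
        # Rule from the PR: attach to the following word
        tok["head"] = tokens[i + 1]["id"]
    else:
        # Hypothetical fallback: no following word, so use the preceding one
        tok["head"] = tokens[i - 1]["id"]
    # Purely punctuation tokens get punct; everything else gets the generic dep
    tok["deprel"] = "punct" if is_punct(tok["form"]) else "dep"
    # Mark the token for later reannotation (assumed to be stored in MISC)
    flag = "FixTigerDep=Yes"
    tok["misc"] = flag if tok["misc"] in ("_", "") else tok["misc"] + "|" + flag
    return tok
```

Since both `dep` and `punct` here are placeholders, the flag matters more than the chosen head: it lets a later pass find every sentence that still needs a real analysis.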

I think that these sentences (listed in 93577c4) should be removed from active use in the UD_German corpus until they can be reannotated. The existing annotations for these sentences in #16 reflect the fact that sentences with missing words often cannot be annotated sensibly (frequent use of dep, etc.).
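One way to hold such sentences out of active use is to split a CoNLL-U file on the reannotation flag. This is a hedged sketch, assuming the `FixTigerDep=Yes` marker sits in the MISC (10th) column of a token line; the function name `split_for_reannotation` is hypothetical, and the authoritative list of sentences remains the one in 93577c4.

```python
import re


def split_for_reannotation(conllu_text: str, flag: str = "FixTigerDep=Yes"):
    """Return (kept, held_out) lists of sentence blocks from CoNLL-U text."""
    kept, held_out = [], []
    # Sentences are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", conllu_text.strip()):
        if not block.strip():
            continue
        flagged = False
        for line in block.splitlines():
            # Skip sentence-level comments such as "# sent_id = ..."
            if line.startswith("#") or not line.strip():
                continue
            cols = line.split("\t")
            # MISC is the 10th column; its attributes are "|"-separated
            if len(cols) == 10 and flag in cols[9].split("|"):
                flagged = True
                break
        (held_out if flagged else kept).append(block)
    return kept, held_out
```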

adrianeboyd commented 6 years ago

I should have referenced #12 and #14 in the comment, adding here to link.

dan-zeman commented 6 years ago

Unfortunately, after merging #15 and #16, this pull request has conflicts and cannot be merged.

adrianeboyd commented 6 years ago

I submitted the pull requests with the idea that only one of #16 or #17 would be merged, not both. I can resubmit this one against the current dev branch so that it can be merged, if you'd like.

Because it makes such drastic changes, I wasn't sure whether the UD project would want it, especially since it leaves the treebank in a fairly inconsistent state with so many sentences that need to be reannotated. (I'd argue it's not really worse than what you already have, though, and at least the raw texts are better.)

dan-zeman commented 6 years ago

Ah, I see. Unfortunately I did not quite realize that before merging #16. I did not have the capacity to work on German in any depth before the deadline (having to attend to dozens of other treebanks), so I quickly skimmed the diff, decided to trust you :) and just pushed the button. I would hate to prevent your effort from being reflected in release 2.2.

This treebank does need a drastic change :) and if you can improve one aspect of it, you should not be stopped just because other aspects remain problematic. However, I am not sure what the best way to proceed is now. Since I already merged #16, we should probably just close #17, right?

adrianeboyd commented 6 years ago

No, I didn't make it clear that it was either/or. I'll close this.