Closed by martinpopel 2 years ago
Absolutely, thanks for catching this! The unescaped hyphens are definitely a bug (the option for ignoring unused trailing features is by design, to promote compactness). Oddly, other reserved characters, such as parentheses, are escaped correctly:
But somehow hyphens slipped through. I can fix this, but unless @dan-zeman thinks the bug is serious enough to warrant a patch, the fix will stay in the dev branch until the next UD release in May.
There are many bugs and very few are "serious enough" :-) Stay in dev, May is not so far ahead.
OK, this should be fixed in dev now.
In UDv2.9, all the GUM files use `global.Entity = entity-GRP-infstat-MIN-coref_type-identity`, suggesting there will be 6 attributes in each `Entity`.

First, there is a question what to do when the `identity` aka wikification is missing - it may be easier for parsing to always require 6 attributes and keep the wikification as empty string, i.e. end the `Entity` with a hyphen. But the current practice (there can be just 5 attributes) is acceptable as well.

However, `Entity=(abstract-182-new-6-coref-Pearson's_chi-squared_test` should be converted to `Entity=(abstract-182-new-6-coref-Pearson's_chi%2Dsquared_test`, I think.
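To illustrate why the unescaped hyphen matters, here is a minimal Python sketch, not the actual GUM or Udapi code. The helper names `escape_value` and `parse_entity` are hypothetical, and the sketch assumes a single opening mention whose value follows `Entity=`. It also assumes that percent-decoding the attributes with `urllib.parse.unquote` is an acceptable way to read escaped values back.

```python
from urllib.parse import unquote

# Attribute names copied from the global.Entity declaration quoted above.
ENTITY_KEYS = "entity-GRP-infstat-MIN-coref_type-identity".split("-")


def escape_value(value: str) -> str:
    """Hypothetical helper: percent-encode hyphens inside one attribute value
    so they cannot be mistaken for the attribute separator."""
    return value.replace("-", "%2D")


def parse_entity(mention: str) -> dict:
    """Hypothetical helper: split one mention into its named attributes.

    Unused trailing attributes (e.g. a missing identity) are filled with
    empty strings, so 5-attribute and 6-attribute mentions both parse.
    """
    parts = mention.strip("()").split("-")
    if len(parts) > len(ENTITY_KEYS):
        raise ValueError(
            f"expected at most {len(ENTITY_KEYS)} attributes, got {len(parts)}"
        )
    # Pad missing trailing attributes with empty strings.
    parts += [""] * (len(ENTITY_KEYS) - len(parts))
    # Decode percent-escapes only after splitting on the separator.
    return {key: unquote(part) for key, part in zip(ENTITY_KEYS, parts)}


escaped = "(abstract-182-new-6-coref-" + escape_value("Pearson's_chi-squared_test")
print(parse_entity(escaped)["identity"])
```

With the unescaped form, the same split yields 7 parts instead of 6, so `parse_entity` raises `ValueError`; a consumer that did not check the count would silently truncate the identity to `Pearson's_chi`.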