UniversalDependencies / UD_German-GSD

Dashes with blank XPOS tags #34

Closed by AngledLuffa 6 months ago

AngledLuffa commented 6 months ago

Usually, the dash (-) gets the XPOS tag $(

In each of the following sentences, there is at least one dash with a blank (_) XPOS tag:

# sent_id = train-s1725
# sent_id = train-s1876
# sent_id = train-s1897
# sent_id = train-s1969
# sent_id = train-s2045
# sent_id = train-s2127

# sent_id = dev-s595
# sent_id = dev-s754
# sent_id = dev-s757
# sent_id = dev-s786

# sent_id = test-s448

This is somewhat problematic for Stanza, at least, as it occasionally learns to predict None for the XPOS.
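
For anyone who wants to reproduce the check, here is a minimal sketch of how such tokens could be found in a CoNLL-U file. It assumes the standard 10-column, tab-separated layout with XPOS in the fifth column. The function name `find_blank_xpos_dashes`, the helper `make_row`, and the sample sentence (including its `example-s1` id) are made up for illustration and are not taken from the treebank or from Stanza.

```python
# Dash characters to look for: hyphen-minus, en dash, em dash (an assumption;
# adjust to whatever dash forms the treebank actually uses).
DASH_FORMS = ("-", "\u2013", "\u2014")


def find_blank_xpos_dashes(conllu_text, dash_forms=DASH_FORMS):
    """Return (sent_id, token_id, form) for dash tokens whose XPOS is '_'."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            sent_id = None
            continue
        if line.startswith("#"):
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, xpos = cols[0], cols[1], cols[4]
        # Multiword token ranges (e.g. 3-4) and empty nodes (e.g. 3.1)
        # may carry '_' in XPOS by design, so skip them.
        if "-" in tok_id or "." in tok_id:
            continue
        if form in dash_forms and xpos == "_":
            hits.append((sent_id, tok_id, form))
    return hits


def make_row(tok_id, form, upos, xpos):
    """Build one CoNLL-U token line with the unused columns left as '_'."""
    return "\t".join([tok_id, form, "_", upos, xpos, "_", "_", "_", "_", "_"])


# Hypothetical sentence: the first dash has a blank XPOS, the second has $(.
sample = "\n".join([
    "# sent_id = example-s1",
    make_row("1", "Berlin", "PROPN", "NE"),
    make_row("2", "-", "PUNCT", "_"),
    make_row("3", "Mitte", "PROPN", "NE"),
    make_row("4", "-", "PUNCT", "$("),
    make_row("5", "Ost", "PROPN", "NE"),
    "",
])

print(find_blank_xpos_dashes(sample))
```

Running the same function over the contents of the train, dev, and test files would list the affected sentence ids, which makes it easy to confirm whether a later fix covered all of them.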

dan-zeman commented 6 months ago

Does the problem persist in the dev branch even after https://github.com/UniversalDependencies/UD_German-GSD/commit/ab6265023b83ff6674d146fd8c5190eef59f8bb5 (a fix I committed 3 weeks ago)?

AngledLuffa commented 6 months ago

Yep, that was all of them! Thanks