globalwordnet / english-wordnet

The Open English WordNet
https://en-word.net/

Lemmas with '%' #1125

Closed: jmccrae closed this 1 month ago

jmccrae commented 1 month ago

Lemmas containing '%' are potentially ambiguous, as discussed in #1123, because they result in two percent signs (%) appearing in the sense key.

This PR updates our tools to handle them as follows.

The sense key is split into the lemma and the lex_sense at the last percent sign. This avoids the ambiguity.
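A minimal Python sketch of the last-'%' rule, assuming the usual sense-key layout `lemma%lex_sense`. The helper `split_sense_key` and the example key are hypothetical illustrations, not the project's actual code or data.

```python
def split_sense_key(sense_key: str) -> tuple[str, str]:
    """Split a sense key into (lemma, lex_sense) at the LAST '%'.

    Hypothetical helper illustrating the rule described above.
    """
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError(f"no '%' in sense key: {sense_key!r}")
    return lemma, lex_sense


# Illustrative key (made up, not taken from the data): the lemma contains '%'.
key = "100%_correct%3:00:00::"

# Splitting at the last '%' keeps the lemma intact.
print(split_sense_key(key))  # ('100%_correct', '3:00:00::')

# Splitting at the first '%' would cut the lemma in two.
print(key.split("%", 1))  # ['100', '_correct%3:00:00::']
```

Using `rpartition` rather than `split` is what makes the lemma's own '%' harmless, as long as nothing to the right of the separator contains a '%'.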

This even works with the Princeton WordNet tools:

-> % wordnet "100% correct" -over

Overview of adj 100%_correct

The adj 100% correct has 1 sense (no senses from tagged texts)

1. accurate, 100% correct -- (conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale")

1313ou commented 1 month ago

This ignores the possible presence of a head word in a sense key. If lemmas can contain an unescaped %, so can head words, which could make a sense key unparsable. This does not happen in the current data set and is unlikely, but unlikely things do happen.
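A short Python sketch of the failure mode described in this comment, using a made-up sense key whose head word contains '%' (no such key exists in the current data). It also shows one possible mitigation, which is an assumption and not what the PR implements: anchor the split on the fixed shape of the lex_sense, `ss_type:lex_filenum:lex_id:head_word:head_id`. `SENSE_KEY_RE` and `parse_sense_key` are hypothetical names.

```python
import re

# Hypothetical sense key whose head word contains '%'.
key = "perfect%5:00:00:100%_correct:00"

# Splitting at the last '%' cuts inside the head word.
lemma, _, lex_sense = key.rpartition("%")
print(lemma, lex_sense)  # perfect%5:00:00:100 _correct:00

# Possible mitigation (assumption): only accept a '%' that is followed by
# ss_type (1-5), a two-digit lex_filenum and a two-digit lex_id.
SENSE_KEY_RE = re.compile(
    r"^(?P<lemma>.+)%(?P<ss_type>[1-5]):(?P<lex_filenum>\d{2}):(?P<lex_id>\d{2}):"
    r"(?P<head_word>[^:]*):(?P<head_id>\d{2})?$"
)


def parse_sense_key(sense_key: str) -> dict:
    """Parse a sense key using the structure of lex_sense (hypothetical helper)."""
    m = SENSE_KEY_RE.match(sense_key)
    if m is None:
        raise ValueError(f"unparsable sense key: {sense_key!r}")
    return m.groupdict()


print(parse_sense_key(key)["head_word"])  # 100%_correct
```

This still breaks in contrived cases, for example a head word that itself contains a substring such as `%1:00:00:`, so escaping '%' in lemmas and head words would remain the only fully unambiguous option.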