globalwordnet / english-wordnet

The Open English WordNet
https://en-word.net/

2024 Release Candidate #1112

Closed jmccrae closed 3 weeks ago

jmccrae commented 1 month ago

When this branch is merged, it will be tagged as the 2024 release.

jmccrae commented 1 month ago

@1313ou @fcbond @goodmami

2024 Release Candidate is ready. Please let me know of any issues by the end of the week.

goodmami commented 1 month ago

@jmccrae I generated the XML file with scripts/from_yaml.py. First, I noticed that the generated file declares the WN-LMF-relaxed-1.3.dtd schema instead of WN-LMF-1.3.dtd, because of the second True argument in this line:

https://github.com/globalwordnet/english-wordnet/blob/a513594c2a1f28de6d043dc7052616b8df4657ae/scripts/from_yaml.py#L372

thus, part=True here:

https://github.com/globalwordnet/english-wordnet/blob/a513594c2a1f28de6d043dc7052616b8df4657ae/scripts/wordnet.py#L105-L112

Is there a reason to use the -relaxed schema, or should it be the normal one?

And, tangentially, it looks like the -relaxed schema may be out of date, as it does not have the xml:space attribute?

Once I stopped Wn from rejecting the relaxed schema in the doctype declaration, I didn't have any issues adding it to the database and running some simple queries. I haven't yet done more extensive testing beyond that. So, for now, the main issue is with the doctype schema.
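
For anyone who wants to check which schema a generated file declares, here is a minimal sketch. It is not part of the repository's scripts: the helper names `declared_dtd` and `uses_relaxed_schema` are hypothetical, and it assumes the file carries a `<!DOCTYPE ... SYSTEM "...">` declaration whose system identifier ends in one of the two DTD file names mentioned above.

```python
import re

# Captures the system identifier of a DOCTYPE declaration, e.g.
#   <!DOCTYPE LexicalResource SYSTEM "WN-LMF-1.3.dtd">
# Assumption: the declaration uses SYSTEM with a double-quoted identifier.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]+)"')


def declared_dtd(xml_text):
    """Return the DTD system identifier declared in xml_text, or None."""
    match = DOCTYPE_RE.search(xml_text)
    return match.group(1) if match else None


def uses_relaxed_schema(xml_text):
    """True if the declared DTD looks like the -relaxed variant."""
    dtd = declared_dtd(xml_text)
    return dtd is not None and "-relaxed-" in dtd


# Illustrative headers only; a real file would normally use a full URL.
strict_header = '<!DOCTYPE LexicalResource SYSTEM "WN-LMF-1.3.dtd">'
relaxed_header = '<!DOCTYPE LexicalResource SYSTEM "WN-LMF-relaxed-1.3.dtd">'

print(uses_relaxed_schema(strict_header))
print(uses_relaxed_schema(relaxed_header))
```

Running this against the first few lines of the generated XML should be enough, since the doctype declaration sits at the top of the file.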

jmccrae commented 4 weeks ago

The 'relaxed' schema was used to allow multiple small XML files, which would be merged later. This was how we worked with the wordnet before we adopted YAML in #664. It should not be generated anymore.