JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

JMdict XML data and version info #58

Closed by yamagoya 2 years ago

yamagoya commented 2 years ago

Currently, the JMdict and JMnedict XML files have an embedded creation date and version number, but that information is in comments, which makes it somewhat hard to extract.

I wonder if instead that information could be placed in the XML itself. One obvious way would be as attributes in the XML root element:

 <JMdict date="2022-01-17" version="1.10">
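If the attributes were added as proposed, a consumer could read them without parsing the whole file. A minimal sketch in Python, assuming the attribute names `date` and `version` from the example above (the function name `read_root_metadata` is only illustrative, not part of any JMdict tooling):

```python
import io
import xml.etree.ElementTree as ET


def read_root_metadata(source):
    """Return (date, version) taken from the root element's attributes.

    A missing attribute is reported as None. `source` may be a file
    name or a binary file object.
    """
    # The first "start" event always belongs to the root element, so we
    # can stop right there instead of reading the rest of the document.
    for _event, elem in ET.iterparse(source, events=("start",)):
        return elem.get("date"), elem.get("version")
    return None, None


# Tiny stand-in for a real JMdict file, using the proposed attributes.
sample = io.BytesIO(b'<JMdict date="2022-01-17" version="1.10"><entry/></JMdict>')
print(read_root_metadata(sample))  # ('2022-01-17', '1.10')
```

Stopping at the first start event matters mostly because the real files are large; the metadata sits at the very top of the document.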

I also wonder if the version number could be bumped whenever any change is made to the DTD, including entity changes. (Or possibly a two-level version number: major for structural changes, minor for non-structural ones?)
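To illustrate how an app might use a two-level number, here is a rough Python sketch. It assumes a `major.minor` string such as "1.10" and that only a major bump signals a structural change; the helper names are hypothetical. The parts are compared as integers, since treating "1.10" as a decimal number would wrongly make it smaller than "1.9".

```python
def parse_version(text):
    """Split a 'major.minor' string into a tuple of two integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_structurally_compatible(file_version, supported_major):
    """True when the file's major version matches what the app supports.

    Assumption: minor bumps (e.g. entity changes) do not alter the
    structure, so they are ignored here.
    """
    return parse_version(file_version)[0] == supported_major


# Tuples compare element by element, so 1.10 sorts after 1.9.
print(parse_version("1.10") > parse_version("1.9"))  # True
print(is_structurally_compatible("1.10", 1))         # True
print(is_structurally_compatible("2.0", 1))          # False
```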

Finally, since substantial XML DTD changes will occur with the implementation of the XML Next Generation changes (http://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation), perhaps this change could be included with them?

JMdictProject commented 2 years ago

Having the creation date and version as attributes in the root element sounds fine. I'll add that to the NG document.

Version numbering has not played a big role so far, but it would probably be good to have a major/minor system in place.

Apropos of dates, there is also a date "entry" tacked onto the distributed file, mainly to make sure that there is a clearly visible date stamp available to apps, even in the legacy EDICT format. https://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1MDJ%A3%CA%A3%CD%A3%E4%A3%E9%A3%E3%A3%F4

yamagoya commented 2 years ago

Great, thanks!
