Replace text with unicode symbols

COMCIFS / cif_core

The IUCr CIF core dictionary


Replace text with unicode symbols #413

Closed rowlesmr closed 1 year ago

rowlesmr commented 1 year ago

As alluded to in https://github.com/COMCIFS/cif_core/pull/410#issuecomment-1575022901

This PR replaces all unambiguous markup with Unicode.

The eventual goal is to have unicode symbols everywhere, rather than textual descriptions.

I haven't touched the update dates yet.
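For illustration only, a replacement of this kind can be sketched as below. This is a minimal sketch, not the script used for this PR: it assumes the markup in question is the CIF-style backslash code for Greek letters (`\a`, `\b`, `\g`), and the mapping `GREEK` and the function `replace_greek_markup` are hypothetical names covering only three letters.

```python
import re

# Hypothetical, deliberately tiny mapping: CIF-style backslash codes
# for three Greek letters. A real conversion would need a fuller table.
GREEK = {"a": "α", "b": "β", "g": "γ"}

# Match a single backslash followed by one of the mapped letters.
# The lookbehind skips codes that begin with a doubled backslash.
_PATTERN = re.compile(r"(?<!\\)\\([abg])")


def replace_greek_markup(text: str) -> str:
    """Replace \\a, \\b and \\g with the corresponding Unicode letter."""
    return _PATTERN.sub(lambda m: GREEK[m.group(1)], text)


if __name__ == "__main__":
    print(replace_greek_markup(r"The angle \a between axes b and c."))
```

Any code whose meaning depends on context would be left alone by a table like this, which matches the intent of only touching unambiguous markup.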

rowlesmr commented 1 year ago

See #414 re checker failure

jamesrhester commented 1 year ago

Unicode in the definition text is a little tricky, because this definition text is used to automatically prepare text for Volume G and the online web pages. Before doing this, we should coordinate with both Vol G people (me and @nautolycus ) and the Chester people (@publcif ) to confirm that they can handle the likely unicode characters that will pop up. So I'd hold off on doing anything until those people respond here. Last I checked the Vol G workflow would suffer.

rowlesmr commented 1 year ago

Roger dodger.

nautolycus commented 1 year ago

> Last I checked the Vol G workflow would suffer.

This shouldn't be an issue for Vol. G, at least so long as Unicode is used conservatively (the Little Dictionary has sentences with Russian, Japanese and Eastern European text, which is a little tricky to handle!). I'm polling internal views to see if people here can identify any other possible gotchas, so don't commit just yet.

nautolycus commented 1 year ago

I've consulted with the IT and editorial people in Chester and there is no objection to using Unicode in the dictionary definitions.

jamesrhester commented 1 year ago

I don't think there is a need to update the dictionary date, as all changes are cosmetic, i.e. do not change the behaviour of software that relies on these definitions.

vaitkus commented 1 year ago

@jamesrhester well, a new enumeration value of 'α' was added, so the behaviour did change and the date should probably be modified.

Also, as I commented before, are we completely OK with values of _description_example.case being modified to contain Unicode? This attribute provides examples of how values should be written in actual CIF instance files. While Unicode can be used in CIF2 files (and its use is probably encouraged), the given example could not be used verbatim in CIF1.1 files.

I do not think that it is a very big deal, but it might be something we want to take into account.
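To make the CIF1.1 concern concrete, here is a minimal sketch of a check for example values that could not be copied verbatim into a CIF1.1 file. It assumes CIF1.1 permits only printable ASCII plus tab, line feed and carriage return; the function name `non_cif11_chars` is hypothetical and not part of any existing checker.

```python
def non_cif11_chars(value: str) -> list[str]:
    """Return the characters in value that fall outside the assumed
    CIF1.1 character set (printable ASCII plus tab, LF and CR)."""
    allowed_controls = {"\t", "\n", "\r"}
    return [
        ch for ch in value
        if not (" " <= ch <= "~" or ch in allowed_controls)
    ]


if __name__ == "__main__":
    for example in ("alpha", "α"):
        offending = non_cif11_chars(example)
        status = "CIF1.1-safe" if not offending else f"needs CIF2: {offending}"
        print(f"{example!r}: {status}")
```

A check like this could flag which _description_example.case values are CIF2-only without blocking their use.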

jamesrhester commented 1 year ago

I guess you are right about 'α'; as it was only in the template dictionary, I didn't pay attention.

The extra Unicode examples are OK, as DDLm is meant to apply to any format, not just CIF1/2.

vaitkus commented 1 year ago

OK, just making sure.