COMCIFS / cif_core

The IUCr CIF core dictionary
14 stars 9 forks source link

Harmonisation of dictionary naming convention #488

Open nautolycus opened 3 months ago

nautolycus commented 3 months ago

@vaitkus and I were musing over the wide variation in names and references to the various dictionaries, and note the following scatter of designators (G means version on GitHub (sometimes corresponds to DDL1/DDLm versions), P refers to the current latest versions on the wwPDB site (over which COMCIFS has no say). We feel it would be beneficial to harmonise these, in ways suggested below the table. I raise this as an issue in the core area, but it relates to all active dictionary projects. If we approve a harmonisation project, I suggest making changes initially to the electron density dictionary, which I'm currently revising and which has no active input from a wider community.

dictionary ITG1e ITG2e Website Logo File DDLm _dictionary.title
Core coreCIF coreCIF coreCIF CIF_core cif_core.dic CORE_DIC
Powder pdCIF pdCIF pdCIF CIF_pd cif_pd.dic
cif_pow.dic (G) CIF_POW
Modulated struct msCIF msCIF msCIF CIF_ms cif_ms.dic CIF_MS
Electron density rhoCIF rhoCIF rhoCIF CIF_rho cif_rho.dic Cif_rho
Macromolecular mmCIF mmCIF mmCIF CIF_mm cif_mm.dic N/A
mmcif_std.dic (P)
mmcif_pdbx.dic (P)
Image imgCIF imgCIF imgCIF CIF_img cif_img.dic Cif_img.dic
mmcif_img.dic (P)
Symmetry symCIF symCIF symCIF CIF_sym cif_sym.dic N/A
Restraints N/A . . CIF_restraints cif_core_restraints.dic
cif_restr.dic (G) CIF_RSTR
Twinning N/A twinCIF . CIF_twinning cif_twinning.dic
cif_twin.dic (G) CIF_TWIN
Magnetic N/A magCIF magCIF CIF_mag cif_mag.dic MAGNETIC_CIF
Topology N/A . topoCIF CIF_topo cif_topology.dic TOPOLOGY_CIF
Topology.dic (G)
Multiblock core N/A . "multiblock coreCIF" . cif_multiblock.dic MULTIBLOCK_DIC
multi_block_core.dic (G)
DDL1 . . . CIF_DDL ddl_core.dic N/A
DDL2 . . . CIF_DDL mmcif_ddl.dic N/A
mmcif_ddl.dic (P)
DDLm N/A . . CIF_DDL ddl.dic DDL_DIC
DDLm.dic (G)

Specific suggestions:

Other comments

jamesrhester commented 2 months ago

Great work in compiling this table. I agree with all the suggestions.

I do not think DDLs need their own logos. That just draws attention to differences which are for most purposes irrelevant.

I think ddl filenames of the form ddl1.dic and ddlm.dic would work. We can't rename something the wwPDB have custody of - but we could make ddl2.dic resolve to mmcif_ddl.dic in those situations we have control over.

Yes, I agree that we should unify the repository names.

vaitkus commented 2 months ago

One more question -- should _dictionary.title attribute values be written in all uppercase letters as is being done now? This is mostly a stylistic consideration since _dictionary.title values are case-insensitive.

For example, the current name of the cif_core.dic dictionary is CORE_CIF, so should the name value be changed to cif_core or to CIF_CORE? I think that having it in uppercase makes sense for two reasons:

  1. The _dictionary.title attribute value is also used as the category name of the HEAD category (see description of _name.category_id). Having it in uppercase would be consistent with all the other category names.
  2. The _dictionary.title attribute value is also used as the name of the main dictionary data block (i.e. data_CIF_CORE). Having it in uppercase would be more consistent with the rest of the dictionary (category save frame names are written in all uppercase, data name save frames are written in all-lowercase).
jamesrhester commented 2 months ago

All uppercase works for me.

nautolycus commented 2 months ago

No objection.

vaitkus commented 2 months ago

@jamesrhester, I guess the "Change approved" label is appropriate or should this discussion be circulated elsewhere as well?

jamesrhester commented 2 months ago

Raising an issue/pull request in the respective repositories would be sufficient, referencing this issue.

vaitkus commented 2 months ago

It was agreed in an email exchange between @jamesrhester, @nautolycus and @vaitkus that repositories should be named same as the dictionary title except in all lower case, e.g. the CIF_CORE dictionary should reside in a repository called cif_core.

There is actually one more name that would be nice to standardise -- the name of the Head save frame. This save frame is referenced by other dictionaries when importing the entire dictionary, therefore I propose to keep the name of this save frame same as the data name. That is, the cif_core.dic dictionary file would have a single CIF_CORE data block which in turn would contain the CIF_CORE save frame (see PR #490 for an example of this layout). The import statement would then look something like:

[{'dupl':Ignore  'file':cif_core.dic  'mode':Full  'save':CIF_CORE}]

Alternatively, we could call the HEAD save frame something neutral like "head", since the dictionary name would already be reflected in import statements by the dictionary filename. The import statement would then look something like:

[{'dupl':Ignore  'file':cif_core.dic  'mode':Full  'save':HEAD}]

Which one looks better?

jamesrhester commented 1 month ago

I think the second one ('HEAD') provides more information to the casual reader, immediately showing that the whole of the imported dictionary is being enhanced.

vaitkus commented 1 month ago

Having the same name for the data block (dictionary) and the head save frame, while syntactically valid, seems to have already caused some issue with the external software (see https://github.com/emmo-repo/CIF-ontology/issues/194), so it would be best to avoid it.

I think the second one ('HEAD') provides more information to the casual reader, immediately showing that the whole of the imported dictionary is being enhanced.

I fully agree with these points, but after thinking about it some more, I think that we should also avoid having the same HEAD category name in all dictionaries since it might obfuscate certain errors (e.g. those dealing with import statements). Instead, I suggest that the HEAD category names be constructed by appending the _HEAD to the dictionary name (e.g. CIF_CORE_HEAD, CIF_PD_HEAD). What do you think?

jamesrhester commented 1 month ago

I think <dictionary_name>_HEAD is an excellent idea. I had also started wondering if it was really a good idea to have the same category name (HEAD) in many dictionaries.

nautolycus commented 1 month ago

I'm also happy to go with <dictionary_name>_HEAD. In favour of the simple HEAD proposal, that has, to my mind, the effect of establishing a formal starting point, just as '/' indicates the root of a filesystem. Importing to a new HEAD has, in this metaphor, a similar effect as a chroot call. On the other hand, I have in the past tripped up in a chroot'd environment precisely because inspection of the root designator doesn't give you a clue that you might be in an unexpected place.