COMCIFS / cif_core

The IUCr CIF core dictionary

Should the `_audit_update_record` be redefined as a separate category? #456

Open vaitkus opened 1 year ago

vaitkus commented 1 year ago

The definition of the AUDIT category states that:

    The CATEGORY of data items used to record details about the
    creation and subsequent updating of the data block.

Traditionally, this was done by first describing the creation of the file using the _audit_update_record and _audit_creation_method data items and then appending descriptions of any further changes to the _audit_update_record data item. See, for example, this excerpt from COD entry 1552645:

_audit_creation_date             18-07-06
_audit_creation_method           CRYSTALS_ver_14.40
_audit_update_record
;
2018-07-06 - Report on C26 H33 N O5 
             by Anthony C. Willis
       for   Steve Pyne and Anthony Carroll

2018-07-06 - details of refinement in _refine_special_details below

2018-07-06 - passes checkcif tests with no warnings 

;
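
To illustrate what recovering structure from such a free-text value involves, here is a minimal Python sketch that splits the record above into (date, text) pairs. It assumes that every entry starts with `YYYY-MM-DD - ` at the beginning of a line; that is only a convention observed in this one excerpt, not something the dictionary requires, and the function name `split_update_record` is hypothetical.

```python
import re

# Free-text value copied from the excerpt above (COD entry 1552645).
RECORD = """\
2018-07-06 - Report on C26 H33 N O5
             by Anthony C. Willis
       for   Steve Pyne and Anthony Carroll

2018-07-06 - details of refinement in _refine_special_details below

2018-07-06 - passes checkcif tests with no warnings
"""

# ASSUMPTION: each entry begins with 'YYYY-MM-DD - ' at the start of a line.
ENTRY_START = re.compile(r"^(\d{4}-\d{2}-\d{2}) - ", re.MULTILINE)


def split_update_record(text):
    """Split a free-text update record into (date, text) pairs."""
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # An entry runs until the next date prefix or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[match.end():end].split())  # collapse whitespace
        entries.append((match.group(1), body))
    return entries


entries = split_update_record(RECORD)
for entry_date, entry_text in entries:
    print(entry_date, "|", entry_text)
```

Any record that does not follow the assumed date prefix would be silently merged into the preceding entry or dropped, which is the kind of fragility a dedicated category would avoid.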

Intuitively, it seems that the _audit_update_record could be split into a separate AUDIT_UPDATE category (date, revision, author_id, etc.). However, there are several additional considerations related to this redefinition:

  1. Should the creation of the CIF still be described using separate data items (_audit_creation_date, _audit_creation_method), or should it be treated as just another entry in the AUDIT_UPDATE loop?
  2. Should it be possible to associate more than one author with a specific revision? A many-to-many relationship would be quite cumbersome, especially if we want to keep things normalised.
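
The two questions above can be made concrete with a small Python sketch of one possible normalised layout. Everything here is hypothetical: the class names `AuditUpdate` and `AuditUpdateAuthor`, their fields, and the sample rows are illustrations only and do not correspond to any existing dictionary definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditUpdate:
    """One row of a hypothetical AUDIT_UPDATE loop."""
    id: int
    date: str
    text: str


@dataclass(frozen=True)
class AuditUpdateAuthor:
    """One row of a hypothetical link category: (revision id, author id)."""
    update_id: int
    author_id: str


# Question 1: creation is treated as just the first revision (made-up rows).
updates = [
    AuditUpdate(1, "2018-07-06", "Data block created"),
    AuditUpdate(2, "2018-07-06", "Refinement details added"),
]

# Question 2: several authors per revision require a separate link loop,
# which is what makes the many-to-many case cumbersome.
links = [
    AuditUpdateAuthor(1, "a1"),
    AuditUpdateAuthor(2, "a1"),
    AuditUpdateAuthor(2, "a2"),
]


def authors_of(update_id, author_links):
    """Return the author ids linked to one revision, in loop order."""
    return [link.author_id for link in author_links if link.update_id == update_id]


print(authors_of(2, links))
```

With a single author per revision the second class disappears and the author becomes one more field of `AuditUpdate`, at the cost of no longer being normalised.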

For the purposes of the COD we have defined the COD_CHANGELOG category, which is used in a way that is quite similar to the one proposed above:

  loop_
  _cod_changelog_entry.id
  _cod_changelog_entry.author
  _cod_changelog_entry.date
  _cod_changelog_entry.text
  1 'Doe, John' 2010-07-15T12:20:13+02:00
  ; The x coordinate of the 'Cu' atom was changed from '0.8' to '0.7' after
    consulting the original publication.
  ;
  2 'cif_fix_values 1646 2011-03-28 12:23:43Z adriana' 2011-12-04T12:12:05+02:00
  ; '_cell_measurement_temperature' value '300K' was changed to '300' -
    the value should be numeric and without a unit designator.
  ;

Note that in this case each revision has only a single, non-normalised author, which can be either a person or a piece of software.
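
One practical benefit of keeping the date as a separate item is that it can be checked mechanically. The following Python sketch, with a hypothetical helper `is_chronological`, parses the two timestamps from the example above and verifies that the entries are in chronological order. It assumes Python 3.7 or later, where `datetime.fromisoformat` accepts a `+02:00` style offset.

```python
from datetime import datetime

# (id, author, date) values copied from the COD_CHANGELOG example above.
ROWS = [
    (1, "Doe, John", "2010-07-15T12:20:13+02:00"),
    (2, "cif_fix_values 1646 2011-03-28 12:23:43Z adriana",
     "2011-12-04T12:12:05+02:00"),
]


def is_chronological(rows):
    """Return True if the date column never goes backwards down the loop."""
    stamps = [datetime.fromisoformat(stamp) for _, _, stamp in rows]
    return all(earlier <= later for earlier, later in zip(stamps, stamps[1:]))


ok = is_chronological(ROWS)
print(ok)
```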

Alternatively, we could keep using the multi-line _audit_update_record, but then it would probably be useful to provide at least some guidelines for authors who plan to modify this field (e.g. provide the date, always append new text to the end).
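
If the free-text field is kept, the two example guidelines (provide the date, always append to the end) could be followed with a helper along these lines. This is only a sketch: the function name `append_update`, the `YYYY-MM-DD - ` prefix, the blank-line separator, and the sample note are assumptions modelled on the COD excerpt above, not an agreed convention.

```python
from datetime import date


def append_update(record, note, when=None):
    """Return `record` with a dated entry appended at the end.

    Existing text is never modified; only trailing whitespace is normalised.
    """
    when = when or date.today()
    entry = f"{when.isoformat()} - {note.strip()}"
    existing = record.rstrip()
    if not existing:
        return f"{entry}\n"
    # ASSUMPTION: entries are separated by one blank line, as in the excerpt.
    return f"{existing}\n\n{entry}\n"


record = "2018-07-06 - passes checkcif tests with no warnings\n"
# The note and date below are made up for illustration.
updated = append_update(record, "corrected cell temperature", date(2019, 1, 15))
print(updated)
```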

jamesrhester commented 1 year ago

I think we should move to a system such as you have in the COD. We should not have any structured information embedded within text strings. I would be in favour of a new AUDIT_UPDATE category as described, and deprecating AUDIT_CREATION_*.