COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction

Deprecate pd_proc/info_author items in favour of using audit_author #113

Closed. jamesrhester closed this issue 1 year ago.

jamesrhester commented 1 year ago

cif_core has recently added an identifier to the audit_author category, so other categories that need to identify authors can simply point to that identifier. Therefore, the various name/email/fax etc. items in pd_proc_info_author and pd_meas_info_author can be deprecated and replaced by a pointer to _audit_author.id.

Or, even better, the audit_author_role category could be used and these categories deprecated completely.
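To make the proposal concrete, here is a minimal sketch of how rows from a legacy pdCIF author loop might be split into AUDIT_AUTHOR rows (one per person, with an id) and AUDIT_AUTHOR_ROLE rows that point back to that id. Rows are modelled as plain Python dicts keyed by attribute name; the `LegacyAuthor` class, the `migrate` function, the `a1, a2, ...` id scheme and de-duplication by name string are all hypothetical choices for illustration, not anything defined by pdCIF or cif_core.

```python
from dataclasses import dataclass


@dataclass
class LegacyAuthor:
    """One row of a legacy PD_MEAS_INFO_AUTHOR or PD_PROC_INFO_AUTHOR loop."""
    name: str
    address: str = ""


def migrate(rows, role):
    """Split legacy author rows into AUDIT_AUTHOR and AUDIT_AUTHOR_ROLE rows.

    Email, fax and phone are deliberately not carried over: AUDIT_AUTHOR
    has no items for them at the time of this discussion.
    """
    authors = []       # rows keyed by _audit_author.* attribute names
    roles = []         # rows keyed by _audit_author_role.* attribute names
    ids_by_name = {}   # hypothetical: treat identical name strings as one person
    seen_roles = set()
    for row in rows:
        if row.name not in ids_by_name:
            new_id = f"a{len(ids_by_name) + 1}"  # hypothetical id scheme
            ids_by_name[row.name] = new_id
            authors.append({"id": new_id, "name": row.name, "address": row.address})
        key = (ids_by_name[row.name], role)
        if key not in seen_roles:  # avoid duplicate (id, role) pairs
            seen_roles.add(key)
            roles.append({"id": key[0], "role": role})
    return authors, roles
```

For example, `migrate(meas_rows, "measurement")` would give one AUDIT_AUTHOR row per distinct name plus one "measurement" role row per author, so nothing but the id is repeated.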

vaitkus commented 1 year ago

The augmentation of the AUDIT_AUTHOR_ROLE category seems like a good idea.

jamesrhester commented 1 year ago

Are there any missing roles in AUDIT_AUTHOR_ROLE that would stop it replacing the above categories?

vaitkus commented 1 year ago

The measurement role seems to almost perfectly describe the PD_MEAS_INFO_AUTHOR category ("Collected and/or reduced diffraction data." vs "This section contains information about the person(s) who conducted the measurement.").

The analysis role seems to describe the PD_PROC_INFO_AUTHOR category most closely, though I am not sure the role is broad enough ("Worked on the structural model." vs "This section contains information about the person(s) who processed the data."). I am also not entirely sure what falls under the "data processing" task in the context of powder diffraction experiments.

Also, note that the AUDIT_AUTHOR category does not currently contain the email, fax and phone data items that are provided by the PD_*_AUTHOR categories. These data items are provided by the AUDIT_CONTACT_AUTHOR category, which links directly to the AUDIT_AUTHOR category; however, it might be more convenient to add these items directly to AUDIT_AUTHOR.
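As a rough illustration of the current indirection: to get an author's email today, software has to join AUDIT_AUTHOR to AUDIT_CONTACT_AUTHOR on the shared id. The sketch below assumes both loops have already been read into lists of dicts keyed by attribute name; `contact_details` is a hypothetical helper, not part of any CIF library.

```python
def contact_details(author_id, audit_author, audit_contact_author):
    """Return the author's name plus any email/phone/fax held in AUDIT_CONTACT_AUTHOR."""
    author = next((a for a in audit_author if a["id"] == author_id), None)
    if author is None:
        raise KeyError(f"no _audit_author.id {author_id!r}")
    details = {"name": author["name"]}
    # Contact items live in a separate category, linked by the same id
    for contact in audit_contact_author:
        if contact["id"] == author_id:
            for key in ("email", "phone", "fax"):
                if key in contact:
                    details[key] = contact[key]
    return details
```

If email and phone were added to AUDIT_AUTHOR itself, the second loop would disappear and a single row lookup would suffice.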

jamesrhester commented 1 year ago

Agree that we should add email and phone to AUDIT_AUTHOR if we are making it the primary source of information.

Table 3.3.3.1 suggests that PD_PROC is a part of analysis, but the PD_PROC category includes things like 2θ angular adjustment and various corrections to observations that properly belong in data reduction.

Agree that PD_MEAS_INFO_AUTHOR category covers measurement.

I think we should simply deprecate the powder dictionary PD_PROC_INFO_AUTHOR category and point interested users towards the AUDIT_AUTHOR_ROLE category. We could then broaden the definition of analysis in cif_core to include "obtained fitted parameters, such as a structural model, from the data", to cover situations where, e.g., QPA (quantitative phase analysis) was done using peak heights and the fitted parameters did not include structural parameters.
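The mapping agreed so far could be captured in a small lookup, e.g. for a conversion tool that reads old files. This is only a sketch: the role strings "measurement" and "analysis" are the ones quoted in this thread, and both the `LEGACY_ROLE` table and the `role_for` function are hypothetical. It accepts both the DDLm spelling (`_category.item`) and the older underscore-joined spelling of data names.

```python
# Assumed mapping, following this thread: measurement for PD_MEAS_INFO_AUTHOR,
# analysis for PD_PROC_INFO_AUTHOR.
LEGACY_ROLE = {
    "pd_meas_info_author": "measurement",
    "pd_proc_info_author": "analysis",
}


def role_for(data_name):
    """Map a deprecated pdCIF author data name to an AUDIT_AUTHOR_ROLE role."""
    name = data_name.lstrip("_").lower()
    for category, role in LEGACY_ROLE.items():
        # Matches DDLm '_cat.item' and DDL1-style '_cat_item' spellings
        if name.startswith(category + ".") or name.startswith(category + "_"):
            return role
    raise ValueError(f"{data_name} is not a deprecated pdCIF author item")
```

If the cif_core definition of analysis is broadened as suggested, the table itself would not need to change; only the meaning attached to "analysis" would.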

rowlesmr commented 1 year ago

I agree with this. There should be one way of having author information, and core should be expanded to cover the details of each use-case.


On a practical basis, how do you deprecate a category? Do you just need to deprecate all the individual data items, or does _definition_replaced.id/.by also apply to categories?

rowlesmr commented 1 year ago

Also, note that the AUDIT_AUTHOR category does not currently contain the email, fax and phone data items that are provided by the PD_*_AUTHOR categories. These data items are provided by the AUDIT_CONTACT_AUTHOR category, which links directly to the AUDIT_AUTHOR category; however, it might be more convenient to add these items directly to AUDIT_AUTHOR.

AUDIT_AUTHOR is for people associated with the experiment. AUDIT_CONTACT_AUTHOR is for the corresponding author(s), so there are different intents there.

If we're going to add all the contact items from AUDIT_CONTACT_AUTHOR to AUDIT_AUTHOR, would it also make sense to have a boolean data item _audit_author.is_contact, and deprecate AUDIT_CONTACT_AUTHOR? (Or you could keep _audit_contact_author.id all by itself, and putting that in would do all the heavy lifting.)
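A quick sketch of how software might read the proposed flag. Note that _audit_author.is_contact is only a suggestion in this thread, so its value type is undecided; the accepted spellings of "true" below, and treating a missing flag as "not a contact", are assumptions. The alternative of keeping a bare list of _audit_contact_author.id values would yield the same set of ids.

```python
# Assumed spellings of a true value; the real item type is undecided.
TRUE_VALUES = {"yes", "y", "true"}


def is_contact(row):
    """Read the hypothetical _audit_author.is_contact flag from one row."""
    # A missing flag is treated as "not a contact" (assumption).
    return str(row.get("is_contact", "no")).strip().lower() in TRUE_VALUES


def contact_author_ids(audit_author):
    """Ids of AUDIT_AUTHOR rows flagged as contact authors."""
    return [row["id"] for row in audit_author if is_contact(row)]
```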

I can see that the role is a separate category to author, so multiple roles can be assigned per author. Is there a similar mechanism for multiple address/contact details?
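The one-to-many pattern works because each AUDIT_AUTHOR_ROLE row carries the author's id, so a single author can appear in several role rows. A minimal sketch of collecting those rows per author, assuming the loop is held as a list of dicts; `roles_by_author` is a hypothetical helper. A multi-address category would need the same kind of id-keyed loop, which (per the reply below) does not currently exist.

```python
from collections import defaultdict


def roles_by_author(audit_author_role):
    """Group AUDIT_AUTHOR_ROLE rows into {author id: [roles, ...]}."""
    grouped = defaultdict(list)
    for row in audit_author_role:
        grouped[row["id"]].append(row["role"])  # one id may appear many times
    return dict(grouped)
```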


Just FYI, here are the existing author-related data names:

_audit_author.address
_audit_author.id
_audit_author.id_orcid
_audit_author.name

_audit_author_role.id
_audit_author_role.role
_audit_author_role.special_details

_audit_contact_author.address
_audit_contact_author.email
_audit_contact_author.fax
_audit_contact_author.id
_audit_contact_author.name
_audit_contact_author.phone
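Since both _audit_author_role.id and _audit_contact_author.id are meant to point at an _audit_author.id, a validator could check for dangling pointers. A minimal sketch, assuming each loop is already parsed into a list of dicts; `dangling_ids` is a hypothetical helper rather than a feature of any existing CIF checker.

```python
def dangling_ids(audit_author, *child_loops):
    """Return sorted ids used in child loops that have no AUDIT_AUTHOR row."""
    known = {row["id"] for row in audit_author}
    missing = set()
    for loop in child_loops:
        for row in loop:
            if row["id"] not in known:
                missing.add(row["id"])
    return sorted(missing)
```

For instance, `dangling_ids(authors, author_roles, contact_authors)` returns an empty list when every role and contact row refers to a known author.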

jamesrhester commented 1 year ago

If we're going to add all the contact items from AUDIT_CONTACT_AUTHOR to AUDIT_AUTHOR, would it also make sense to have a boolean data item _audit_author.is_contact, and deprecate AUDIT_CONTACT_AUTHOR? (Or you could keep _audit_contact_author.id all by itself, and putting that in would do all the heavy lifting.)

This may be an idea, but because I suspect the AUDIT_CONTACT_AUTHOR category is deeply related to CIF processing at the IUCr offices, I doubt that there is a lot of enthusiasm for fiddling with it at the moment. Perhaps @publcif can comment.

I can see that the role is a separate category to author, so multiple roles can be assigned per author. Is there a similar mechanism for multiple address/contact details?

No. Is that something that would be useful?

To deprecate a category you'd have to deprecate every data name, and the category itself.
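As a sketch of what "deprecate every data name, and the category itself" might look like in dictionary text, the helper below emits minimal DDLm-style save frames carrying a _definition_replaced loop. Real definitions carry many more attributes, which are omitted here; the frame naming, the use of _definition_replaced in the category frame, and both helper functions are assumptions for illustration only.

```python
def save_frame(definition_id, replaced_by):
    """Minimal DDLm-style save frame recording a replacement."""
    frame_name = definition_id.lstrip("_")
    return "\n".join([
        f"save_{frame_name}",
        f"    _definition.id    '{definition_id}'",
        "    loop_",
        "      _definition_replaced.id",
        "      _definition_replaced.by",
        f"         1  '{replaced_by}'",
        "save_",
    ])


def deprecate_category(category, item_replacements, category_replacement):
    """One frame for the category, then one frame per deprecated data name."""
    frames = [save_frame(category.upper(), category_replacement)]
    for old_name, new_name in item_replacements.items():
        frames.append(save_frame(old_name, new_name))
    return "\n\n".join(frames)
```

For example, `deprecate_category("pd_meas_info_author", {"_pd_meas_info_author.name": "_audit_author.name"}, "AUDIT_AUTHOR")` yields two frames: one for the category and one for the name item.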

publcif commented 1 year ago

Although IUCr Journals do not require AUDIT_CONTACT_AUTHOR, Acta Cryst E and IUCrData do require PUBL_CONTACT_AUTHOR in CIF submissions. The PUBL and AUDIT categories largely follow the same pattern w.r.t. AUTHORs, but as I understand it, differ in scope: AUDIT per structure block, PUBL per CIF file. I believe the AUDIT category is favoured by databases (e.g. CSD), and CIF 'writers' (e.g. OLEX). So I certainly would not encourage deprecation of this category.

However, I've long seen the need for this sort of thing to facilitate some common processing tasks - e.g. making sure _publ_contact_author is actually also a _publ_author, establishing if an author is a correspondence author for publication purposes or just a contact author for submission purposes, identifying an author's institution(s), etc. But that would probably involve creating new categories - not deprecating existing categories (even if they are not ideal).
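The first task mentioned, making sure each _publ_contact_author is also a _publ_author, is awkward while authors are matched only by free-text name, which is one argument for id pointers. A heuristic sketch under that limitation: names are compared after Unicode normalisation, whitespace collapsing and case folding, which is an assumption and will not catch differently formatted names.

```python
import unicodedata


def normalise(name):
    """Case- and whitespace-insensitive form of a name (heuristic assumption)."""
    return " ".join(unicodedata.normalize("NFC", name).split()).casefold()


def contacts_not_authors(publ_author_names, publ_contact_author_names):
    """Contact-author names that do not match any listed author name."""
    authors = {normalise(n) for n in publ_author_names}
    return [n for n in publ_contact_author_names if normalise(n) not in authors]
```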

jamesrhester commented 1 year ago

Ok, so let's shelve the idea of deprecating AUDIT_CONTACT_AUTHOR for now. Improving AUDIT_AUTHOR is a task for cif core, so I'll raise an issue there relating to that.

Meanwhile, I think we've agreed that we can deprecate the pdCIF-specific author categories, simply using the core CIF categories instead?

rowlesmr commented 1 year ago

I think so. If we're duplicating information, then we should go with core.

rowlesmr commented 1 year ago

Would we need to add PD-specific _audit_author.phase_id and/or _audit_author.diffractogram_id items, as we did in REFLN with _pd_refln.phase_id?

jamesrhester commented 1 year ago

I don't think we envision an author being responsible for a single phase, so no need for phase_id. On the other hand, we might imagine an author being responsible for a particular measurement (e.g. went to the synchrotron) or particular sample/sample prep. Probably best to have this discussion in a different issue, though.