COMCIFS / cif_core

The IUCr CIF core dictionary
14 stars 9 forks source link

Evaluation method of `_atom_site.type_symbol` #452

Open vaitkus opened 11 months ago

vaitkus commented 11 months ago

The dREL method for _atom_site.type_symbol may need to be redesignated from Evaluation to Definition and instead provide a default value instead of a definitive one. The reason for the proposed change is that the 0-th component of _atom_site.label may match the type symbol, but it is not required to do so.

However, if this change is implemented then all other dREL Evaluation methods that rely on this item would also need to be redesignated as definition methods (for example, see PR #451 which uses this item in the dREL evaluation code for the ATOM_TYPE category).

Any ideas on how this could be resolved?

jamesrhester commented 11 months ago

I agree that this particular dREL code should be Definition due to the use of may.

However, I don't think the changes to dREL are particularly troublesome or widespread. Essentially all accesses to the type symbol for an atom should go through _atom_site.type_symbol, rather than the AtomSite function applied to the label, and then these remain Evaluation methods, which will return unknown, as they should, if the atom type symbol is missing or doesn't correspond to an entry in ATOM_TYPE.

Perhaps I'm missing why other dREL codes should also become Definition?

vaitkus commented 11 months ago

I was unsure if dREL methods automatically resolve values to their defaults or not. If the default values are not automatically used, then there is no problem. However, if they are automatically resolved (including the logic in dREL Definition methods), then we are not guaranteed to always get the correct atom type as sometimes the value will be generated using the same AtomSite function (although indirectly). Does that make sense? If not, I can provide an example of what I am talking about.

However, rereading the description of _atom_site.label I started to wonder how it should actually be interpreted:

Component 0 usually matches one of the specified _atom_type.symbol
codes. This is not mandatory if an _atom_site.type_symbol item is included
in the atom site list. The _atom_site.type_symbol always takes precedence
over an _atom_site.label in the identification of the atom type. 

The "Component 0 usually matches" part throws me off a bit. Does the usually part always apply, i.e. "Component 0" is never strictly required to match an atom type, or is that usually later on further explained by the next sentence. That is, can that entire part of the definition be interpreted as:

Component 0 MUST match _atom_type.symbol unless an explicit
_atom_site.type_symbol data item is provided.

If this is the intended interpretation, then we do not even need to change the method type of _atom_site.type_symbol.

jamesrhester commented 11 months ago

So, in general, the default value should always be used by dREL if a value is missing.

The correct reading of the definition here should be the second one, that is, it will always match unless an explicit _atom_site.type_symbol is provided, so perhaps there is no need to change anything. Probably best to delete the first two sentences of the definition.