COMCIFS / cif_core

The IUCr CIF core dictionary

Clarify use of _type.purpose Key and Link #367

Open jamesrhester opened 1 year ago

jamesrhester commented 1 year ago

Some data names may plausibly have purpose of both Link and Key, if they are a foreign key that is also a key data name of the category. Note that both these purposes are redundant information as this information can be determined from other attributes. Therefore we just need to decide on a rule for assigning _type.purpose, and it won't have any semantic consequences.

I suggest that Link overrules Key, so that a key data name that is a link to another data name has _type.purpose of Link.

Also, as currently written a Key data name has to have a unique value, suggesting that it is the single key data name for the category. This purpose would obviously change if new key data names are added, so I suggest that we adjust the ddl description of Key purpose to state that this data name is one of the category key data names.

vaitkus commented 1 year ago

> Some data names may plausibly have purpose of both Link and Key, if they are a foreign key that is also a key data name of the category. Note that both these purposes are redundant information as this information can be determined from other attributes. Therefore we just need to decide on a rule for assigning _type.purpose, and it won't have any semantic consequences.

I fully agree that there is a need for some clarification, but are the purposes really redundant? I guess the status of an item being a Key can be determined from the category definition, but other than that I am unsure what attributes could signal that. Similarly, the presence of _name.linked_item_id could indicate the Link purpose, but it can also be used by items with the SU purpose. Of course, I might be missing something obvious here.

> I suggest that Link overrules Key, so that a key data name that is a link to another data name has _type.purpose of Link.

Seems sound. Do you suggest that this change be made automatically by the software, or should we change the existing dictionaries to conform to this rule (I guess that the majority of them already do)?

> Also, as currently written a Key data name has to have a unique value, suggesting that it is the single key data name for the category. This purpose would obviously change if new key data names are added, so I suggest that we adjust the ddl description of Key purpose to state that this data name is one of the category key data names.

I guess that this extension is needed for the merged datasets? I do not object to the change, but currently there seem to be only two scenarios where composite keys are (properly) used:

  1. The individual items serve as foreign keys (links) to other categories. In this case the items will have the Link purpose and will not be required to be unique in the given category (only in the linked category).
  2. The individual items serve as keys of a top level category. In this case, the only reason to use more than one key is if the keys are natural keys, i.e. they encode a meaningful value (e.g. Miller indices). Due to this, the items would be assigned the Encode purpose.

jamesrhester commented 1 year ago

@vaitkus's logic is impeccable. Let us leave the Key definition in ddl.dic alone. Essentially, a _type.purpose of Key means that a data name forms part of a key, is not linked to a parent data name, and has no information encoded in it.
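As a rough illustration of the precedence discussed above, the rule could be sketched as follows. This is only a sketch: the function name `expected_purpose` and its arguments are hypothetical, and a definition is reduced to three facts (whether the item is a category key, its `_name.linked_item_id` if any, and its declared `_type.purpose`). Encode is trusted as declared, since whether a key carries meaning cannot be inferred mechanically.

```python
from typing import Optional


def expected_purpose(is_category_key: bool,
                     linked_item_id: Optional[str],
                     declared_purpose: str) -> str:
    """Return the _type.purpose implied by the rule sketched in this thread.

    Hypothetical helper, not part of any existing CIF software.
    """
    # SU items also carry _name.linked_item_id, so they are not links.
    if declared_purpose == "SU":
        return "SU"
    # Link overrules Key: any other item with a linked parent is a Link.
    if linked_item_id is not None:
        return "Link"
    # Natural keys (e.g. Miller indices) keep their declared Encode purpose.
    if is_category_key and declared_purpose != "Encode":
        return "Key"
    # Otherwise the rule has nothing to say; keep what was declared.
    return declared_purpose
```

For example, `expected_purpose(True, "_parent.id", "Key")` gives `"Link"`, whereas a key item with no linked parent stays `"Key"`.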

> Seems sound. Do you suggest that this change be made automatically by the software, or should we change the existing dictionaries to conform to this rule (I guess that the majority of them already do)?

We should change the dictionaries, if necessary, to conform to this rule. I believe that they should already conform, and if there are places where they don't, we should check that we haven't overlooked something.
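A check along these lines could flag the places that need a second look. It is a minimal sketch that assumes the dictionary definitions have already been loaded into plain Python dicts by whatever parser is in use. The keys `name`, `purpose` and `linked_item_id`, and the data names in the sample, are made up for illustration.

```python
def find_nonconforming(definitions):
    """Return names of items that link to a parent but still declare Key."""
    return [
        d["name"]
        for d in definitions
        if d.get("linked_item_id") is not None and d.get("purpose") == "Key"
    ]


# Hypothetical definitions, not taken from any real dictionary.
definitions = [
    {"name": "_parent.id", "purpose": "Key", "linked_item_id": None},
    {"name": "_child.parent_id", "purpose": "Key", "linked_item_id": "_parent.id"},
    {"name": "_child.other_id", "purpose": "Link", "linked_item_id": "_other.id"},
]

for name in find_nonconforming(definitions):
    print(f"{name}: declared Key but has _name.linked_item_id; expected Link")
```

Anything the check reports would be either a definition to correct or a case the rule has overlooked.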