Closed vaitkus closed 2 years ago
I think this can be fixed by simply calling _pd_refln.wavelength_id
an alias of _refln.wavelength_id
in cif_core.
Would the proposed solution:
a) add the alias to CIF_CORE
and remove the save__pd_refln.wavelength_id
save frame from the CIF_POW
altogether.
b) add the alias to CIF_CORE
, use it to detect duplicate definitions and silently ignore the import?
Approach a) is simpler, but it also results in data name from the CIF_POW
dictionary being fragmented over several dictionaries. Approach b) is more flexible, however, I would probably inverse the relationship by adding _refln.wavelength_id
as an alias of _pd_refln.wavelength_id
in CIF_POW
and this way retain all of the _pd
items strictly in CIF_POW
.
However, I think I made a slight mistake in my last message by claiming that the duplicates are not detected due to the differing _definition.id
attribute values since duplicate save frames detection is based on their save frame codes and not on the _definition.id
attribute. So the simplest solution might be to simply rename save__pd_refln.wavelength_id
to save_refln.wavelength_id
in the CIF_POW
dictionary and potentially add _refln.wavelength_id
data name as an alias. What do you think?
I see the issue with my suggestion - a user will naturally check the powder dictionary for this data name, even if automatic tools would see it in CIF core and be happy.
I'm not comfortable with using identical save frame names as it feels like a technical 'trick' and may be fragile as it would be a single exception that future tools may inadvertently overwrite. So I think it is best to simply add _pd_refln.wavelength_id
as an alias in CIF_CORE, and _refln.wavelength_id
as an alias in cif_pow, leaving the current save frame names. Essentially we are duplicating the definition in the same way as creating an alias 'duplicates' the definition, and in this case _name.category_id
and _name.object_id
should be the same. We can note in the cif_pow definition that this is purely a duplicate of the core definition and recommend using the core equivalent.
This of course creates the possibility of a data block containing both data names, but that is no different to data blocks containing aliases of data names that are already present.
I'm not comfortable with using identical save frame names as it feels like a technical 'trick' and may be fragile as it would be a single exception that future tools may inadvertently overwrite.
Is the previously proposed approach really that out of the ordinary? My proposal was based on the following observations:
CIF_POW
already have frame names that are also used in CIF_CORE
(_refln.F_squared_meas
, _refln.F_complex
) and are thus redefined due to the 'Ignore' import mode. However, it is important to note that in those cases the data names did not change nor get new aliases.So I think it is best to simply add _pd_refln.wavelength_id as an alias in CIF_CORE, and _refln.wavelength_id as an alias in cif_pow, leaving the current save frame names. Essentially we are duplicating the definition in the same way as creating an alias 'duplicates' the definition, and in this case _name.category_id and _name.object_id should be the same. We can note in the cif_pow definition that this is purely a duplicate of the core definition and recommend using the core equivalent.
I see two drawbacks to this approach:
_pd_refln.wavelength_id
as an alias in CIF_CORE
would imply that it is OK to use this name instead of _refln.wavelength_id
even if the rest of the file does not describe a powder experiment. This is not a big problem, but it does look quite strange and slightly deviates from the usage in DDL1. CIF_POW
dictionary would ends up with two save frames that define items with identical data names. Since, to my understanding, duplicate imports are detected based strictly on save frame names, both save__pd_refln.wavelength_id
and save_refln.wavelength_id
save frames would simultaneously define _pd_refln.wavelength_id
and _refln.wavelength_id
data names. The proposed human-readable note about one of the save frames being an intentional duplicate would be useful to the users, but would be of little help in automatically resolving this in dictionary handling software. It would be much more robust to handle this without having to rely on any ad-hoc built in rules.This of course creates the possibility of a data block containing both data names, but that is no different to data blocks containing aliases of data names that are already present.
I would view the situation where an alias in one save frame matches a data name (or an alias) in another save frame as ambiguous and thus a mistake. Also, depending on the software implementation this may lead to both data name definitions being simultaneously defined or one of the definitions being randomly picked up over the other which may even differ between software runs (e.g. implementation relying on hash tables).
Adding _pd_refln.wavelength_id as an alias in CIF_CORE would imply that it is OK to use this name instead of _refln.wavelength_id even if the rest of the file does not describe a powder experiment. This is not a big problem, but it does look quite strange and slightly deviates from the usage in DDL1.
This is true. Non-powder software is unlikely to recognise this alias (unless it reads in the dictionary).
So the simplest solution might be to simply rename save__pd_refln.wavelength_id to save_refln.wavelength_id in the CIF_POW dictionary and potentially add _refln.wavelength_id data name as an alias. What do you think?
I think this is the best solution, due to the points made above. I will ask the pd list about deprecating the _pd_refln.wavelength_id
data name, which may bring up the reasons for its existence. Or not.
PR #6 fixes this.
The issue was resolved by merging PR #6.
The combination of
_name.category_id
and_name.object_id
attributes is used to uniquely identify data items in dREL code snippets. However, the combination ofrefln
andwavelength_id
is not unique as it appears both in thesave__pd_refln.wavelength_id
save frame from theCIF_POW
dictionary and in thesave_refln.wavelength_id
save frame from theCIF_CORE
dictionary (which is imported byCIF_POW
).The dREL name does not seem to be used in any of the dREL codes snippets that are present in
CIF_CORE
andCIF_POW
dictionaries so it should be safe to rename any of the previously mentioned items.Also, note that
_pd_refln.wavelength_id
item belongs to theREFLN
category which appears both inCIF_CORE
and inCIF_POW
and is the only category inCIF_POW
without thePD
prefix. TheCIF_CORE
dictionary is imported with the 'Ignore' mode which forces all duplicate data items to be silently ignored, however, in this specific casesave__pd_refln.wavelength_id
andsave_refln.wavelength_id
are not considered duplicates due to different_definition.id
attribute values (_pd_refln.wavelength_id
vs._refln.wavelength_id
).