Closed rowlesmr closed 3 months ago
An example.
I need to make up others with separate cell measurement conditions, and magnetic and modulated structures.
Please note that (AFAIK) you don't need to explicitly give _structure.id or _space_group.id in this example, as all the relevant information is in a single block for each structure. I put the diffraction conditions in a separate block to show how you can link with _structure.diffrn_id.
This links in with my idea (which I can't find the comment for) where single Set keys are autogenerated if they don't exist. (Got it: https://github.com/COMCIFS/cif_core/pull/445)
###################################
#
# Beginning of the CIF file
#
###################################
data_conditions
_diffrn.id DIFCON_1
_diffrn.ambient_temperature 900
data_blockname_one
_structure.id A #Must be unique.
_structure.diffrn_id DIFCON_1
_cell.length_a 5.4469
_cell.length_b 5.4469
_cell.length_c 5.4469
_cell.angle_alpha 90
_cell.angle_beta 90
_cell.angle_gamma 90
_cell.volume 161.61
_cell.formula_units_Z 4
_space_group.id 1 #Must be unique. Can be the same if representing the same space group in the same setting
_space_group.crystal_system cubic
_space_group.name_H-M_alt Fm-3m
loop_
_space_group_symop.id
_space_group_symop.operation_xyz
1 'x, y, z '
2 '-x, -y, z '
#...
191 'x, y+1/2, -z+1/2 '
192 '-x, -y+1/2, -z+1/2 '
loop_
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_xyz
_atom_site.B_iso_or_equiv
Mn1 Mn+2 [0 0 0] 0.2
Se1 Se [0.5 0.5 0.5] 0.4
loop_
_atom_site_aniso.label
_atom_site_aniso.b_11
_atom_site_aniso.b_12
_atom_site_aniso.b_13
_atom_site_aniso.b_22
_atom_site_aniso.b_23
_atom_site_aniso.b_33
Mn1 0.2 0.05 0.08 0.2 0.03 0.2
data_blockname2
_structure.id B #Must be unique.
_structure.diffrn_id DIFCON_1
_cell.length_a 3.0205
_cell.length_b 3.0205
_cell.length_c 3.0205
_cell.angle_alpha 90
_cell.angle_beta 90
_cell.angle_gamma 90
_cell.volume 27.558
_cell.formula_units_Z 2
_space_group.id 2 #Must be unique. Can be the same if representing the same space group in the same setting
_space_group.crystal_system cubic
_space_group.name_H-M_alt Im-3m
loop_
_space_group_symop.id
_space_group_symop.operation_xyz
1 'x, y, z '
2 '-x, -y, z '
#...
95 'x+1/2, y+1/2, -z+1/2 '
96 '-x+1/2, -y+1/2, -z+1/2 '
loop_
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_xyz
_atom_site.B_iso_or_equiv
V1 V [0 0 0] 0.5
###################################
#
# End of the CIF file
#
###################################
As per PR (AFAIK)
###################################
#
# A representation of a merged datablock.
# It shouldn't actually be used this way to construct
# a CIF file, but maps out how the relational tables
# would be populated.
#
###################################
#this is the merged datablock assuming _structure.space_group_id exists
data_merged_hester
loop_
_diffrn.id
_diffrn.ambient_temperature
DIFCON_1 900
loop_
_structure.id
_structure.space_group_id
_structure.diffrn_id
A 1 DIFCON_1
B 2 DIFCON_1
loop_
_cell.structure_id
_cell.length_a
_cell.length_b
_cell.length_c
_cell.angle_alpha
_cell.angle_beta
_cell.angle_gamma
_cell.volume
_cell.formula_units_Z
A 5.4469 5.4469 5.4469 90 90 90 161.61 4
B 3.0205 3.0205 3.0205 90 90 90 27.558 2
# if both structures had the same SG, then you only need to include the one SG
loop_
_space_group.id
_space_group.crystal_system
_space_group.name_H-M_alt
1 cubic Fm-3m
2 cubic Im-3m
loop_
_space_group_symop.space_group_id
_space_group_symop.id
_space_group_symop.operation_xyz
1 1 'x, y, z '
1 2 '-x, -y, z '
#...
1 191 'x, y+1/2, -z+1/2 '
1 192 '-x, -y+1/2, -z+1/2 '
2 1 'x, y, z '
2 2 '-x, -y, z '
#...
2 95 'x+1/2, y+1/2, -z+1/2 '
2 96 '-x+1/2, -y+1/2, -z+1/2 '
loop_
_atom_site.structure_id
_atom_site.label
_atom_site.type_symbol
_atom_site.fract_xyz
_atom_site.B_iso_or_equiv
A Mn1 Mn+2 [0 0 0] 0.2
A Se1 Se [0.5 0.5 0.5] 0.4
B V1 V [0 0 0] 0.5
loop_
_atom_site_aniso.structure_id
_atom_site_aniso.label
_atom_site_aniso.b_11
_atom_site_aniso.b_12
_atom_site_aniso.b_13
_atom_site_aniso.b_22
_atom_site_aniso.b_23
_atom_site_aniso.b_33
A Mn1 0.2 0.05 0.08 0.2 0.03 0.2
That example does demonstrate exactly how I imagine this working.
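To make the mapping from separate blocks to relational tables concrete, here is a minimal Python sketch. It is not how any existing CIF library works: blocks are modelled as plain dicts, the function name `merge_blocks` and the `STRUCTURE_CHILDREN` set are made up for illustration, and the rule "a missing Set key is autogenerated from the block name" is only an assumption about how the autogeneration idea could be realised.

```python
from collections import defaultdict

# Hypothetical subset of the categories that gain a structure_id key in this PR.
STRUCTURE_CHILDREN = {"cell", "atom_site", "atom_site_aniso"}


def merge_blocks(blocks):
    """Merge {block name: {category: [row dict, ...]}} into one table per category."""
    tables = defaultdict(list)
    for block_name, block in blocks.items():
        space_group_id = None
        if "space_group" in block:
            space_group = dict(block["space_group"][0])
            # Assumption: a missing Set key is autogenerated from the block name.
            space_group.setdefault("id", block_name)
            if space_group not in tables["space_group"]:
                tables["space_group"].append(space_group)
            space_group_id = space_group["id"]

        structure_id = None
        if "structure" in block or STRUCTURE_CHILDREN.intersection(block):
            structure = dict(block["structure"][0]) if "structure" in block else {}
            structure.setdefault("id", block_name)
            if space_group_id is not None:
                structure.setdefault("space_group_id", space_group_id)
            tables["structure"].append(structure)
            structure_id = structure["id"]

        for category, rows in block.items():
            if category in ("structure", "space_group"):
                continue  # already handled above
            for row in rows:
                if category in STRUCTURE_CHILDREN:
                    # Child rows are tagged with the structure they belong to.
                    row = {"structure_id": structure_id, **row}
                tables[category].append(dict(row))
    return dict(tables)


# Cut-down version of the two-structure example; the second block omits its ids.
blocks = {
    "conditions": {"diffrn": [{"id": "DIFCON_1", "ambient_temperature": 900}]},
    "blockname_one": {
        "structure": [{"id": "A", "diffrn_id": "DIFCON_1"}],
        "space_group": [{"id": "1", "name_H-M_alt": "Fm-3m"}],
        "cell": [{"length_a": 5.4469}],
        "atom_site": [{"label": "Mn1"}, {"label": "Se1"}],
    },
    "blockname2": {
        "space_group": [{"name_H-M_alt": "Im-3m"}],
        "cell": [{"length_a": 3.0205}],
        "atom_site": [{"label": "V1"}],
    },
}
merged = merge_blocks(blocks)
```

In this sketch `merged["atom_site"]` carries a `structure_id` column, and `merged["structure"]` gains the `space_group_id` link, mirroring the merged datablock above.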
I suggest that we include the example given in (https://github.com/COMCIFS/MultiBlock_Dictionary/pull/6#issuecomment-1764741476) as distinct CIF files in the PR. This will definitely be useful, since people are already asking for usage examples.
Furthermore, I have a comment on the following statement given in the example:
_space_group.id 1 #Must be unique. Can be the same if representing the same space group in the same setting
I think that having the same setting is not sufficient. For space groups to have the same identifier, their symmetry operations in the SPACE_GROUP loop must be listed with the same symop ids, since these ids are later on used to specify symmetry operations in data items like _geom_bond.site_symmetry_1.
Consider the following example:
data_merged
# ...
loop_
_space_group.id
_space_group.name_H-M_alt
1 'P 1 21/m 1'
2 'P 1 21/m 1'
# ...
loop_
_space_group_symop.space_group_id
_space_group_symop.id
_space_group_symop.operation_xyz
1 1 x,y,z
1 2 -x,y+1/2,-z
1 3 -x,-y,-z
1 4 x,-y+1/2,z
2 1 x,y,z
2 2 -x,y+1/2,-z
2 3 x,-y+1/2,z
2 4 -x,-y,-z
loop_
_geom_bond.atom_site_label_1
_geom_bond.atom_site_label_2
_geom_bond.space_group_id # not currently defined
_geom_bond.site_symmetry_1
_geom_bond.site_symmetry_2
_geom_bond.distance
C2 C3 1 1_555 3_555 1.44
Semantically, the two space groups are identical (same name, same number, same setting, same symmetry operations), but due to the different ids assigned to the symops, they have to retain distinct ids.
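A small Python sketch of why the ids matter, using the symop table from the example above. The helper name `resolve_site_symmetry` and the dict layout are purely illustrative; the only thing taken from core CIF is the `n_klm` convention for site symmetry codes, where `555` means no lattice translation.

```python
# Symop table keyed by (space_group_id, symop id), copied from the example above.
SYMOPS = {
    ("1", 1): "x,y,z",
    ("1", 2): "-x,y+1/2,-z",
    ("1", 3): "-x,-y,-z",
    ("1", 4): "x,-y+1/2,z",
    ("2", 1): "x,y,z",
    ("2", 2): "-x,y+1/2,-z",
    ("2", 3): "x,-y+1/2,z",
    ("2", 4): "-x,-y,-z",
}


def resolve_site_symmetry(code, space_group_id, symops=SYMOPS):
    """Split a code such as '3_555' into (operation_xyz, lattice translation)."""
    if code == ".":
        return None  # no symmetry operation applied
    symop_id, _, shifts = code.partition("_")
    # Each digit of klm is offset by 5, so '555' is the zero translation.
    translation = tuple(int(d) - 5 for d in shifts) if shifts else (0, 0, 0)
    return symops[(space_group_id, int(symop_id))], translation


# The same code resolves to different operations under the two space group ids.
under_sg1 = resolve_site_symmetry("3_555", "1")
under_sg2 = resolve_site_symmetry("3_555", "2")
```

Here `under_sg1` refers to the inversion while `under_sg2` refers to the mirror, which is why the two space groups cannot share an id even though they describe the same symmetry.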
Furthermore, _geom_bond.space_group_id in the GEOM_BOND loop should probably be replaced by _geom_bond.structure_id, but this assumes that the proper structure-to-space-group relationship is defined in the STRUCTURE loop.
I do not think that we can achieve a more elegant solution within the constraints of the relational model since items like _geom_bond.structure_id prevent normalisation, but we need to be sure to properly communicate such gotchas to the users. Maybe it would make sense to describe the criteria required for two space groups to share the same space group id in the definition of the _space_group.id data item?
That sounds doable. A space group is the same iff it has the same name, number, setting, and symops in the same order. I can add this to the category description.
I am also putting together some example structures and multi-block CIFs.
I agree that _geom_*.structure_id is the correct key to add, as the GEOM category is described as giving model information about the structure.
This also brings up the point as to the correct key for MODEL; should it be _model.structure_id? Should GEOM* have a .model_id instead as a key? At this point in time, I don't think so, as (iirc) MODEL is empty, but if we add non-refinement things, it may become non-empty.
As mentioned in a comment to #3, _cell.diffrn_id and _cell_measurement.diffrn_id should stay but are no longer key data names.
They already exist in core. The multiblock just alters the key dataname. I take this to mean that they remain in the dictionary, so no need to redefine them?
I think that they should be moved to the multiblock dictionary, after which they will be removed from the core dictionary. This is because these data names have no use in the single-data-block paradigm (you can't refer to a _diffrn.id that is not the same as the current data block).
That sounds doable. A space group is the same iff it has the same name, number, setting, and symops in the same order. I can add this to the category description.
Note that this is (mostly) just a particular case of the general rule that "if you repeat key data name values in different blocks, the rest of the values in the row must be identical". I don't think it warrants special mention in the category definition, but is worth pointing out to programmers as it is a useful consistency check. space_group_symop is a little special, because we have to autogenerate the symop numbers for legacy files. Anybody writing software now (and thus reading the dictionary) will provide symop ids, so I doubt mentioning this in the core dictionary will do anything except confuse readers.
And I'd say that a space group is the same if the items in the space_group category are the same for the same values of the key data name. That is, changing the order of the symops does not change the space group (just like in real life). What it does is change the identity of a symop that belongs to the space group.
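For programmers, the general rule quoted above ("if you repeat key data name values in different blocks, the rest of the values in the row must be identical") could be checked along these lines. This is only a sketch: rows are plain dicts gathered from all blocks, and `find_key_conflicts` is a made-up name, not part of any CIF toolkit.

```python
def find_key_conflicts(rows, key="id"):
    """Return the sorted key values that appear with differing row contents."""
    first_seen = {}
    conflicts = set()
    for row in rows:
        # Remember the first row seen for each key value and compare later ones to it.
        previous = first_seen.setdefault(row[key], row)
        if previous != row:
            conflicts.add(row[key])
    return sorted(conflicts)


# SPACE_GROUP rows as they might be collected from several data blocks.
space_group_rows = [
    {"id": "1", "crystal_system": "cubic", "name_H-M_alt": "Fm-3m"},
    {"id": "1", "crystal_system": "cubic", "name_H-M_alt": "Fm-3m"},  # harmless repeat
    {"id": "2", "crystal_system": "cubic", "name_H-M_alt": "Im-3m"},
    {"id": "2", "crystal_system": "cubic", "name_H-M_alt": "Fm-3m"},  # same key, different row
]
conflicts = find_key_conflicts(space_group_rows)
```

The same function could be applied to SPACE_GROUP_SYMOP rows by first building a combined key from the space group id and the symop id.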
I haven't put it in the category description; it's in the _space_group.id description.
I agree that changing the order of the symops doesn't change the symmetry, but it does change how the symmetry is represented. As @vaitkus pointed out, _geom_*.symmetry_* values point to symop id values, so the symop indicated by a given id must be the same wherever that space group id is used. The order of the rows doesn't matter, as long as each row has the correct id in the loop, and hence I think it is worth pointing out. It is a gotcha.
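The two notions of "the same" being discussed here can be separated in a short Python sketch. The function names are illustrative only, and the comparison is a naive string match after removing spaces, so equivalent spellings such as `1/2+y` and `y+1/2` would wrongly count as different; a real check would parse the operations.

```python
def normalise(operation_xyz):
    """Crude normalisation: drop spaces and lower-case the operation string."""
    return operation_xyz.replace(" ", "").lower()


def same_symmetry(symops_a, symops_b):
    """True if both tables contain the same set of operations, ignoring ids."""
    return {normalise(op) for op in symops_a.values()} == {
        normalise(op) for op in symops_b.values()
    }


def same_labelling(symops_a, symops_b):
    """True if every symop id maps to the same operation in both tables."""
    return {k: normalise(op) for k, op in symops_a.items()} == {
        k: normalise(op) for k, op in symops_b.items()
    }


# The two P 1 21/m 1 symop lists from the earlier example, keyed by symop id.
sg1 = {1: "x,y,z", 2: "-x,y+1/2,-z", 3: "-x,-y,-z", 4: "x,-y+1/2,z"}
sg2 = {1: "x,y,z", 2: "-x,y+1/2,-z", 3: "x,-y+1/2,z", 4: "-x,-y,-z"}

share_symmetry = same_symmetry(sg1, sg2)
share_labelling = same_labelling(sg1, sg2)
```

Under these assumptions the two tables pass `same_symmetry` but fail `same_labelling`, and it is the second test that decides whether they may share a _space_group.id.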
An analysis of implications of the new STRUCTURE category with other dictionaries. I want to understand what we are creating for them by adding linked data names to categories that they also modify.
a. Atom_Site: A subsystem id and flags for the type of modulation wave a particular site is additionally modified by are added, but the modulation information is in a different category
b. cell: modulation information is added in this and other categories
c. space group: superspace group information also added, same for space_group_symop etc.
A number of additional categories are defined that provide more modulation information. None of these have been made formal children of the above categories (yet) and so can be ignored - it is up to the ms_dic people to decide if they want to do more.
The cell_subsystem category is interesting, as it adopts the same approach as we have with multi blocks. However, only the atom_site category explicitly contains a pointer to _cell_subsystem.code.
Conclusion: 'Structure' as we have defined it (atom_site + cell + space_group) will partially describe a superspace structure. ms_dic can completely describe a structure (in the same sense as core CIF is complete) by adding the appropriate links to _structure.id in the appropriate categories. A single structure will include all subsystems.
atom_site_moment category is a formal child of atom_site, meaning that atomic moments are currently incorporated into our concept of structure.
Here we come to an important point: we can imagine a magnetic_structure category that fulfills the same role as the structure category, but for magnetism. Such a category would have a pointer to _structure.id, and then include the magnetic space group and magnetism-specific modulation waves. What concerns me (slightly) is that the moments really belong only to the magnetic structure but are swept up into the structure overall. That is a consequence of wanting to list the moments in a single loop with the atomic positions, so in that sense the decision has already been made. An alternative to a separate magnetic structure category is simply for the magnetic structure dictionary to expand the concept of structure and add magnetic information to the structure category.
The magnetism dictionary also has the idea of a parent space group, which is a non-magnetic structure that the magnetic structure is related to. This is a good use for a pointer to _structure.id.
Conclusion: the structure category is a slight misnomer in the case of magnetism, but does not create practical problems. If we want to be more universal in our naming, we could change the name to something like STRUCTURAL_MODEL.
I think this PR is ready for incorporation into the multi block dictionary following the above suggested changes and perhaps a change of the category from STRUCTURE to something like STRUCTURAL_MODEL to try to convey the more general meaning. Note that version 1.0.0 of the multi-block dictionary is now available from the IUCr website.
Finally merged. The examples may need improvement.
Will close #3
I redid the PR, as I wanted to work through things in my head.
There are now:
STRUCTURE
  id
  space_group_id
  diffrn_id
ATOM_SITE, ATOM_SITE_ANISO, CELL, CELL_MEASUREMENT, and CELL_MEASUREMENT_REFLN
  structure_id
SPACE_GROUP
  id
SPACE_GROUP_GENERATOR, SPACE_GROUP_SYMOP, and SPACE_GROUP_WYCKOFF
  space_group_id
cell.diffrn_id and cell_measurement.diffrn_id have been removed.
Modulated and magnetic structures need to be looked at.