COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction

Add PD_PHASE_LIST and PD_DIFFRACTOGRAM_LIST. #117

Closed: rowlesmr closed this 1 year ago

rowlesmr commented 1 year ago

When the new _pd_phase.id and PD_DIFFRACTOGRAM were created, we also made a new PD_PHASE_BLOCK category of class Loop in order to retain the ability to loop block IDs. We then also added _pd_phase_block.phase_id and _pd_block_diffractogram.diffractogram_id to allow phase and diffractogram IDs to be looped.

I think this last decision was wrong, and conflates the idea of block IDs and phase/diffractogram IDs. This commit removes _pd_phase_block.phase_id and _pd_block_diffractogram.diffractogram_id and adds PD_PHASE_LIST and PD_DIFFRACTOGRAM_LIST to allow _pd_phase.id and _pd_diffractogram.id values to be looped.

See also #74.

jamesrhester commented 1 year ago

I agree with the decision to drop the diffractogram and phase pointers, as they provide redundant information anyway. What I don't understand is why there is any need for the *_list data names. Can you give an example of where they would be useful?

rowlesmr commented 1 year ago

The new definition of _pd_phase.id is very different to the original meaning of _pd_phase_id, which was really just a link between phase block ids and reflections.

Previously, you would write something like:

#\#CIF_1.1
_pd_block_id  THIS_IS_A_UNIQUE_DIFFRACTOGRAM_ID

loop_
_pd_phase_id
_pd_phase_block_id
_pd_phase_mass_%
1   A_BIG_LONG_STRING_FOR_A_PHASE     45.1
2   ANOTHER_LONG_SEQUENCE_GOES_HERE   53.9

loop_
_refln_index_h
_refln_index_k
_refln_index_l
_pd_refln_phase_id
_refln_d_spacing
0    0   12   1   1.082702
1    0  -14   1   0.905365
2    2    0   2   1.920168
3    1    1   2   1.637525
#...

Now, you need to write:

#\#CIF_2.0
_pd_diffractogram.id   THIS_IS_A_UNIQUE_DIFFRACTOGRAM_ID

loop_
_pd_phase_mass.phase_id
_pd_phase_mass.percent
A_BIG_LONG_STRING_FOR_A_PHASE     45.1
ANOTHER_LONG_SEQUENCE_GOES_HERE   53.9

loop_
_refln.index_h
_refln.index_k
_refln.index_l
_pd_refln.phase_id
_refln.d_spacing
0    0   12   A_BIG_LONG_STRING_FOR_A_PHASE   1.082702
1    0  -14   A_BIG_LONG_STRING_FOR_A_PHASE   0.905365
2    2    0   ANOTHER_LONG_SEQUENCE_GOES_HERE   1.920168
3    1    1   ANOTHER_LONG_SEQUENCE_GOES_HERE   1.637525
#...

if you give the _pd_phase.id values the same values as the old block IDs.
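The 1.1-to-2.0 rewrite above amounts to a lookup: each numeric _pd_refln_phase_id is replaced by the block ID paired with it in the phase loop. A minimal Python sketch, assuming the loops have already been parsed into tuples (no CIF parser is involved, and the function name migrate_reflns is hypothetical):

```python
from typing import Dict, List, Tuple

# One reflection row: (h, k, l, phase reference, d-spacing).
# In CIF 1.1 the phase reference is a block-local _pd_refln_phase_id;
# in the 2.0 layout above it is the _pd_phase.id value itself.
Refln = Tuple[int, int, int, str, float]


def migrate_reflns(phase_loop: List[Tuple[str, str]], reflns: List[Refln]) -> List[Refln]:
    """Replace each local phase number with its phase block ID.

    phase_loop holds (_pd_phase_id, _pd_phase_block_id) pairs from the 1.1 phase loop.
    """
    local_to_block: Dict[str, str] = dict(phase_loop)
    if len(local_to_block) != len(phase_loop):
        raise ValueError("duplicate _pd_phase_id values in the phase loop")
    migrated: List[Refln] = []
    for h, k, l, local_id, d in reflns:
        if local_id not in local_to_block:
            raise ValueError(f"_pd_refln_phase_id {local_id!r} is not in the phase loop")
        migrated.append((h, k, l, local_to_block[local_id], d))
    return migrated


if __name__ == "__main__":
    phases = [("1", "A_BIG_LONG_STRING_FOR_A_PHASE"),
              ("2", "ANOTHER_LONG_SEQUENCE_GOES_HERE")]
    reflns = [(0, 0, 12, "1", 1.082702), (2, 2, 0, "2", 1.920168)]
    for row in migrate_reflns(phases, reflns):
        print(row)
```

The long strings end up repeated on every reflection row, which is the typing burden the *_list proposal aims to avoid.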


The idea with the *_list data names is to (hopefully) enable something like

#\#CIF_2.0
_pd_diffractogram.id  THIS_IS_A_UNIQUE_DIFFRACTOGRAM_ID

loop_
_pd_phase_list.id
_pd_phase_list.phase_id
1   A_BIG_LONG_STRING_FOR_A_PHASE
2   ANOTHER_LONG_SEQUENCE_GOES_HERE

loop_
_pd_phase_mass.phase_list_id
_pd_phase_mass.percent
1   45.1
2   53.9

loop_
_refln.index_h
_refln.index_k
_refln.index_l
_pd_refln.phase_list_id
_refln.d_spacing
0    0   12   1   1.082702
1    0  -14   1   0.905365
2    2    0   2   1.920168
3    1    1   2   1.637525
#...

to get rid of a lot of typing.
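The indirection proposed here can be sketched in Python: the PD_PHASE_LIST rows become a block-local lookup table, and any loop that carries a short phase_list_id is expanded back to full _pd_phase.id values. This assumes each _pd_phase_list.id appears only once per block; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_phase_list(rows: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map _pd_phase_list.id -> _pd_phase_list.phase_id for one block."""
    table: Dict[str, str] = {}
    for list_id, phase_id in rows:
        if list_id in table:
            raise ValueError(f"_pd_phase_list.id {list_id!r} appears twice")
        table[list_id] = phase_id
    return table


def expand_phase_mass(table: Dict[str, str],
                      masses: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Turn (_pd_phase_mass.phase_list_id, percent) rows into (_pd_phase.id, percent)."""
    expanded: List[Tuple[str, float]] = []
    for list_id, percent in masses:
        if list_id not in table:
            raise ValueError(f"phase_list_id {list_id!r} is not in PD_PHASE_LIST")
        expanded.append((table[list_id], percent))
    return expanded
```

The same table would resolve _pd_refln.phase_list_id values in the reflection loop.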

jamesrhester commented 1 year ago

Right. So what is the reason that we have to preserve the long forms of _pd_phase.id? Could we instead have a data name in PD_PHASE, such as _pd_phase.full_name, for cases where we want to record super long names?

rowlesmr commented 1 year ago

If you give a phase a globally (as in the entire world) unique name, then it is (potentially) going to be long. It would be handy to be able to refer to it by a short, block-scoped tag. _pd_phase.full_name would essentially be _pd_block_id, but without the linking.

In an in situ data set, I would like to retain the ability to refer to phase 1 in a particular diffractogram, even if the phase to which it refers changes. You then link 1 to a _pd_phase.id value in that diffractogram block, and you're all good.
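To illustrate the in situ case, here is a Python sketch in which each diffractogram block carries its own tag-to-phase mapping, so tag "1" can follow "phase 1" through a series even when it resolves to different _pd_phase.id values. The diffractogram and phase IDs are made-up placeholders, and the nested-dict layout is an assumption for illustration, not a dictionary structure.

```python
from typing import Dict, List, Tuple

# Hypothetical in situ series: diffractogram ID -> {block-scoped tag -> _pd_phase.id}.
# The same tag may point at a different phase in each diffractogram block.
Series = Dict[str, Dict[str, str]]


def track_tag(series: Series, tag: str) -> List[Tuple[str, str]]:
    """List (diffractogram ID, _pd_phase.id) for every block that defines `tag`,
    in the order the blocks were added to `series`."""
    return [(diffractogram, tags[tag])
            for diffractogram, tags in series.items()
            if tag in tags]


series: Series = {
    "SCAN_001": {"1": "PHASE_ALPHA_EXAMPLE", "2": "PHASE_GAMMA_EXAMPLE"},
    "SCAN_050": {"1": "PHASE_BETA_EXAMPLE"},
}
```

Here track_tag(series, "1") follows tag "1" from PHASE_ALPHA_EXAMPLE in SCAN_001 to PHASE_BETA_EXAMPLE in SCAN_050.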


In 1.1, _pd_block_id is a unique identifier across all data blocks in a particular CIF file/container. _pd_phase_id is only required to be unique within a single block, as it links a peak listing to a block id, both of which are contained in that single block.

In 2.0, we have _pd_phase.id as a unique identifier across all data blocks in a particular CIF file/container, but we no longer have a data item that need only be unique within a single block and with which we can link peaks (and quant and other things) to a _pd_phase.id. Is it even possible to have the same data item present in different blocks with the same value and not clash? What I really want is a way to give a data item block scope.
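The two uniqueness scopes can be contrasted in a short Python sketch. It assumes each _pd_phase.id is defined (i.e. the phase's own data live) in exactly one block of the container, while block-scoped tags only need to be unique inside their own block; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def container_scope_clashes(defined: Dict[str, List[str]]) -> List[str]:
    """`defined` maps block name -> _pd_phase.id values whose phase data that block holds.
    An ID defined in more than one block clashes at container scope."""
    # set(ids): a block listing the same ID twice is counted once for that ID
    counts = Counter(pid for ids in defined.values() for pid in set(ids))
    return sorted(pid for pid, n in counts.items() if n > 1)


def block_scope_clashes(tags: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """`tags` maps block name -> block-scoped phase tags used in that block.
    Tags may repeat across blocks; only a repeat inside one block clashes."""
    clashes: Dict[str, List[str]] = {}
    for block, values in tags.items():
        repeated = sorted(t for t, n in Counter(values).items() if n > 1)
        if repeated:
            clashes[block] = repeated
    return clashes
```

Under this reading, tag "1" may appear in every diffractogram block without a clash, whereas a _pd_phase.id defined in two blocks is flagged.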


In this world, PD_PHASE_LIST (and the *DIFFRACTOGRAM* equivalent) has two members, _pd_phase_list.id and _pd_phase_list.phase_id, both of which are category keys.
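With both _pd_phase_list.id and _pd_phase_list.phase_id as category keys, the key is the pair, so only a repeated (id, phase_id) combination violates it. A minimal Python check under that reading (the function name is hypothetical):

```python
from collections import Counter
from typing import List, Tuple


def duplicate_composite_keys(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """`rows` are (_pd_phase_list.id, _pd_phase_list.phase_id) packets from one loop.
    Return every (id, phase_id) pair that occurs more than once."""
    counts = Counter(rows)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Under a composite key, rows such as ("1", "A") and ("1", "B") do not clash; only an exact repeat of both values does.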

rowlesmr commented 1 year ago

I no longer believe that having PD_*_LIST is a good idea. It would double up on data items in too many categories.

We still need to remove _pd_phase_block.phase_id and _pd_block_diffractogram.diffractogram_id, though.