COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction

Addition of preferred orientation #9

Closed rowlesmr closed 1 year ago

rowlesmr commented 2 years ago

Idea for the addition of preferred orientation parameters to pdCIF.

To my knowledge, there are two main methodologies for modelling preferred orientation (PO) in powder data: March-Dollase (MD), and spherical harmonics (SH).

In both cases, a geometrical factor must be known to properly apply the corrections. More on that at the end.

This would extend the existing _pd_proc_ls.pref_orient_corr by providing specialisations for MD and SH, requiring any other implementations to conform with the existing specification.

Note: I know that all the tag names are currently terrible.

March-Dollase

See refs 1,2,3

When applying MD, there are two required values: the amount of orientation, typically referred to as r, and the direction of the orientation (often a cleavage plane). There can be multiple PO directions, in which case each has its own r and hkl, and there is an additional fractional contribution of each direction to the overall PO correction. In a refinement, hkl is defined and r is refined. If there is more than one direction, the fractional contribution of each direction is also refined.

A first suggestion:

_pd_po_md_r – the March-Dollase r factor. Float, (0, infty), default value = 1.0
_pd_po_md_h – the h index of the orientation direction. Int, (-infty, infty)
_pd_po_md_k – the k index of the orientation direction. Int, (-infty, infty)
_pd_po_md_l – the l index of the orientation direction. Int, (-infty, infty)
_pd_po_md_fract – in the case of multiple directions, the amount that is in that direction. Float, (0, infty), default value = 1.0. The sum of all values for a single structure must be 1.
_pd_po_md_id – in the case of multiple directions, an ID number to uniquely identify each direction.

To be used as:

_pd_po_md_r   0.789
_pd_po_md_h  1
_pd_po_md_k  1
_pd_po_md_l   0

loop_
_pd_po_md_r  
_pd_po_md_h
_pd_po_md_k
_pd_po_md_l
0.789     1              1              0

loop_
_pd_po_md_id
_pd_po_md_r  
_pd_po_md_h
_pd_po_md_k
_pd_po_md_l
_pd_po_md_fract
1     0.789     1              1              0              0.62
2     1.243     1              0              1              0.38
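As a rough illustration of how the r and fract values above would be used, here is a minimal Python sketch of the March-Dollase factor in the form given by Dollase [3], P(alpha) = (r^2 cos^2(alpha) + sin^2(alpha) / r)^(-3/2), for symmetric reflection geometry. The function names are hypothetical, the angle alpha between the PO direction and each reflection's scattering vector is assumed to be computed elsewhere, and averaging over symmetry-equivalent reflections is left out.

```python
import math


def march_dollase(alpha, r):
    """March-Dollase factor for one PO direction; alpha is in radians."""
    return (r**2 * math.cos(alpha) ** 2 + math.sin(alpha) ** 2 / r) ** -1.5


def march_dollase_multi(alphas, rs, fracts):
    """Fraction-weighted sum over several PO directions for one reflection."""
    if not math.isclose(sum(fracts), 1.0, abs_tol=1e-6):
        raise ValueError("fractions must sum to 1")
    return sum(f * march_dollase(a, r) for a, r, f in zip(alphas, rs, fracts))
```

With r = 1 the factor is 1 for every alpha, which is why 1.0 makes sense as the default for _pd_po_md_r.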

Spherical-Harmonics

See ref 4

When applying SH, there are many required values, depending on the Laue class*. From Jarvinen [4], only even orders are used. Each value is a refinable parameter and must be reported. There are a couple of ways of giving the values: (the notation I'm using follows Jarvinen and TOPAS).

Note: SHs can also be used for anisotropic crystallite size, so it would probably be best to set up the tag names so that a switch to crystallite size would still look largely the same.

_pd_po_sh_y_i - order of the SH; even integer, [0, infty)
_pd_po_sh_y_j - terms of each order; integer, [0, i]
_pd_po_sh_y_p - parity; 0, +1, -1 (can only be 0 for the 0th term of each order, otherwise +/- 1)
_pd_po_sh_cijp - the actual refined value; float, (-infty, infty)

Also, as a separate item, _pd_po_sh_texture_index

The texture index is defined by Bunge (ref 5; eqn 4.212) as
\sum_{i,j,p} \frac{1}{2i+1} c_{ijp}^2

It takes values in [1, infty), where 1 is a random powder and infty is an ideal single crystal. This is a good summary statistic to show how orientated a phase is.
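To make the texture index concrete, here is a small Python sketch that evaluates the sum exactly as written above (each squared coefficient divided by 2i+1, with the 000 term included). The function name and the row layout are illustrative assumptions, and programs may normalise their coefficients differently, so this is only a sketch of the arithmetic.

```python
def texture_index(rows):
    """rows: iterable of (i, j, p, c_ijp) tuples, including the (0, 0, 0, 1) term."""
    return sum(c * c / (2 * i + 1) for i, _j, _p, c in rows)


# Example coefficients in the (i, j, p, c_ijp) layout proposed in this issue.
rows = [
    (0, 0, 0, 1.0),
    (2, 0, 0, 0.231),
    (2, 1, -1, 0.243),
    (2, 1, 1, -0.134),
]
index = texture_index(rows)
```

With only the 000 term equal to 1, the sum is exactly 1, i.e. a random powder.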

To be used as:

loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
0  0  0  1  #if the value of 000 isn't 1, then everything is suspect, as it means it isn't properly normalised
2  0  0  0.231
2  1  -1  0.243
2  1  1   -0.134
...

or, you could define a tag for each. This has the downside of limiting the combinations of ijp to those specified

_pd_po_sh_y00 - default value 1. If not 1, then it is suspect.
_pd_po_sh_y20 – float (-infty, infty)
_pd_po_sh_y21p – and so on…
_pd_po_sh_y21m _pd_po_sh_y22p _pd_po_sh_y22m
_pd_po_sh_y40 _pd_po_sh_y41p _pd_po_sh_y41m _pd_po_sh_y42p _pd_po_sh_y42m _pd_po_sh_y44p _pd_po_sh_y44m
_pd_po_sh_y60 _pd_po_sh_y61p _pd_po_sh_y61m _pd_po_sh_y62p _pd_po_sh_y62m _pd_po_sh_y64p _pd_po_sh_y64m _pd_po_sh_y66p _pd_po_sh_y66m
_pd_po_sh_y80 _pd_po_sh_y81p _pd_po_sh_y81m _pd_po_sh_y82p _pd_po_sh_y82m _pd_po_sh_y84p _pd_po_sh_y84m _pd_po_sh_y86p _pd_po_sh_y86m _pd_po_sh_y88p _pd_po_sh_y88m

this will have to be a part of checkcif to ensure that only the allowed values are given
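A minimal sketch of the kind of check this implies, written in Python against the ranges stated for the y_i / y_j / y_p form earlier in this comment. The function name is hypothetical, and it only tests the index ranges; it does not know which terms are allowed for a given Laue class.

```python
def check_sh_index(i, j, p):
    """Return a list of problems with one (i, j, p) triple; empty if none found."""
    problems = []
    if i < 0 or i % 2 != 0:
        problems.append("i must be a non-negative even integer")
    if not 0 <= j <= i:
        problems.append("j must lie in [0, i]")
    if j == 0 and p != 0:
        problems.append("p must be 0 when j is 0")
    if j != 0 and p not in (-1, 1):
        problems.append("p must be +1 or -1 when j is not 0")
    return problems
```

Returning a list of messages rather than raising lets a checker report every problem in a loop at once.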

Geometrical considerations

The exact implementation of the PO correction depends on the instrument geometry, or probably more to the point, the defaults in the program used to do the modeling of the PO. The (probable) default for (most?) programs would be symmetric reflection geometry. The other possible geometries are asymmetric reflection, symmetric transmission, asymmetric transmission, and capillary. (see [1], [2], [4])

(the sample normal is taken as the rotation axis of the sample, hence the difference between capillary and asymmetric transmission)

There are some diagrams in [1] showing these different arrangements. They are also in Table 1 [4] and the first two are discussed in [2].

It is suggested to have a tag:

_pd_po_geom

with possible values:

"refl": symmetric reflection
"trans": symmetric transmission
"arefl": asymmetric reflection
"atrans": asymmetric transmission
"cap": capillary, or Debye-Scherrer

If this tag is not supplied, then "refl" is assumed.
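A short Python sketch of how reading software might apply that default, assuming a data block has already been parsed into a dict of data name to value (the dict representation and the function name are assumptions for illustration only):

```python
PO_GEOMETRIES = {
    "refl": "symmetric reflection",
    "trans": "symmetric transmission",
    "arefl": "asymmetric reflection",
    "atrans": "asymmetric transmission",
    "cap": "capillary, or Debye-Scherrer",
}


def po_geometry(block):
    """Return the PO geometry code for a parsed block, defaulting to 'refl'."""
    value = block.get("_pd_po_geom", "refl")
    if value not in PO_GEOMETRIES:
        raise ValueError(f"unknown _pd_po_geom value: {value!r}")
    return value
```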

References

1 Rowles and Buckley, 2017, §3, https://espace.curtin.edu.au/handle/20.500.11937/50104
2 Ida, 2013, https://www.researchgate.net/publication/271325583_Effect_of_Preferred_Orientation_in_Synchrotron_X-ray_Powder_Diffraction
3 Dollase, 1986, https://doi.org/10.1107/S0021889886089458
4 Jarvinen, 1993, https://doi.org/10.1107/S0021889893001219
5 Bunge, https://www.researchgate.net/publication/277308168_Texture_Analysis_in_Materials_Science_H-J_Bunge

jamesrhester commented 2 years ago

I think this should work.

I don't see any problems with the _pd_po_md outline. I think it is true that all of these items are per phase, so there would be a child data name of _pd_phase.id in there too. As we are separating phase-specific information into separate data blocks, this data name would generally be invisible.
Likewise with the _pd_po_sh proposal; I definitely prefer the y_i, y_j, y_p form, as it makes the indices more immediately available to calculations.
_pd_po_geom seems like it is not just specific to preferred orientation but something that probably belongs in PD_INSTR. _pd_instr.geometry is one of those old data names that does not restrict the geometries to a specific list, so how about we add a new data name like _pd_instr.reflection_geometry which would have values as suggested above?
Yes, the data names will probably become _pd_preferred_orient_March-Dollase.* and ...._sphericalharmonics.*

rowlesmr commented 2 years ago

Yes these are per phase. So in a multiphase environment, with PO on multiple phases, with a combination of MD and SH, something like the CIF down below.

With _pd_po_sh_phase_block_id and _pd_po_md_phase_block_id, I envisage it operating like the reflection markers: there is a code in the PO listing, and that code is linked through _pd_phase.id to a _pd_phase.block_id.

The individual tags were my first idea, and then I thought of the y_i formalism. I think it's much neater.
_pd_po_geom is only a PD_INSTR thing if the analysis respects the instrument configuration. Right now, I don't think there is a way to do a SH PO correction in TOPAS that isn't in symmetric reflection, as the Legendre polynomials aren't exposed to the user. The same with MD, unless you write your own macros. So _pd_po_geom expresses the geometry of the correction rather than the geometry of the instrument. This is discussed in terms of MD by Howard and Kisi looking at capillaries.

data_aDiffractionPattern
_pd_block.id diffpat

loop_
_pd_phase.id
_pd_phase.block_id
_pd_phase.mass_percent
_pd_po_sh_texture_index
1   kaolin       20.3   .
2   goethite   10.4   .
3   albite        69.3   2.345

loop_
_pd_po_md_r
_pd_po_md_h
_pd_po_md_k
_pd_po_md_l
_pd_po_md_fract
_pd_po_md_phase_block_id
0.789   0   0   1   1   1
0.902   0   1   0   0.68   2
0.741   1   0   1   0.32   2

loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
_pd_po_sh_phase_block_id
0  0  0  1     3
2  0  0  0.231   3
2  1  -1  0.243   3
2  1  1   -0.134   3
#...

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    1231    1024.212
5.10    1254    1024.212
#...
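As a rough check on a loop like the March-Dollase one above, the proposal's rule that the fractions for one phase sum to 1 could be tested along these lines. This is a Python sketch only; the tuple layout mirrors the column order of that loop, and the function name is made up for illustration.

```python
from collections import defaultdict


def fraction_sums(rows):
    """rows: (r, h, k, l, fract, phase_id) tuples from the March-Dollase loop."""
    sums = defaultdict(float)
    for _r, _h, _k, _l, fract, phase_id in rows:
        sums[phase_id] += fract
    return dict(sums)


md_rows = [
    (0.789, 0, 0, 1, 1.0, 1),
    (0.902, 0, 1, 0, 0.68, 2),
    (0.741, 1, 0, 1, 0.32, 2),
]
# Phases whose fractions do not add up to 1 within a small tolerance.
bad = {ph: s for ph, s in fraction_sums(md_rows).items() if abs(s - 1.0) > 1e-6}
```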

jamesrhester commented 2 years ago

Sounds good. Is there any situation where different histograms collected from the same sample and forming part of the same overall dataset might have different PO corrections? I'm guessing yes, in which case histogram_id is also a "hidden key data name". Not that that's a problem for the proposal.
Looks like at least two of us agree
Fair point. But it does sound like it wouldn't hurt to also define _pd_instr.reflection_geometry and then a future checkCIF for powder data could compare the two and raise an alert which the referee could then ignore/complain about?

rowlesmr commented 2 years ago

Yes. I could see a flat plate reflection and capillary dataset being collected on the same sample.
\o/
sounds OK. You'd need to cover symmetric and asymmetric transmission and reflection, and capillary geometries.

briantoby commented 2 years ago

On point 1 above, I would argue that preferred orientation, crystallite size and microstrain are not phase or dataset properties, but are actually functions of both. In GSAS-II we effectively “loop” them by histogram as part of the phase information, but that could have been reversed. The GUI by default shows them as part of the phase information, but I consider that in retrospect a poor choice and have put an option that displays them as a “top-level” tree item.

For spherical harmonics, I think that even though the number of terms may differ between datasets, at least the labeling for the terms is consistent since space group is the same for all. This means they can be looped by dataset within a phase in CIF.

rowlesmr commented 2 years ago

Yes, I can see that PO is a combination of phase and dataset, with PO being observed in the diffraction pattern, and the parameters describing it constrained by the structure. In a multi-phase, multi-dataset environment, I would think that the PO "belongs" to the diffraction pattern, as PO is an extrinsic property of the phases within the specimen, whereas something like texture or microstructure belongs with the phase only, as they are an intrinsic property of that phase.

In TOPAS, the dataset "owns" the structure, and the structure "owns" the PO, which sounds the same as GSAS-II.

Couldn't both SH & MD PO also be looped by dataset within a single structure? Is this something like what you're thinking of:

data_aCrystalStructure
_pd_block.id xstal

loop_
_pd_block_diffractogram.id
   diffpat1
   diffpat2

loop_
_pd_po_md_r
_pd_po_md_h
_pd_po_md_k
_pd_po_md_l
_pd_po_md_diffractogram_block_id
0.789   0   0   1    diffpat1
0.902   0   0   1    diffpat2

_cell.length_a 12.34
#...

briantoby commented 2 years ago

In GSAS-II experimental data and phases are linked and phases own the links as well as the “HAP” (histogram and phase) parameters, but I don’t think this choice matters very much. Since MD texture is just two terms (direction & coefficient) regardless of phase symmetry that can be looped under phase or dataset. Looping SH is more complex as the number of terms and the associated equation for each term depend on the selected order and the symmetry for each phase. I think it would be very messy to put SH terms into a loop for different phases. Less messy (I think) if one collects SH terms for one phase but differing datasets, even if the SH order is changed between datasets.

jamesrhester commented 2 years ago

We can separate this into two decisions. The first decision is whether or not PO depends on diffractogram (to use pdcif terminology). From the above discussion, it does. Therefore a child data name of _pd_diffractogram.id must be defined for the PO loop. Note that _pd_diffractogram.id has not yet been defined as far as I can tell, and should always be assumed equal to _pd_block.id. In typical situations this will be an invisible data name (ie not explicitly required anywhere), but is necessary for automatic consolidation of multi-block datasets by software that doesn't understand the powder-specific block_id pointers. The block-id pointers can still be used of course. So this first decision causes behind-the-scenes action and sets up options for the second decision below.

The second decision is regarding how to recommend presenting PO information. The "default" presentation of powder data that is compatible with core CIF is to put histogram-specific and phase-specific information into separate blocks, that is, information that is both histogram and phase specific goes into N x M data blocks for N histograms and M phases, information that is only phase specific goes into a further M data blocks, and information that is only histogram specific goes into a further N data blocks (if both N and M = 1 then everything can go in a single data block). So for PO, which depends on both histogram and phase, PO info for each histogram and phase goes into a separate data block. This "default" presentation is essentially created by making the pd_diffractogram category (which again, doesn't yet exist) a Set category.
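To make the block count concrete, here is a small Python sketch that enumerates the data blocks implied by this "default" layout for N histograms and M phases. The block names (DP1, PH1, DP1_PH1 and so on) are placeholders chosen for illustration, not anything the dictionary prescribes.

```python
from itertools import product


def block_layout(histograms, phases):
    """List block names: N histogram blocks, M phase blocks, N x M combined blocks."""
    blocks = list(histograms) + list(phases)
    blocks += [f"{h}_{p}" for h, p in product(histograms, phases)]
    return blocks


layout = block_layout(["DP1", "DP2"], ["PH1", "PH2", "PH3"])
```

For 2 histograms and 3 phases this gives 2 + 3 + 6 = 11 blocks; in practice a combined block would only be needed where there is histogram-and-phase-specific information, such as PO, to report.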

Now, as I've said elsewhere the data block split I've described above which is implied by the "default" core CIF approach may be varied by the powder community if they want to recommend something more sensible, if there is something more sensible. But for the purposes of getting an update to the powder CIF for preferred orientation, only decision 1 is relevant and the best way to present it can be decided after the dictionary is updated.

So I think if @rowlesmr is ready, a pull request could be developed for PO, perhaps after waiting a polite interval for further comments.

rowlesmr commented 2 years ago

I'm thinking about it :)

I'm just writing up examples of different layouts to see what makes sense and stuff...

rowlesmr commented 2 years ago

Just trying to get it all clear in my head how all the datanames relate to each other, and how you would actually write a CIF with that information.

Assuming 2 histograms (DP1, DP2) and 3 phases (PH1, PH2, PH3), 1 of which (PH3) has no PO, then I think this is how the CIF file would be laid out following either:

1. The way (I interpret what James said on how) to arrange histograms, phases, and hist/phase data according to CIF rules,
2. How I interpreted what Brian said about looping by diffraction pattern inside each phase, or
3. looping by phase inside each diffraction pattern (me).

In the first case, each histogram/phase datablock knows about the phases/histogram that refer to each other, but I'm a little lost on how they know about the datablocks containing the PO information. (unless I've misinterpreted how they link). The PO blocks know about the phases and histograms, but none of those point to the PO blocks.

# N histograms + M phases + N*M histogram/phase datablocks
data_DP1
_pd_block.id DP1
loop_ #diffraction pattern knows about its phases
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3
loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_DP2
_pd_block.id DP2
loop_
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3
loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_PH1
_pd_block.id    PH1
loop_ #phase knows about its diffraction patterns
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 12.34
#..

#----------------------------
data_PH2
_pd_block.id    PH2
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 4.56
#..

#----------------------------
data_PH3
_pd_block.id    PH3
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 7.89
#..

#----------------------------
data_DP1_PH1 #how does the phase and diffraction pattern know this exists?
_pd_block.id DP1_PH1
_pd_block_diffractogram.id DP1
_pd_phase.block_id PH1
_pd_po_md_r 0.789
_pd_po_md_h 1
_pd_po_md_k 0
_pd_po_md_l 0
_pd_po_geom capillary

#----------------------------
data_DP2_PH1
_pd_block.id DP2_PH1
_pd_block_diffractogram.id DP2
_pd_phase.block_id PH1
loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_fract
0.654 1 0 0 0.88
0.976 0 1 0 0.12

#----------------------------
data_DP1_PH2
_pd_block.id DP1_PH2
_pd_block_diffractogram.id DP1
_pd_phase.block_id PH2
_pd_po_md_r 0.456
_pd_po_md_h 1
_pd_po_md_k 1
_pd_po_md_l 0
_pd_po_geom capillary

#----------------------------
data_DP2_PH2
_pd_block.id DP2_PH2
_pd_block_diffractogram.id DP2
_pd_phase.block_id PH2
loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
0 0  0   1
4 1 -1   0.789
4 1  1  -0.72
6 1 -1   0.124
6 1 +1  -0.64
6 2 -1  -0.451
6 2 +1  -0.641
8 1 -1   0.1545
8 1 +1   0.451

# Looping PO information by histogram, within each phase

data_DP1
_pd_block.id DP1
loop_
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3
loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_DP2
_pd_block.id DP2
loop_
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3
loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_PH1
_pd_block.id    PH1
loop_
_pd_block_diffractogram.short_id
_pd_block_diffractogram.id
1   DP1
2   DP2
loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_fract
_pd_po_geom 
_pd_po_md_diffractogram_id
0.654 1 0 0 0.88 .         2
0.976 0 1 0 0.12 .         2
0.789 1 0 0 1    capillary 1

_cell.length_a 12.34
#..

#----------------------------
data_PH2
_pd_block.id    PH2
loop_
_pd_block_diffractogram.short_id
_pd_block_diffractogram.id
1   DP1
2   DP2

loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
_pd_po_sh_geom  #need to have a different geom tag 
                # for MD and SH, as they would exist 
                # in different loops, and may have 
                # different values.
_pd_po_sh_diffractogram_id
0 0 0    1       symmetric_reflection 2
4 1 -1   0.789   symmetric_reflection 2
4 1  1  -0.72    symmetric_reflection 2
6 1 -1   0.124   symmetric_reflection 2
6 1 +1  -0.64    symmetric_reflection 2
6 2 -1  -0.451   symmetric_reflection 2
6 2 +1  -0.641   symmetric_reflection 2
8 1 -1   0.1545  symmetric_reflection 2
8 1 +1   0.451   symmetric_reflection 2

loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_geom 
_pd_po_md_diffractogram_id
0.456 1 1 0 capillary 1

_cell.length_a 4.56
#..

#----------------------------
data_PH3
_pd_block.id    PH3
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 7.89
#..

# Looping PO information by phase, within each diffractogram

data_DP1
_pd_block.id DP1
loop_
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3

loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_geom 
_pd_po_md_phase_id
0.789 1 0 0 capillary 1
0.456 1 1 0 capillary 2

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_DP2
_pd_block.id DP2
loop_
_pd_phase.id
_pd_phase.block_id
1   PH1
2   PH2
3   PH3

loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_fract
_pd_po_md_phase_id 
0.654 1 0 0 0.88 1
0.976 0 1 0 0.12 1

loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
_pd_po_sh_phase_id
0 0  0   1     2
4 1 -1   0.789 2
4 1  1  -0.72  2
6 1 -1   0.124 2 
6 1 +1  -0.64  2
6 2 -1  -0.451 2
6 2 +1  -0.641 2
8 1 -1   0.154 2
8 1 +1   0.451 2

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_PH1
_pd_block.id    PH1
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 12.34
#..

#----------------------------
data_PH2
_pd_block.id    PH2
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 4.56
#..

#----------------------------
data_PH3
_pd_block.id    PH3
loop_
_pd_block_diffractogram.id
DP1
DP2
_cell.length_a 7.89
#..

jamesrhester commented 2 years ago

Thanks @rowlesmr for taking the time to write these out. I'll focus on the first scenario ("default") as this must work across all CIF as the bare minimum for handling multi-block situations - or else it is back to the drawing board. I've inserted comments in your adapted example 1 below and removed data block pointers simply to show the basic situation more clearly.

The default CIF scheme assumes that a collection of data blocks has been provided and asserted to form a consistent dataset. How these blocks came to be aggregated is unspecified, and indeed it was not possible for COMCIFS to develop a satisfactory method for defining data block aggregates: I would link to the discussion, but there is a problem with the list archive. Anyway, given this aggregate and a dictionary, it is possible to rearrange all of the data blocks into a single data block with loops populated correctly.

Looking at the example below, drawn from your first example, the links between phase - diffractogram - PO information are derived as follows:

1. All phases are assumed to belong to the single specimen that is the subject of the dataset.
2. Therefore there is no need to link diffractogram to phases, as all diffractograms in the aggregate automatically correspond to all phases (all values of _pd_phase.id) found in the aggregate.
3. The particular phase a data block is relevant to is given by the value of _pd_phase.id in that block.
4. Likewise for _pd_diffractogram.id (data name to be defined).
5. The values of _pd_diffractogram.id and _pd_phase.id in each PO data block identify the relevant phase and diffractogram.

Assumption (1) above may be too restrictive (up to us). If it is not true, we must introduce a _pd_spec.id, and pd_phase_spec.id would be added to the PD_PHASE category, which would then be used to list which phases are in which specimen; a diffractogram block would then include a value of _pd_spec.id matching the particular specimen it comes from. Now would be a good time to decide if we want this flexibility; if it needs discussion, it is best to raise it as a separate issue.

Again, the system of data block pointers that pdCIF introduced (and msCIF also uses) can co-exist with this "default" scheme and perhaps provide extra confidence that blocks belong together.
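The linking rules listed above can be sketched in Python as follows, assuming each data block has already been parsed into a dict of data name to value. The function name and the dict representation are assumptions for illustration, and _pd_diffractogram.id is the yet-to-be-defined data name mentioned in this comment.

```python
def collect_po(blocks):
    """Gather PO items from an aggregate of blocks, keyed by (diffractogram id, phase id)."""
    table = {}
    for block in blocks:
        dp = block.get("_pd_diffractogram.id")
        ph = block.get("_pd_phase.id")
        if dp is None or ph is None:
            continue  # a pure phase block or a pure diffractogram block
        table[(dp, ph)] = {k: v for k, v in block.items() if k.startswith("_pd_po_")}
    return table


aggregate = [
    {"_pd_diffractogram.id": "DP1"},
    {"_pd_phase.id": "PH1"},
    {
        "_pd_diffractogram.id": "DP1",
        "_pd_phase.id": "PH1",
        "_pd_po_md_r": 0.789,
        "_pd_po_md_h": 1,
        "_pd_po_md_k": 0,
        "_pd_po_md_l": 0,
    },
]
po = collect_po(aggregate)
```

Only the reading software needs to see the whole aggregate; the phase and diffractogram blocks themselves carry no pointer to the PO block.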

Regarding the other alternatives presented above, I'd be wary about trying to group PO with phase or PO with diffractogram. As we've established, PO is a property of both phase and diffractogram and there is no obvious reason to give either of those primacy. If the lack of explicit pdCIF-type links between PO data blocks and the rest is unacceptable then you might as well simply create general data block pointers - probably most efficient is simply a table listing data block ids and the role of that data block, as msCIF does. Then adding a PO data block is as easy as a single new item for an enumerated list:

loop_
_pd_other_block.id
_pd_other_block.role
1 phase
2 phase
3 diffractogram
4 PO

Edited example 1 from @rowlesmr above follows:

# N histograms + M phases + N*M histogram/phase datablocks
data_DP1
_pd_block.id DP1   #This will also be _pd_diffractogram.id

# List of block ids for phases has been removed; all blocks that
# are presented together are assumed to belong to the same
# dataset, and a single specimen must contain all 
# phases, that would seem to be a fundamental assumption
# of belonging to the same dataset

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_DP2
_pd_block.id DP2

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_total
5.00    1024    1024.321
5.10    1065    1025.542
#...

#----------------------------
data_PH1
_pd_block.id    PH1
_pd_phase.id  PH1

# Link to diffractograms removed; we assume that all diffractograms are from the same specimen
# therefore all diffractograms have the same phases

_cell.length_a 12.34
#..

#----------------------------
data_PH2
_pd_block.id    PH2
_pd_phase.id  PH2

_cell.length_a 4.56
#..

#----------------------------
data_PH3
_pd_block.id    PH3
_pd_phase.id    PH3

_cell.length_a 7.89
#..

#----------------------------
data_DP1_PH1 #how does the phase and diffraction pattern know this exists?

# It knows because
# 1. It has been presented together with the other data blocks
# 2. The pd_phase.id and pd_diffractogram.id match values found in other data blocks
#
# More importantly, the phase and diffraction data blocks do not have to
# "know" what else exists, only the software reading in the whole aggregate
# needs to know.
#
# I have removed block pointers below
# But left block identifiers 
#
_pd_block.id DP1_PH1
_pd_diffractogram.id DP1
_pd_phase.id PH1
_pd_po_md_r 0.789
_pd_po_md_h 1
_pd_po_md_k 0
_pd_po_md_l 0
_pd_po_geom capillary

#----------------------------
data_DP2_PH1
_pd_block.id DP2_PH1
_pd_diffractogram.id DP2
_pd_phase.id PH1
loop_
_pd_po_md_r 
_pd_po_md_h 
_pd_po_md_k 
_pd_po_md_l 
_pd_po_md_fract
0.654 1 0 0 0.88
0.976 0 1 0 0.12

#----------------------------
data_DP1_PH2
_pd_block.id DP1_PH2
_pd_diffractogram.id DP1
_pd_phase.id PH2
_pd_po_md_r 0.456
_pd_po_md_h 1
_pd_po_md_k 1
_pd_po_md_l 0
_pd_po_geom capillary

#----------------------------
data_DP2_PH2
_pd_block.id DP2_PH2
_pd_diffractogram.id DP2
_pd_phase.id PH2
loop_
_pd_po_sh_y_i 
_pd_po_sh_y_j 
_pd_po_sh_y_p 
_pd_po_sh_cijp
0 0  0   1
4 1 -1   0.789
4 1  1  -0.72
6 1 -1   0.124
6 1 +1  -0.64
6 2 -1  -0.451
6 2 +1  -0.641
8 1 -1   0.1545
8 1 +1   0.451

rowlesmr commented 2 years ago

All phases are assumed to belong to the single specimen that is the subject of the dataset

This depends on how you define "single specimen". What do you call a specimen that goes through a phase change, or reacts? All phases belong to the "single specimen", but not all phases are present in all histograms.

Or for that matter, a temperature-dependent experiment will have a different specimen per histogram, as the lattice parameters have changed.

briantoby commented 2 years ago

Option 1 with (N+1) x (M+1) blocks sounds really messy. FWIW, I think there is a mechanism for the hist/phase blocks to point to the parent phase and histograms, but not for the reverse.

For discussion, here are some SH terms as labeled in GSAS-II for a few different space groups, with different selected SH orders.

Fd-3m:
order 2: no terms
order 4: C(4,1)
order 6: C(4,1), C(6,1)
order 10: C(10,1), C(4,1), C(6,1), C(8,1)

R-3 (r or h setting):
order 2: C(2,0)
order 4: C(2,0), C(4,-3), C(4,0), C(4,3)

P 21 21 21:
order 2: C(2,0), C(2,2)
order 4: C(2,0), C(2,2), C(4,0), C(4,2), C(4,4)

I tend to think of loops as creating tables and since the terms used in a phase will share “labels” it seems cleaner to me to group them that way as it creates a more compact grid, but if one considers a loop as just a list of entries then options 2 & 3 are pretty much interchangeable. As long as the standard picks one.

Brian

On Sep 19, 2022, at 7:48 AM, Matthew Rowles @.**@.>> wrote:

Just trying to get it all clear in my head how all the datanames relate to each other, and how you would actually write a CIF with that information.

Assuming 2 histograms (DP1, DP2) and 3 phases (PH1, PH2, PH3), 1 of which (PH3) has no PO, then I think this is how the CIF file would be laid out following either:

The way (I interpret what James said on how) to arrange histograms, phases, and hist/phase data according to CIF rules,
How I interpreted what Brian said about looping by diffraction pattern inside each phase, or
looping by phase inside each diffraction pattern (me).

N histograms + M phases + N*M histogram/phase datablocks

data_DP1 _pd_block.idhttp://pd_block.id DP1 loop_ #diffraction pattern knows about its phases _pd_phase.idhttp://pd_phase.id _pd_phase.block_id 1 PH1 2 PH2 3 PH3 loop _pd_meas.2theta_scan _pd_meas.counts_total _pd_calc.intensity_total 5.00 1024 1024.321 5.10 1065 1025.542

...

----------------------------

data_DP2 _pd_block.idhttp://pd_block.id DP2 loop_ _pd_phase.idhttp://pd_phase.id _pd_phase.block_id 1 PH1 2 PH2 3 PH3 loop _pd_meas.2theta_scan _pd_meas.counts_total _pd_calc.intensity_total 5.00 1024 1024.321 5.10 1065 1025.542

...

----------------------------

data_PH1 _pd_block.idhttp://pd_block.id PH1 loop_ #phase knows about its diffraction patterns _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP1 DP2 _cell.length_a 12.34

..

----------------------------

data_PH2 _pd_block.idhttp://pd_block.id PH2 loop_ _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP1 DP2 _cell.length_a 4.56

..

----------------------------

data_PH3 _pd_block.idhttp://pd_block.id PH3 loop_ _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP1 DP2 _cell.length_a 7.89

..

----------------------------

data_DP1_PH1 #how does the phase and diffraction pattern know this exists? _pd_block.idhttp://pd_block.id DP1_PH1 _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP1 _pd_phase.block_id PH1 _pd_po_md_r 0.789 _pd_po_md_h 1 _pd_po_md_k 0 _pd_po_md_l 0 _pd_po_geom capillary

----------------------------

data_DP2_PH1 _pd_block.idhttp://pd_block.id DP2_PH1 _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP2 _pd_phase.blockid PH1 loop _pd_po_md_r _pd_po_md_h _pd_po_md_k _pd_po_md_l _pd_po_md_fract 0.654 1 0 0 0.88 0.976 0 1 0 0.12

----------------------------

data_DP1_PH2 _pd_block.idhttp://pd_block.id DP1_PH2 _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP1 _pd_phase.block_id PH2 _pd_po_md_r 0.456 _pd_po_md_h 1 _pd_po_md_k 1 _pd_po_md_l 0 _pd_po_geom capillary

----------------------------

data_DP2_PH2 _pd_block.idhttp://pd_block.id DP2_PH2 _pd_block_diffractogram.idhttp://pd_block_diffractogram.id DP2 _pd_phase.blockid PH2 loop _pd_po_sh_y_i _pd_po_sh_y_j _pd_po_sh_y_p _pd_po_sh_cijp 0 0 0 1 4 1 -1 0.789 4 1 1 -0.72 6 1 -1 0.124 6 1 +1 -0.64 6 2 -1 -0.451 6 2 +1 -0.641 8 1 -1 0.1545 8 1 +1 0.451

Looping PO information by histogram, within each phase

data_DP1 _pd_block.idhttp://pd_block.id DP1 loop_ _pd_phase.idhttp://pd_phase.id _pd_phase.block_id 1 PH1 2 PH2 3 PH3 loop _pd_meas.2theta_scan _pd_meas.counts_total _pd_calc.intensity_total 5.00 1024 1024.321 5.10 1065 1025.542

...

----------------------------

...

----------------------------

data_PH1 _pd_block.idhttp://pd_block.id PH1 loop_ _pd_block_diffractogram.short_id _pd_block_diffractogram.idhttp://pd_block_diffractogram.id 1 DP1 2 DP2 loop_ _pd_po_md_r _pd_po_md_h _pd_po_md_k _pd_po_md_l _pd_po_md_fract _pd_po_geom _pd_po_md_diffractogram_id 0.654 1 0 0 0.88 . 2 0.976 0 1 0 0.12 . 2 0.789 1 0 0 1 capillary 1

_cell.length_a 12.34

..

----------------------------

data_PH2
_pd_block.id PH2
loop_
 _pd_block_diffractogram.short_id
 _pd_block_diffractogram.id
 1 DP1
 2 DP2

loop_
 _pd_po_sh_y_i
 _pd_po_sh_y_j
 _pd_po_sh_y_p
 _pd_po_sh_cijp
 _pd_po_sh_geom             #need to have a different geom tag
                            # for MD and SH, as they would exist
                            # in different loops, and may have
                            # different values.
 _pd_po_md_diffractogram_id
 0 0 0 1 symmetric_reflection 2
 4 1 -1 0.789 symmetric_reflection 2
 4 1 1 -0.72 symmetric_reflection 2
 6 1 -1 0.124 symmetric_reflection 2
 6 1 +1 -0.64 symmetric_reflection 2
 6 2 -1 -0.451 symmetric_reflection 2
 6 2 +1 -0.641 symmetric_reflection 2
 8 1 -1 0.1545 symmetric_reflection 2
 8 1 +1 0.451 symmetric_reflection 2

loop_
 _pd_po_md_r
 _pd_po_md_h
 _pd_po_md_k
 _pd_po_md_l
 _pd_po_md_geom
 _pd_po_md_diffractogram_id
 0.456 1 1 0 capillary 1

_cell.length_a 4.56

..

----------------------------

data_PH3
_pd_block.id PH3
loop_
 _pd_block_diffractogram.id
 DP1
 DP2
_cell.length_a 7.89

..

Looping PO information by phase, within each diffractogram

data_DP1
_pd_block.id DP1
loop_
 _pd_phase.id
 _pd_phase.block_id
 1 PH1
 2 PH2
 3 PH3

loop_
 _pd_po_md_r
 _pd_po_md_h
 _pd_po_md_k
 _pd_po_md_l
 _pd_po_md_geom
 _pd_po_md_phase_id
 0.789 1 0 0 capillary 1
 0.456 1 1 0 capillary 2

loop_
 _pd_meas.2theta_scan
 _pd_meas.counts_total
 _pd_calc.intensity_total
 5.00 1024 1024.321
 5.10 1065 1025.542

...

----------------------------

data_DP2
_pd_block.id DP2
loop_
 _pd_phase.id
 _pd_phase.block_id
 1 PH1
 2 PH2
 3 PH3

loop_
 _pd_po_md_r
 _pd_po_md_h
 _pd_po_md_k
 _pd_po_md_l
 _pd_po_md_fract
 _pd_po_md_phase_id
 0.654 1 0 0 0.88 1
 0.976 0 1 0 0.12 1

loop_
 _pd_po_sh_y_i
 _pd_po_sh_y_j
 _pd_po_sh_y_p
 _pd_po_sh_cijp
 _pd_po_sh_phase_id
 0 0 0 1 2
 4 1 -1 0.789 2
 4 1 1 -0.72 2
 6 1 -1 0.124 2
 6 1 +1 -0.64 2
 6 2 -1 -0.451 2
 6 2 +1 -0.641 2
 8 1 -1 0.154 2
 8 1 +1 0.451 2

loop_
 _pd_meas.2theta_scan
 _pd_meas.counts_total
 _pd_calc.intensity_total
 5.00 1024 1024.321
 5.10 1065 1025.542

...

----------------------------

data_PH1
_pd_block.id PH1
loop_
 _pd_block_diffractogram.id
 DP1
 DP2
_cell.length_a 12.34

..

----------------------------

data_PH2
_pd_block.id PH2
loop_
 _pd_block_diffractogram.id
 DP1
 DP2
_cell.length_a 4.56

..

----------------------------

data_PH3
_pd_block.id PH3
loop_
 _pd_block_diffractogram.id
 DP1
 DP2
_cell.length_a 7.89

..


jamesrhester commented 2 years ago

All phases are assumed to belong to the single specimen that is the subject of the dataset

This depends on how you define "single specimen". What do you call a specimen that goes through a phase change, or reacts? All phases belong to the "single specimen", but not all phases are present in all histograms.

Or for that matter, a temperature-dependent experiment will have a different specimen per histogram, as the lattice parameters have changed.

Hmm, indeed, we need to pin this down in general. I think we would agree that a phase is a superset of the single crystalline compounds described by core CIF as we are going to need to include amorphous phases.

Here's how I would build this up, with the way this is reflected in the dictionary provided:

A "phase" is a distinct compound existing in a powder specimen
- The PD_PHASE category exists, _pd_phase.id identifies a phase, and a mass percent is associated with it
A phase may be crystalline.
- Cell parameters, space group and atomic positions are associated with a particular phase by assigning a value to e.g. _cell.phase_id and _space_group.phase_id etc. (not yet present in PD dictionary but should be added).
A phase might only exist under a certain range of environmental conditions.
- cell parameters, space group and atomic positions are separately associated with a particular set of environmental and measurement conditions by assigning a value to e.g. _cell.diffrn_id etc. Neither of these data names exist yet but they are implicit (cell parameters must depend on temperature so the dictionary can't contradict this!).

Under the above scheme, and using the "default" data block layout with the various block pointers ignored, the set of phases relating to a particular diffractogram would be determined by using _diffrn.id in the diffractogram block to find all blocks containing both _pd_phase.id and the same _diffrn.id - the _pd_phase.ids found are the list of phases. Or if you prefer to find which diffractograms include a particular phase, you would collect all values of _diffrn.id that occur in the same blocks as that _pd_phase.id and then find all diffractogram blocks that include that value of _diffrn.id. If you want to find the cell parameters for a particular phase at a particular temperature, you find the block that has the relevant values of _pd_phase.id and _diffrn.id.
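The lookups described above can be sketched in a few lines of Python. This is only an illustration: each data block is modelled as a plain dict (not the output of any real CIF parser), the `_diffrn.id` values `T300K` and `T500K` and the second PH1 cell length are made up for the example, and the function names are hypothetical.

```python
# Hypothetical in-memory model: one dict per data block, data name -> value.
# The _diffrn.id values and the 12.40 cell length are illustrative only.
blocks = [
    {"_diffrn.id": "T300K", "_pd_meas.2theta_scan": [5.00, 5.10]},  # diffractogram block
    {"_diffrn.id": "T300K", "_pd_phase.id": "PH1", "_cell.length_a": 12.34},
    {"_diffrn.id": "T300K", "_pd_phase.id": "PH2", "_cell.length_a": 4.56},
    {"_diffrn.id": "T500K", "_pd_phase.id": "PH1", "_cell.length_a": 12.40},
]


def phases_for_diffrn(blocks, diffrn_id):
    """All _pd_phase.id values found in blocks carrying the given _diffrn.id."""
    return sorted(
        {b["_pd_phase.id"] for b in blocks
         if b.get("_diffrn.id") == diffrn_id and "_pd_phase.id" in b}
    )


def diffrn_ids_for_phase(blocks, phase_id):
    """All _diffrn.id values that occur in the same block as the given phase."""
    return sorted(
        {b["_diffrn.id"] for b in blocks
         if b.get("_pd_phase.id") == phase_id and "_diffrn.id" in b}
    )


def cell_length_a(blocks, phase_id, diffrn_id):
    """Cell length a for one phase under one set of conditions, or None."""
    for b in blocks:
        if b.get("_pd_phase.id") == phase_id and b.get("_diffrn.id") == diffrn_id:
            return b.get("_cell.length_a")
    return None
```

Nothing here relies on block pointers; the relationships come only from the identifier values, which is the point of the "default" scheme.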

A benefit and a drawback of the "default" scheme is that new phases can be added to a dataset by simply including the appropriate data blocks in the aggregate (imagine writing some extra CIF files to a directory). The drawback is that a given collection of data blocks has no way of stating that it is finished and complete. This could be addressed with data block pointers or "summary" data blocks that loop _pd_phase.id, but that is a separate task/problem that is orthogonal to making sure that the "default" scheme can be used to express relationships correctly.

Question: could PO depend also upon environmental conditions? (temperature and pressure at the moment). If so we need to also make the PO loop have a _diffrn.id child data name. Pressure seems like it would affect it?

rowlesmr commented 2 years ago

Yes, a phase may be crystalline, but we now need a way to specify a data block for an amorphous phase. Unless it's just implicitly done with an internal/ext std quantification that adds up to less than 100%. That's another issue entirely!

Why do you need _cell.phase_id and _space_group.phase_id? Can't we already do this? _pd_block_diffractogram.id in a phase data block points to the diffraction pattern, so you can get the temperature, time (and other parameters) of data collection to inform how (for example) the cell params change. _pd_phase.block_id in a histogram data block points to a phase, so you can get cell prms, space group, and inform how (for example) the peaks shift position. You can keep adding new data to the end of the file (or directory, or container-of-choice), as each histogram refers to the phases that contribute to it, and each phase refers to the histograms in which they occur*.

PO (and texture, microstructure...) can most definitely depend upon experimental conditions. Isn't this already taken care of using either approach 2 or 3, as the PO lives in blocks that link to the phase/diffractogram, and therefore is associated with the other data items that appear in those blocks?

data_phase1
_pd_block.id phase_1
_pd_block_diffractogram.id difpat_1
#...

data_difpat1
_pd_block.id difpat_1
_pd_phase.block_id phase_1
#...

data_phase2
_pd_block.id phase_2
_pd_block_diffractogram.id difpat_2
#...

data_difpat2
_pd_block.id difpat_2
_pd_phase.block_id phase_2
#...

#...

* In this scheme, you can even have unrelated experiments in there, as they would never refer to each other, although this would probably be a bad idea.
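As a rough sketch of how software might check the block pointers in the example above, the following Python treats each block as a dict keyed by its `_pd_block.id`. The dict layout and the helper names `dangling_pointers` and `one_sided_links` are assumptions made for illustration, not part of pdCIF or any CIF library.

```python
# Hypothetical model: _pd_block.id -> {pointer data name: [target block ids]}.
POINTER_TAGS = ("_pd_block_diffractogram.id", "_pd_phase.block_id")

blocks = {
    "phase_1": {"_pd_block_diffractogram.id": ["difpat_1"]},
    "difpat_1": {"_pd_phase.block_id": ["phase_1"]},
    "phase_2": {"_pd_block_diffractogram.id": ["difpat_2"]},
    "difpat_2": {"_pd_phase.block_id": ["phase_2"]},
}


def dangling_pointers(blocks):
    """Return (block, tag, target) for every pointer to a block that is absent."""
    missing = []
    for block_id, items in blocks.items():
        for tag in POINTER_TAGS:
            for target in items.get(tag, []):
                if target not in blocks:
                    missing.append((block_id, tag, target))
    return missing


def one_sided_links(blocks):
    """Return (source, target) pairs where the target does not point back."""
    result = []
    for block_id, items in blocks.items():
        for tag in POINTER_TAGS:
            for target in items.get(tag, []):
                if target not in blocks:
                    continue  # reported by dangling_pointers instead
                back = [t for back_tag in POINTER_TAGS
                        for t in blocks[target].get(back_tag, [])]
                if block_id not in back:
                    result.append((block_id, target))
    return result
```

Unrelated experiments in the same container would pass both checks, since they never refer to each other.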

rowlesmr commented 2 years ago

Option 1 with (N+1) x (M+1) blocks sounds really messy.

I agree with this. I think the other options neatly aggregate the data with other information which is already being presented.

10: C(10,1), C(4,1), C(6,1), C(8,1)

This is a reason to go with the y_i, y_j, y_p notation, as it allows any orders to be expressed. I didn't know GSAS goes up to (at least) 10.

but if one considers a loop as just a list of entries then options 2 & 3 are pretty much interchangeable. As long as the standard picks one.

Currently, there exists _pd_phase.id to link with _pd_refln.phase_id. _pd_phase.id is looped with _pd_phase.block_id to point to the _pd_block.id holding the phase information. This is why, in my TOPAS CIF macros, the reflection markers live in the histogram data block, as I can link the exact reflections that exist in a histogram to a phase.

To allow option 2 to work, _pd_phase.id would also need to link with the new _pd_po_md.phase_id and _pd_po_sh.phase_id. This would allow looping of the PO information in a histogram data block pointing to the phase data blocks.

To allow option 3 to work, we'd need a new _pd_diffractogram.id to link with new _pd_po_md.diffractogram_id and _pd_po_sh.diffractogram_id. _pd_diffractogram.id would be looped with _pd_block.diffractogram_id to point to the correct _pd_block.id, where the histogram information would be.
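A minimal sketch of the option 2 linkage, assuming the proposed `_pd_po_md.phase_id` exists and that loops have already been read into lists of dicts (a hypothetical representation, not a real parser's output). The values are taken from the DP1 example earlier in the thread; `po_by_phase_block` is an illustrative name.

```python
# Loops from a histogram data block, modelled as lists of row dicts.
phase_loop = [
    {"_pd_phase.id": "1", "_pd_phase.block_id": "PH1"},
    {"_pd_phase.id": "2", "_pd_phase.block_id": "PH2"},
    {"_pd_phase.id": "3", "_pd_phase.block_id": "PH3"},
]

po_md_loop = [
    {"_pd_po_md.r": 0.789, "hkl": (1, 0, 0), "_pd_po_md.phase_id": "1"},
    {"_pd_po_md.r": 0.456, "hkl": (1, 1, 0), "_pd_po_md.phase_id": "2"},
]


def po_by_phase_block(phase_loop, po_loop, link="_pd_po_md.phase_id"):
    """Group PO rows by the phase data block that their phase id resolves to."""
    id_to_block = {row["_pd_phase.id"]: row["_pd_phase.block_id"]
                   for row in phase_loop}
    grouped = {}
    for row in po_loop:
        # A KeyError here means the PO row names a phase id that is not
        # declared in the phase loop of this histogram block.
        block = id_to_block[row[link]]
        grouped.setdefault(block, []).append(row)
    return grouped
```

Option 3 would be the mirror image: the same join, but keyed on a diffractogram id inside a phase data block.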

I believe that option 2 is better, as the presence of PO is (usually) a specimen preparation related phenomenon, and would change between different specimens prepared from the exact same powder. OTOH, you can also model actual texture using PO corrections, so in my thinking, there's a semantic difference between PO reported in a histogram and PO reported in a phase. I don't think we want to actually codify that difference.

Looking at other potential additions, like texture or microstructure, I think these belong in the phase data block. These are intrinsic phase properties, and as such, don't need specific linking back to a histogram, as they should be the same in any histogram that refers to that phase.

briantoby commented 2 years ago

This is a reason to go with the y_i, y_j, y_p notation, as it allows any orders to be expressed. I didn't know GSAS goes up to 10.

One can go to harmonic order 34, generating 32 terms for cubic where the C(m,n) terms go to m=34 & n=3. Did not look for any other s.g.

GSAS-II actually separates this. One can model texture as a phase property, usually requiring multiple detector or sample settings (think pole figure measurement). In this case there will be one set of terms for each phase. Or it can be used as a crystallographic correction term, in which case (as discussed) there will be a set of terms for each phase and histogram.

Part of this I disagree with. Even if one used the same specimen (almost never done anyway), there is no reason to expect that a neutron measurement would see the same P.O. as an x-ray measurement, since the penetration is so different. A histogram link is needed. While in texture experiments, one takes care to make sure that P.O. terms are phase properties, in the general case they vary with both phase and dataset. Nonetheless, since they can be phase properties, I’d agree that they belong in a phase block.

Brian

jamesrhester commented 2 years ago

Yes, a phase may be crystalline, but we now need a way to specify a data block for an amorphous phase. Unless it's just implicitly done with an internal/ext std quantification that adds up to less than 100%. That's another issue entirely!

Yes, indeed, I have added a general issue where we can discuss amorphous phases if/when we are ready.

Why do you need _cell.phase_id and _space_group.phase_id? Can't we already do this? _pd_block_diffractogram.id in a phase data block points to the diffraction pattern, so you can get the temperature, time (and other parameters) of data collection to inform how (for example) the cell params change. _pd_phase.block_id in a histogram data block points to a phase, so you can get cell prms, space group, and inform how (for example) the peaks shift position. You can keep adding new data to the end of the file (or directory, or container-of-choice), as each histogram refers to the phases that contribute to it, and each phase refers to the histograms in which they occur*.

Note I am not at all arguing against a data name distribution like (2) or (3), I was just making sure and demonstrating that the fallback scheme works. Yes, you can certainly use block pointers to find your way without _cell.phase_id etc.; _cell.phase_id et al. are for the benefit of the overall CIF approach to multiple data blocks, which doesn't know about block pointers. Any dictionary-aware CIF software (not just powder software), when provided with the PD CIF dictionary and a bunch of data blocks, could correctly tabulate all relationships and indeed generate/validate the block pointers.

Regarding adding extra blocks, I think that you would need to edit some of the blocks already present in order to add the pointers to new data blocks. On the other hand, it is actually hard to think of a situation for powder CIF where you would want to just add a data block to the end of previous data blocks with no changes to the previous contents, as presumably the refinement results overall would change if you are adding a new histogram or new phase, so I think that point of mine about adding data blocks is largely irrelevant to powder CIF in particular - more relevant if you are talking about a collection of raw data blocks.

PO (and texture, microstructure...) can most definitely depend upon experimental conditions. Isn't this already taken care of using either approach 2 or 3, as the PO lives in blocks that link to the phase/diffractogram, and therefore is associated with the other data items that appear in those blocks?

Yes, absolutely, but just like _cell.phase_id, if the relationship to a particular set of diffraction conditions is expressed by adding a data name linked to _diffrn.id then that relationship is available to general CIF software. Otherwise it is encoded in a non-machine-readable text description of a data block pointer that must therefore be hard-coded in powder-CIF-specific software. Note that these child data names will not need to be provided in the data block unless you actually loop the cell parameters within the data block - this has been suggested in the past as a way of providing a "summary" data block.

jamesrhester commented 2 years ago

A histogram link is needed. While in texture experiments, one takes care to make sure that P.O. terms are phase properties, in the general case they vary with both phase and dataset.

I think we are ready then to construct the dictionary entries. The particular layout chosen for PO is orthogonal to the dictionary contents as far as I can see (no new data block pointers required) and is something that would form part of a separate recommendation/standard for PD software authors.

rowlesmr commented 2 years ago

The first decision is whether or not PO depends on diffractogram (to use pdcif terminology). From the above discussion, it does. Therefore a child data name of _pd_diffractogram.id must be defined for the PO loop. Note that _pd_diffractogram.id has not yet been defined as far as I can tell, and should always be assumed equal to _pd_block.id.

Isn't this _pd_block_diffractogram.id? What is missing is a data name which is to _pd_block_diffractogram.id, as _pd_phase.id is to _pd_phase.block_id.

rowlesmr commented 2 years ago

Should the names be a part of PD_PROC_LS? They are

parameters relevant to a least-squares fit to a powder diffractogram

We also need a data name which is to _pd_block_diffractogram.id, as _pd_phase.id is to _pd_phase.block_id. Maybe _pd_diffractogram.id? Maybe also add _pd_diffractogram.block_id as an alias for _pd_block_diffractogram.id?

Anyhoo, a listing of (probably) relevant data names:

March-Dollase

_pd_preferred_orient_March-Dollase.r & .r_su
- the March-Dollase r factor. Float, (0, infty) , default value = 1.0
_pd_preferred_orient_March-Dollase.h
- the h index of the orientation direction. Int, (-infty, infty)
_pd_preferred_orient_March-Dollase.k
- the k index of the orientation direction. Int, (-infty, infty)
_pd_preferred_orient_March-Dollase.l
- the l index of the orientation direction. Int, (-infty, infty)
_pd_preferred_orient_March-Dollase.fract & .fract_su
- in the case of multiple PO directions, the fractional amount of PO in that direction. Float, (0, infty), default value = 1.0. The sum of all values for a single structure must be 1.
_pd_preferred_orient_March-Dollase.geom
- the geometry of the PO correction, as distinct from the geometry of data collection. See discussion in Rowles and Buckley 2017, §3. Possible values
  - "stran": symmetric transmission
  - "atran": asymmetric transmission
  - "srefl": symmetric reflection
  - "arefl": asymmetric reflection, and
  - "cap": capillary geometries.
  - If no value, assume "srefl".
_pd_preferred_orient_March-Dollase.phase_block_id
- A code which identifies the particular phase to which this PO correction belongs. Must have the same value as given by _pd_phase.id, which is linked by _pd_phase.block_id to the _pd_block.id
_pd_preferred_orient_March-Dollase.diffractogram_block_id
- A code which identifies the particular diffractogram to which this PO correction belongs. Must have the same value as given by _pd_diffractogram.id, which is linked by _pd_block_diffractogram.id to the _pd_block.id
_pd_preferred_orient_March-Dollase.id
- in the case of multiple directions, an ID number to uniquely id a PO direction.
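To show how `r` and `fract` would be consumed, here is a hedged Python sketch of the commonly quoted March-Dollase function, P(α) = (r² cos²α + sin²α / r)^(-3/2), where α is the angle between the PO direction and the scattering vector. It is a simplification: averaging over symmetry-equivalent reflections and the effect of the `geom` value are not handled, and the function names are illustrative only.

```python
import math


def march_dollase(r, alpha):
    """Single-direction March-Dollase factor.

    r     : March-Dollase parameter, must be > 0 (r = 1 means no PO).
    alpha : angle in radians between the PO direction and the
            scattering vector of the reflection.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    return (r**2 * math.cos(alpha) ** 2 + math.sin(alpha) ** 2 / r) ** -1.5


def combined_correction(directions, alphas):
    """Fraction-weighted sum over several PO directions.

    directions : list of (r, fract) pairs; the fract values must sum to 1.
    alphas     : one angle per direction, in the same order.
    """
    if len(directions) != len(alphas):
        raise ValueError("need one angle per direction")
    total = sum(fract for _, fract in directions)
    if not math.isclose(total, 1.0):
        raise ValueError("fract values must sum to 1")
    return sum(fract * march_dollase(r, alpha)
               for (r, fract), alpha in zip(directions, alphas))
```

With the default `r = 1.0` the factor is 1 for every angle, which is why that default corresponds to no correction.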

Spherical harmonics

_pd_preferred_orient_sphericalharmonics.y_i
- order of the SH; even integer, [0, infty)
_pd_preferred_orient_sphericalharmonics.y_j
- terms of each order: integer, [0, y_i]. Allowed values limited by space group symmetry.
_pd_preferred_orient_sphericalharmonics.y_p
- term parity: integer, [-1, 1] (can only be 0 for the 0th term of each order, otherwise, +/- 1). Allowed values limited by space group symmetry.
_pd_preferred_orient_sphericalharmonics.c_ijp & .c_ijp_su
- the actual value of the SH correction for a given y_i, y_j, y_p combination. Float (-infty, infty)
_pd_preferred_orient_sphericalharmonics.geom
- the geometry of the PO correction, as distinct from the geometry of data collection. See discussion in Rowles and Buckley 2017, §3. Possible values
  - "stran": symmetric transmission
  - "atran": asymmetric transmission
  - "srefl": symmetric reflection
  - "arefl": asymmetric reflection, and
  - "cap": capillary geometries.
  - If no value, assume "srefl".
_pd_preferred_orient_sphericalharmonics.phase_block_id
- A code which identifies the particular phase to which this PO correction belongs. Must have the same value as given by _pd_phase.id, which is linked by _pd_phase.block_id to the _pd_block.id
_pd_preferred_orient_sphericalharmonics.diffractogram_block_id
- A code which identifies the particular diffractogram to which this PO correction belongs. Must have the same value as given by _pd_diffractogram.id, which is linked by _pd_block_diffractogram.id to the _pd_block.id
_pd_preferred_orient_sphericalharmonics.id
- An ID number to uniquely id a PO direction.
_pd_preferred_orient_sphericalharmonics.texture_index & .texture_index_su
- The texture index is defined by Bunge (eqn 4.212) as $\sum_{i,j,p} \frac{1}{2i+1} c_{i,j,p}^2$. It gives the value [1, infty), where 1 is a random powder and infty is an ideal single crystal.
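A small Python sketch of the texture index as written above, taking rows of (y_i, y_j, y_p, c_ijp) as they would appear in the proposed loop. The function name and the basic range checks are assumptions for illustration; the checks do not enforce the space group restrictions, nor the rule that y_p may be 0 only for the 0th term of an order.

```python
def texture_index(coefficients):
    """Sum of c_ijp**2 / (2*y_i + 1) over all listed SH terms.

    coefficients : iterable of (y_i, y_j, y_p, c_ijp) tuples.
    """
    total = 0.0
    for y_i, y_j, y_p, c_ijp in coefficients:
        if y_i < 0 or y_i % 2 != 0:
            raise ValueError(f"y_i must be a non-negative even integer, got {y_i}")
        if not 0 <= y_j <= y_i:
            raise ValueError(f"y_j must lie in [0, y_i], got {y_j}")
        if y_p not in (-1, 0, 1):
            raise ValueError(f"y_p must be -1, 0 or 1, got {y_p}")
        total += c_ijp**2 / (2 * y_i + 1)
    return total


# Coefficients from the DP2/PH2 example loop earlier in the thread.
example = [
    (0, 0, 0, 1.0),
    (4, 1, -1, 0.789), (4, 1, 1, -0.72),
    (6, 1, -1, 0.124), (6, 1, 1, -0.64),
    (6, 2, -1, -0.451), (6, 2, 1, -0.641),
    (8, 1, -1, 0.1545), (8, 1, 1, 0.451),
]
j_index = texture_index(example)
```

With only the (0, 0, 0) term set to 1 the sum is exactly 1, the random-powder limit; every additional non-zero coefficient increases it.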

jamesrhester commented 2 years ago

Isn't this _pd_block_diffractogram.id? What is missing is a data name which is to _pd_block_diffractogram.id, as _pd_phase.id is to _pd_phase.block_id.

That is exactly what I mean by _pd_diffractogram.id: same relationship as _pd_phase.id to _pd_phase.block_id. One refers to a data block, the other identifies some concept, if you force them equal to one another you are saying that only one of them is allowed in a data block.

jamesrhester commented 2 years ago

PD_PROC_LS looks like it was a place to put parameters that don't fit into the single crystal model. We would have to deprecate _pd_proc_ls.pref_orient_corr but that's not an issue and just an opportunity to refer to the new data names. Very important question: do we want to abbreviate preferred_orientation to pref_orient in the data names? Programmers everywhere will thank us. The data names will still be findable if the phrase "preferred orientation" is used in the text description.

The data names in any case look excellent. My only comment is that I still want to include child data names of _pd_diffractogram.id and _pd_phase.id. The block-based links you have included are fine; as I've explained above, they just work in a different way. The child data names I am proposing would rarely appear in data files and I'm happy to prepare them in a separate commit. So what we need to do:

[ ] Write text descriptions for each of the categories (March-Dollase, SH), explaining how data names from the category are used. This would be the place for the overall formula or citation.
[ ] Create an example for each of the categories (demonstration of phase, diffractogram pointers not necessary I think)
[ ] Write concise text descriptions for each of the data names, citing literature to save space (half done above already)
[ ] Add the DDLm technical attributes to each definition <- I'm more than happy to do this once the above is done
[ ] Adjust definition for _pd_proc_ls.pref_orient_corr

@rowlesmr are you able to do the first 3 items on the above list?

briantoby commented 2 years ago

I’d say yes

On Sep 28, 2022, at 8:38 AM, Matthew Rowles wrote:

Should the names be a part of PD_PROC_LS?

rowlesmr commented 2 years ago

I’d say yes

There's a PR I'm working on #29

We created a new category PD_PREF_ORIENT, as PD_PROC_LS is a Set category.