COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction
4 stars 4 forks source link

Add new preferred orientation data names #29

Closed rowlesmr closed 1 year ago

rowlesmr commented 1 year ago

This is very much an alpha version.

rowlesmr commented 1 year ago

Just noticed that PD_PROC_LS is a Set. Doesn't that mean you can't loop it's data names? Or am I thinking about this wrong? Should these be a part of PD_PHASE?

jamesrhester commented 1 year ago

Yes, PD_PROC_LS is a Set category so should not be looped. I think it is entirely OK to create a new category to hold the preferred orientation information, e.g. PD_PREF_ORIENT for these data names. Otherwise, the text descriptions look excellent, so once you've corrected the names I can add in the technical bits.

rowlesmr commented 1 year ago

The errors are with the syntax are

Error code 102 at line 4501, column 21, near "": invalid encoded character Error code 102 at line 4865, column 39, near "": invalid encoded character

I can't see what the issue is. I'm assuming this is "Error code 102".

rowlesmr commented 1 year ago

OK. I got nothing.

I thought it might have something to do with the a/umlaut in Jarvinen's name, but the error persists...

jamesrhester commented 1 year ago

This is because there is at least one character in there that is not ASCII and not UTF-8 encoded (required by CIF2 standard). Github warns that it is latin-1 encoded which means github knows there are non-ASCII characters...we just have to find them. Not sure why the syntax checker gets the line location wrong though. Working on it...

jamesrhester commented 1 year ago

So the bad characters were two hyphens in the page range of the citation to the 2017 JAC paper in the geom definitions for the preferred orientation corrections.

jamesrhester commented 1 year ago

Will add March-Dollase to the layout checker so that at least that error goes away. my pretty printer does a pretty good job of creating correct formatting if required.

rowlesmr commented 1 year ago

Posting for posterity, the errors from commit 057dd2a

4709: rule 3.1.3: '_pd_pref_orient.March-Dollase_diffractogram_block_id' should begin at 9, begins at 10 5062: rule 3.1.3: '_pd_pref_orient.sphericalharmonics_diffractogram_block_id' should begin at 9, begins at 7 5140: rule 3.1.3: '_pd_pref_orient.sphericalharmonics_phase_block_id' should begin at 9, begins at 10 5163: rule 3.1.3: '_pd_pref_orient.sphericalharmonics_texture_index' should begin at 9, begins at 10

The errors are correct, the line numbers are wrong.

Should be: 4683, 5036, 5114, 5137; all 26 lines off.

rowlesmr commented 1 year ago

Will add March-Dollase to the layout checker so that at least that error goes away. my pretty printer does a pretty good job of creating correct formatting if required.

I think the errors are back. I think you did fix them, and then I had to alter the _name.object_id of all the March-Dollase data names, as they need to be of type Name, and can only be ASCII alpha-numeric characters or underscore.

Also, isn't the relevant rule 2.1.11?

Capitalisation:

4706: rule 2.1.15: Save frame for March_Dollase_diffractogram_block_id does not have canonical case for category/object names pd_pref_orient/March_Dollase_diffractogram_block_id 4729: rule 2.1.15: Save frame for March_Dollase_fract does not have canonical case for category/object names pd_pref_orient/March_Dollase_fract 4752: rule 2.1.15: Save frame for March_Dollase_fract does not have canonical case for category/object names pd_pref_orient/March_Dollase_fract 4769: rule 2.1.15: Save frame for March_Dollase_geom does not have canonical case for category/object names pd_pref_orient/March_Dollase_geom 4807: rule 2.1.15: Save frame for March_Dollase_hkl does not have canonical case for category/object names pd_pref_orient/March_Dollase_hkl 4836: rule 2.1.15: Save frame for March_Dollase_id does not have canonical case for category/object names pd_pref_orient/March_Dollase_id 4855: rule 2.1.15: Save frame for March_Dollase_index_h does not have canonical case for category/object names pd_pref_orient/March_Dollase_index_h 4874: rule 2.1.15: Save frame for March_Dollase_index_k does not have canonical case for category/object names pd_pref_orient/March_Dollase_index_k 4893: rule 2.1.15: Save frame for March_Dollase_index_l does not have canonical case for category/object names pd_pref_orient/March_Dollase_index_l 4912: rule 2.1.15: Save frame for March_Dollase_phase_block_id does not have canonical case for category/object names pd_pref_orient/March_Dollase_phase_block_id 4934: rule 2.1.15: Save frame for March_Dollase_r does not have canonical case for category/object names pd_pref_orient/March_Dollase_r 4970: rule 2.1.15: Save frame for March_Dollase_r_su does not have canonical case for category/object names pd_pref_orient/March_Dollase_r_su

jamesrhester commented 1 year ago

OK, I added "March-Dollase" instead of "March_Dollase". Will fix that. And he rule number.

jamesrhester commented 1 year ago

OK, fixed the capitalisation check and rule numbering in the checking tool.

jamesrhester commented 1 year ago

The current crop of data names still have the issue that some are intended to be looped, and others are not, but they are all in the one category. For example ...special_details and ...block_id are presumably single-valued for a given phase and histogram, but would be looped with h,k,l, or more strictly March-Dollase_id and sphericalharmonics_id - and both of these are required in any loop according to the category description.

I suggest the following solution:

  1. category PD_PREF_ORIENT is used to provide overall information (block pointer, special details, texture index, geom)
  2. new category PD_PREF_ORIENT_MARCH-DOLLASE contains the M-D hkl loop and anything that varies with HKL
  3. new category PD_PREF_ORIENT_SPHERICALHARMONICS likewise for S-H.

Most of these changes involve just moving around a .. Note that I'm suggesting a block pointer that is common between the two variants of PO correction, and things like geom and texture_index that are specific to a particular type of correction but take a single value are also bundled together into the PD_PREF_ORIENT category.

rowlesmr commented 1 year ago
  1. category PD_PREF_ORIENT is used to provide overall information (block pointer, special details, texture index, geom)

I don't think texture index, geom, phase_id, or diffractogram_block_id can be Set categories.

@briantoby has argued that PO values belong in the data block belonging to the structure. As such, if the same structure is used with multiple histograms, you could have multiple texture indicies, geoms, and diffractogram_block_ids.

Additionally, If the PO is reported in the histogram datablock (as I've argued), then there could be multiple phases, and so multiple texture indicies, and phase_ids.

data_aPhase_ala_Brian
#...
loop_
_pd_po_sh_y_i
_pd_po_sh_y_j
_pd_po_sh_cijp
_pd_po_sh_texture_index 
_pd_po_sh_geom
_pd_po_sh_diffractogram_block_id
0  0   1   . .   dp1
4 -1   0.789 . . dp1
4  1  -0.72  . . dp1
0  0   1 . .     dp2
4 -1   0.889 . . dp2
4  1  -0.45 . .  dp2
. . . val1 srefl dp1
. . . val2 cap   dp2

#...

I think it could be good to separate MD from SH corrections. and maybe put special_details in _pd_pref_orient

rowlesmr commented 1 year ago

Wrong button!

rowlesmr commented 1 year ago

The layout check is failing:

5469: rule 2.1.11: Save frame for r does not have canonical case for category/object names pd_pref_orient_March_Dollase/r

Does it want it to be R? Because, no; convention says it's r.

jamesrhester commented 1 year ago

Yes it does want R, simply because of a data name in the core dictionary that ends <something>.R (for R factor). I have redone the checking code so it should pass now.

jamesrhester commented 1 year ago
  1. category PD_PREF_ORIENT is used to provide overall information (block pointer, special details, texture index, geom)

I don't think texture index, geom, phase_id, or diffractogram_block_id can be Set categories.

@briantoby has argued that PO values belong in the data block belonging to the structure. As such, if the same structure is used with multiple histograms, you could have multiple texture indicies, geoms, and diffractogram_block_ids.

Additionally, If the PO is reported in the histogram datablock (as I've argued), then there could be multiple phases, and so multiple texture indicies, and phase_ids.

Ah yes, OK, so there should be data names _pd_pref_orient.phase_id and _pd_pref_orient.diffractogram_id. Then in the Default schema of one phase and/or one diffractogram per data block the _pd_pref_orient data names will be single-valued, but if they are put into a data block containing the structural information then they will vary with diffractogram, and if they are put into a data block containing the diffractogram then they will vary with phase. Which of these to prefer can be decided later, as long as those two data names are present in the dictionary to indicate the actual dependence of the data names on both phase and diffractogram.

rowlesmr commented 1 year ago

So _pd_pref_orient.phase_id and _pd_pref_orient.diffractogram_id need to be loopable.

Thinking about it more, there should actually be _pd_pref_orient_March_Dollase.phase_id, _pd_pref_orient_March_Dollase.diffractogram_id, _pd_pref_orient_sphericalharmonics.phase_id and _pd_pref_orient_sphericalharmonics.diffractogram_id.

Consider the case of one phase in two diffractograms, one corrected with MD, the other with SH, and using Brian's schema of putting the data in the phase datablock:

data_aPhase_ala_Brian
#...
loop_
_pd_po_sh_y_i
_pd_po_sh_y_j
_pd_po_sh_cijp
_pd_po_sh_diffractogram_id
0  0   1     dp1
4 -1   0.789 dp1
4  1  -0.72  dp1

loop_
_pd_po_md_r
_pd_po_md_hkl
_pd_po_md_diffractogram_id
0.456 [1 1 0]  dp2
#...
jamesrhester commented 1 year ago

So _pd_pref_orient.phase_id and _pd_pref_orient.diffractogram_id need to be loopable.

The distinction between Set and Loop categories is that in the Default schema Set categories are not looped. So if for a single phase and diffractogram the quantities in pd_pref_orient are single-valued, then it is not a Loop category. Any category that has key data names defined is loopable, just that some are not looped in the Default schema.

Put another way, if all of the key data names of a category are from Set categories, then that category is also a Set category. As far as I can tell pd_pref_orient is therefore a Set category, that is, it is not looped in the Default schema.

Thinking about it more, there should actually be _pd_pref_orient_March_Dollase.phase_id, _pd_pref_orient_March_Dollase.diffractogram_id, _pd_pref_orient_sphericalharmonics.phase_id and _pd_pref_orient_sphericalharmonics.diffractogram_id.

Yes, there definitely should - this is the CIF dictionary way of stating that pd_pref_orient_MD/SH data names depend in general on both phase and diffractogram. I have tended to leave adding these data names until the rest of the category has been figured out, so now is the time! Also, we haven't yet accepted _pd_diffractogram.id as a data name (#39).

rowlesmr commented 1 year ago

What have I done!? @jamesrhester @vaitkus can you help? I thought I was merging things on my fork. Apparantly, I wasn't.

jamesrhester commented 1 year ago

I'm sure we can fix it...

rowlesmr commented 1 year ago

If by "fix" you mean "just merge it", then yes! :)

Thanks.