cctbx / cctbx_project

Computational Crystallography Toolbox
https://cci.lbl.gov/docs/cctbx
Other
207 stars 111 forks source link

Evaluation of preferential orientation #967

Closed Baharis closed 4 months ago

Baharis commented 4 months ago

This PR introduces a method to analyze the kind and degree of preferential orientation across multiple experiments. The method can be accessed in two different ways. Firstly, it is accessible as a standalone tool via cctbx.xfel.preference, providing a quick method to analyze small amounts of data. Additionally, preferential orientation can be analyzed via a dedicated cctbx.xfel.merge worker called preference, which takes no phil parameters and can be easily introduced before merging. Both methods can be run using MPI.

The degree of preferential orientation, i.e. the "preference", is evaluated by fitting a Watson Distribution to the list of normalized lattice directions across all experiments. The Watson distribution is a conceptually simple generalization of the Gaussian distribution for a sphere (though I really struggled to find good resources about it). The concentration of unit vectors is described using two parameters – vector μ and scalar κ:

Screen Shot 2024-02-08 at 5 30 43 PM Figure: Red vectors following Watson distribution around black μ=[0,0,1] with κ = -10, ~0, and 10, respectively.

With that in mind, the general algorithm for evaluating preference is as follows:

The most important product is the listing of κ and μ values for zone axes, in particular, the value of κ in the first row.

----------------------------------------------------
Direction  kappa   mu                      NLL
----------------------------------------------------
{0,0,1}    -1.208  [-0.708,+0.706,-0.027]  -7.41E+03
{1,0,0}    +0.593  [+0.711,-0.703,+0.011]  -2.29E+03
{2,-1,0}   +0.581  [+0.710,-0.704,+0.012]  -2.21E+03
{0,1,0}    -0.623  [-0.709,-0.701,+0.083]  -2.17E+03
{0,1,-2}   -0.609  [-0.709,+0.705,-0.020]  -2.08E+03
{1,-1,0}   +0.555  [+0.709,-0.705,+0.017]  -2.00E+03
{2,0,-1}   +0.539  [-0.711,+0.703,-0.010]  -1.89E+03
{2,-1,-1}  +0.531  [+0.711,-0.704,+0.012]  -1.83E+03
{2,-2,-1}  +0.513  [+0.710,-0.704,+0.016]  -1.71E+03
{1,-2,0}   +0.500  [+0.708,-0.706,+0.029]  -1.62E+03
{0,2,-1}   -0.429  [-0.710,-0.699,+0.079]  -1.06E+03
{1,-2,-1}  +0.407  [+0.708,-0.706,+0.029]  -1.06E+03
{1,-1,-1}  +0.401  [+0.710,-0.704,+0.014]  -1.03E+03
{2,-1,-2}  +0.398  [-0.711,+0.703,-0.009]  -1.02E+03
{1,0,-1}   +0.397  [+0.712,-0.702,+0.007]  -1.01E+03
{1,-2,-2}  +0.181  [+0.707,-0.706,+0.032]  -2.05E+02
{0,1,-1}   +0.144  [+0.050,+0.026,+0.998]  -1.29E+02
{1,0,-2}   -0.134  [+0.072,+0.018,+0.997]  -1.08E+02
{1,-1,-2}  -0.090  [+0.037,+0.027,+0.999]  -4.95E+01
----------------------------------------------------

This is how the table might look for Laue class mmm. Here, κ < -1 suggests a strong preference for zone axes {001} = (001) and (00-1) to point away from direction [-1,1,0] in the laboratory space due to the use of tilted Kapton tape. Other zone axes also feature κ < -0.1 or κ > +0.1 strongly suggesting, that preferential orientation might be present. For decently large datasets of ~> 10000 experiments, uniform distributions should yield κ below 0.1.

The distribution of individual zone axes is additionally plotted; in standalone, the default plot.style=hammer, which shows one heatmap of vector distribution on a unit sphere for each zone axis:

Screen Shot 2024-02-08 at 6 06 42 PM

Since the worker will usually run in a headless environment, a single small 2d heatmap of vector density across azimuth/polar coordinates will be printed in the main log using ASCII characters instead:

Ascii distribution heat plot for direction {0,0,1}:
┌────────────────────Z───────────────────┐
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▓▓▓▓▒▒▒▒▒▒▒▒▒▒▒▒│
│▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░▓▓▓▓▓▓▓▓▒▒▒▒░░░░▒▒▒▒│
│▒▒▒▒▓▓▓▓░░░░░░░░░░░░▓▓▓▓▓▓▓▓▒▒▒▒░░░░░░░░│
│▒▒▒▒▓▓▓▓░░░░░░░░░░░░▒▒▒▒▓▓▓▓░░░░    ░░░░│
│▒▒▒▒████░░░░        ▒▒▒▒████░░░░        │
X▒▒▒▒████░Y░░        X▒▒▒████░░Y░        X
│▒▒▒▒▓▓▓▓░░░░    ░░░░▒▒▒▒▓▓▓▓░░░░░░░░░░░░│
│▓▓▓▓▓▓▓▓▒▒▒▒░░░░░░░░▒▒▒▒▓▓▓▓░░░░░░░░░░░░│
│▓▓▓▓▓▓▓▓▒▒▒▒░░░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░│
│▒▒▒▒▓▓▓▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
└────────────────────Z───────────────────┘

Screen Shot 2024-02-08 at 6 09 04 PM

These maps can be also generated for all zone axes in the standalone using plot.style=ascii. Since they only use ASCII characters, there shouldn't be any problems writing them or storing code in cctbx.

To sum up, this is not the best way to analyze preferential orientation, but in my opinion, it is simple and intuitive. The better way would be to analyze the distribution of entire [a,b,c] matrices instead of individual vectors, but I don't know how to do it. Also, the numeric value of κ will be 0 in cubic systems, even if there is some preferential orientation, due to its inherent symmetry. In this case, one would need to look at the plots to find it. This isn't a big issue since preferential orientation should not cause any significant problems for cubic crystals.

Baharis commented 4 months ago

Some further comments:

Baharis commented 4 months ago

@nksauter endorsed the PR during a meeting; Waiting for any suggestions from @phyy-nx, whenever you'll be available.

Baharis commented 4 months ago

@phyy-nx I assume that by "exceptions" you mean specifically ModuleNotFoundError; if that's the case, I applied the "dials" protection as suggested before by @ndevenish: imports are inside try/except block, exception caught and later re-raised only if matplotlib-less installation is attempting to use matplotlib (diff). Is that enough?

As for the plots themselves, the default for standalone is Hammer plot (the one shown above) and the only option for merge worker is ASCII (to make it as light and simple as possible). I assumed that should anyone be interested enough to have nice figures, they can do it after merging using cctbx.xfel.preference to have a higher degree of control.

ndevenish commented 4 months ago

Just a fair warning that Maplotlib is often an extra special case and about the only time I find in-scope importing justifiable; unless it changed recently, importing some parts of matplotlib - and pyplot was definitely in this set - chooses and “activates” the backend, initialising the connection to the window server - making it slightly more complicated to change, “bouncing” the dock on Macos, and potentially throwing (or at least printing) errors on remote systems that don’t have a windowing session attached. I don’t remember working out a pattern for this that I was completely happy with, I think in DIALS we just do in-scope imports (in library code, directly end-user run scripts are different although we also have to be careful about backend selection if we have an option to write an image instead of display).

This might be done here in a place that it doesn’t matter, I didn’t follow it through, but can imagine this is part of what @phyy-nx is referring to.

phyy-nx commented 4 months ago

Yup, sure is! If you run git log | grep matplotlib you'll matplotlib being moved see this a bunch of times. A good example of what I meant by using if statements is 1f0b05938fc5695f7ee2b4a7908ab49ddc33c232

Programs like dials.stills_process and xfel.merge that are designed to run on a compute node (as opposed to a laptop or login node) should use if statements and inline imports for matplotlib. Matplotlib fails differently in different environments, and even if the module can be imported (therefore not raise ModuleNotFound), the import can bork the environment.

Baharis commented 4 months ago

OK, I think I found a solution that should work everywhere and that I don't hate completely 🙂 already in use e.g. here. Edit: works; I think that should solve it.