Closed Baharis closed 4 months ago
Some further comments:
run method in a try/except decorator so that merging doesn't fail even if this particular worker does.
cctbx.xfel.preference input.glob=/path/to/files*.expt
.preference to dispatch in any cctbx.xfel.merge phil.
@nksauter endorsed the PR during a meeting; waiting for any suggestions from @phyy-nx, whenever you're available.
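The try/except decorator mentioned in the comments above could look roughly like the sketch below. It is only an illustration: the names fail_softly and HypotheticalPreferenceWorker are made up here, and the run(experiments, reflections) signature is an assumption about what a merge worker receives, not code copied from the PR.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def fail_softly(method):
    """Wrap a worker's run method so that an exception is logged, not propagated."""

    @functools.wraps(method)
    def wrapper(self, experiments, reflections):
        try:
            return method(self, experiments, reflections)
        except Exception as exc:  # deliberately broad: this worker is optional
            logger.warning("%s failed and was skipped: %s", type(self).__name__, exc)
            # Hand the input through unchanged so that later workers still run.
            return experiments, reflections

    return wrapper


class HypotheticalPreferenceWorker:
    """Stand-in for a merge worker; the class name and body are illustrative only."""

    @fail_softly
    def run(self, experiments, reflections):
        raise RuntimeError("analysis failed")
```

With this wrapper a failing analysis only leaves a warning in the log, and the data reach the next worker untouched.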
@phyy-nx I assume that by "exceptions" you mean specifically ModuleNotFoundError; if that's the case, I applied the "dials" protection suggested earlier by @ndevenish: the imports are inside a try/except block, and the exception is caught and re-raised later only if a matplotlib-less installation attempts to use matplotlib (diff). Is that enough?
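A minimal sketch of that "catch now, re-raise on use" pattern, assuming nothing about the actual diff: OptionalModule is a hypothetical helper written for this comment, and it relies only on the standard library's importlib.

```python
import importlib


class OptionalModule:
    """Try to import a module once; postpone the error until it is needed."""

    def __init__(self, name):
        self.name = name
        try:
            self.module = importlib.import_module(name)
            self.error = None
        except ModuleNotFoundError as exc:
            # Remember the failure instead of raising at import time.
            self.module = None
            self.error = exc

    def require(self):
        """Return the module, or re-raise the original import error."""
        if self.error is not None:
            raise self.error
        return self.module


# Hypothetical usage inside plotting code only:
#     pyplot = OptionalModule("matplotlib.pyplot").require()
```

Note that this only protects against a missing module; it still imports the module eagerly when it is present, which is the concern raised in the replies below.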
As for the plots themselves, the default for the standalone tool is the Hammer plot (the one shown above), and the only option for the merge worker is ASCII (to make it as light and simple as possible). I assumed that anyone interested enough in nice figures can produce them after merging using cctbx.xfel.preference, which gives a higher degree of control.
Just a fair warning that Matplotlib is often an extra special case and about the only time I find in-scope importing justifiable; unless it changed recently, importing some parts of matplotlib (and pyplot was definitely in this set) chooses and “activates” the backend, initialising the connection to the window server. This makes the backend slightly more complicated to change, “bounces” the dock on macOS, and can throw (or at least print) errors on remote systems that don’t have a windowing session attached. I don’t remember working out a pattern for this that I was completely happy with; I think in DIALS we just do in-scope imports (in library code; scripts run directly by end users are different, although there we also have to be careful about backend selection if we have an option to write an image instead of displaying it).
This might be done here in a place that it doesn’t matter, I didn’t follow it through, but can imagine this is part of what @phyy-nx is referring to.
Yup, sure is! If you run git log | grep matplotlib you'll see matplotlib being moved a bunch of times. A good example of what I meant by using if statements is 1f0b05938fc5695f7ee2b4a7908ab49ddc33c232
Programs like dials.stills_process and xfel.merge that are designed to run on a compute node (as opposed to a laptop or login node) should use if statements and inline imports for matplotlib. Matplotlib fails differently in different environments, and even if the module can be imported (and therefore does not raise ModuleNotFoundError), the import can bork the environment.
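As an illustration of the "if statement plus inline import" pattern, here is a small sketch. The function write_report and its arguments are hypothetical and not part of DIALS or cctbx; the only matplotlib calls used are matplotlib.use, pyplot.subplots, Axes.bar, Figure.savefig and pyplot.close.

```python
def write_report(kappa_by_axis, plot=False, output_path="preference.png"):
    """Return a text report; touch matplotlib only if a plot was requested."""
    lines = [f"{axis}: kappa={kappa:+.3f}" for axis, kappa in kappa_by_axis.items()]
    if plot:
        # Imported here, not at module level, so a headless compute node
        # that never asks for a plot never loads matplotlib at all.
        import matplotlib

        matplotlib.use("Agg")  # file-only backend, no window server needed
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        ax.bar(list(kappa_by_axis), list(kappa_by_axis.values()))
        ax.set_ylabel("kappa")
        fig.savefig(output_path)
        plt.close(fig)
    return "\n".join(lines)
```

Calling write_report({"001": -1.2}) never reaches the import, so the program behaves the same whether or not matplotlib is installed or usable on that node.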
This PR introduces a method to analyze the kind and degree of preferential orientation across multiple experiments. The method can be accessed in two different ways. Firstly, it is accessible as a standalone tool via cctbx.xfel.preference, providing a quick method to analyze small amounts of data. Additionally, preferential orientation can be analyzed via a dedicated cctbx.xfel.merge worker called preference, which takes no phil parameters and can be easily introduced before merging. Both methods can be run using MPI.
The degree of preferential orientation, i.e. the "preference", is evaluated by fitting a Watson distribution to the list of normalized lattice directions across all experiments. The Watson distribution is a conceptually simple generalization of the Gaussian distribution for a sphere (though I really struggled to find good resources about it). The concentration of unit vectors is described using two parameters: a vector μ and a scalar κ.
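For reference, the Watson density on the unit sphere is usually written as below. This is the textbook form in my own notation, not something copied from the PR code; M denotes Kummer's confluent hypergeometric function, which for these arguments reduces to a one-dimensional integral.

```latex
f(\mathbf{x};\,\boldsymbol{\mu},\kappa)
  = \frac{1}{4\pi\, M\!\left(\tfrac{1}{2},\tfrac{3}{2},\kappa\right)}
    \exp\!\left[\kappa\,\bigl(\boldsymbol{\mu}^{\mathsf{T}}\mathbf{x}\bigr)^{2}\right],
\qquad
M\!\left(\tfrac{1}{2},\tfrac{3}{2},\kappa\right)
  = \int_{0}^{1} e^{\kappa t^{2}}\,\mathrm{d}t ,
\qquad
\lVert\mathbf{x}\rVert = \lVert\boldsymbol{\mu}\rVert = 1 .
```

Because the density depends on x only through the squared projection onto μ, x and -x are equally likely. For κ > 0 the vectors concentrate around ±μ, for κ < 0 they concentrate on the great circle perpendicular to μ (i.e. they avoid μ), and κ = 0 gives the uniform distribution.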
With that in mind, the general algorithm for evaluating preference is as follows:
input.glob or the previous worker;
input.space_group (standalone only)...
input.symmetrize);
The most important product is the listing of κ and μ values for zone axes, in particular, the value of κ in the first row.
This is how the table might look for Laue class mmm. Here, κ < -1 suggests a strong preference for zone axes {001} = (001) and (00-1) to point away from direction [-1,1,0] in the laboratory space due to the use of tilted Kapton tape. Other zone axes also feature κ < -0.1 or κ > +0.1, strongly suggesting that preferential orientation might be present. For decently large datasets of more than roughly 10000 experiments, uniform distributions should yield |κ| below 0.1.
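To make the meaning of κ and μ more concrete, here is a rough maximum-likelihood sketch of a Watson fit using only NumPy. It is my own illustration, not the implementation in this PR: fit_watson and log_normalizer are hypothetical names, the κ grid search and its range are arbitrary choices, and the normalizing constant is integrated numerically instead of using a special function.

```python
import numpy as np


def log_normalizer(kappa, n_nodes=2001):
    """log of integral_0^1 exp(kappa * t**2) dt, via the trapezoid rule."""
    t = np.linspace(0.0, 1.0, n_nodes)
    shift = max(kappa, 0.0)  # factor out the largest exponent to avoid overflow
    values = np.exp(kappa * t**2 - shift)
    integral = np.sum(0.5 * (values[1:] + values[:-1])) * (t[1] - t[0])
    return np.log(integral) + shift


def fit_watson(vectors, kappa_grid=None):
    """Return (mu, kappa) maximizing the Watson likelihood over a kappa grid."""
    x = np.asarray(vectors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    scatter = x.T @ x / len(x)  # 3x3 second-moment matrix of the directions
    eigenvalues, eigenvectors = np.linalg.eigh(scatter)  # ascending order
    if kappa_grid is None:
        kappa_grid = np.linspace(-50.0, 50.0, 2001)
    log_c = np.array([log_normalizer(k) for k in kappa_grid])
    best_ll, best_mu, best_kappa = -np.inf, eigenvectors[:, 2], 0.0
    # kappa <= 0 pairs with the smallest eigenvalue (vectors avoid mu),
    # kappa >= 0 pairs with the largest eigenvalue (vectors cluster at +-mu).
    for index, mask in ((0, kappa_grid <= 0), (2, kappa_grid >= 0)):
        log_likelihood = kappa_grid[mask] * eigenvalues[index] - log_c[mask]
        i = int(np.argmax(log_likelihood))
        if log_likelihood[i] > best_ll:
            best_ll = log_likelihood[i]
            best_mu = eigenvectors[:, index]
            best_kappa = float(kappa_grid[mask][i])
    return best_mu, best_kappa
```

The sketch relies on the fact that the Watson log-likelihood depends on the data only through the second-moment matrix, so μ must be one of its extreme eigenvectors and only κ needs a one-dimensional search.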
The distribution of individual zone axes is additionally plotted; in standalone, the default is plot.style=hammer, which shows one heatmap of vector distribution on a unit sphere for each zone axis.
Since the worker will usually run in a headless environment, a single small 2D heatmap of vector density across azimuth/polar coordinates will be printed in the main log using ASCII characters instead.
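A minimal sketch of how such an ASCII density map can be produced with NumPy alone. This is not the code from the PR: ascii_heatmap is a hypothetical name, and the bin counts, the character ramp and the per-bin solid-angle correction are all my own assumptions.

```python
import numpy as np


def ascii_heatmap(vectors, n_polar=9, n_azimuth=36, ramp=" .:-=+*#%@"):
    """Render the density of unit vectors over (polar, azimuth) as text."""
    x = np.asarray(vectors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    polar = np.arccos(np.clip(x[:, 2], -1.0, 1.0))  # 0 .. pi from +z
    azimuth = np.arctan2(x[:, 1], x[:, 0])          # -pi .. pi around z
    counts, _, _ = np.histogram2d(
        polar, azimuth,
        bins=(n_polar, n_azimuth),
        range=((0.0, np.pi), (-np.pi, np.pi)),
    )
    # Equal-angle bins cover unequal areas of the sphere, so divide each
    # polar band by a quantity proportional to its solid angle.
    edges = np.linspace(0.0, np.pi, n_polar + 1)
    band_area = np.cos(edges[:-1]) - np.cos(edges[1:])
    density = counts / band_area[:, None]
    peak = density.max()
    if peak > 0:
        levels = np.floor(density / peak * (len(ramp) - 1)).astype(int)
    else:
        levels = np.zeros(density.shape, dtype=int)
    # One text row per polar band, one character per azimuth bin.
    return "\n".join("".join(ramp[i] for i in row) for row in levels)
```

Printing the returned string gives one row per polar band, with denser regions drawn using heavier characters.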
These maps can also be generated for all zone axes in the standalone using plot.style=ascii. Since they only use ASCII characters, there shouldn't be any problems writing them or storing code in cctbx.
To sum up, this is not the best way to analyze preferential orientation, but in my opinion, it is simple and intuitive. The better way would be to analyze the distribution of entire [a,b,c] matrices instead of individual vectors, but I don't know how to do it. Also, the numeric value of κ will be 0 in cubic systems, even if there is some preferential orientation, due to their inherent symmetry. In this case, one would need to look at the plots to find it. This isn't a big issue since preferential orientation should not cause any significant problems for cubic crystals.
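The cubic case can be made explicit with a short derivation. It assumes that all symmetry-equivalent directions of a zone axis are pooled before fitting (which is how I read input.symmetrize; treat that as an assumption). Take the three ⟨100⟩ axes e_1, e_2, e_3 of crystal n with orientation matrix R_n:

```latex
\sum_{i=1}^{3} (R_n \mathbf{e}_i)(R_n \mathbf{e}_i)^{\mathsf{T}}
  = R_n \Bigl(\sum_{i=1}^{3} \mathbf{e}_i \mathbf{e}_i^{\mathsf{T}}\Bigr) R_n^{\mathsf{T}}
  = R_n R_n^{\mathsf{T}} = I
\quad\Longrightarrow\quad
S = \frac{1}{3N} \sum_{n=1}^{N} \sum_{i=1}^{3}
      (R_n \mathbf{e}_i)(R_n \mathbf{e}_i)^{\mathsf{T}}
  = \tfrac{1}{3} I .
```

The second-moment matrix S is therefore isotropic regardless of how the orientations R_n are distributed. Since the Watson likelihood sees the data only through S, all directions μ are equivalent and the fit returns κ = 0. The same argument applies to any other family of cubic-equivalent directions, because a symmetric rank-2 tensor invariant under the cubic point group must be a multiple of the identity.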