Add detector and unit cell drift plotting tool to xfel utils

Baharis commented 1 year ago

This PR adds detector and unit-cell drift plotting to the main branch of cctbx. As with the weather plot, this functionality is available only via the libtbx.python `libtbx.find_in_repositories xfel`/util/drifter.py input.glob=batch*TDER/ command. It takes the input files of merging jobs as its input. For now, the directory structure must follow the one used by the cctbx.xfel GUI. The algorithm has yet to be tested against files created outside the GUI (e.g. by manual striping), but even in its current state it may already be a valuable tool for anyone running time-dependent ensemble refinements.

The drifter plot displays, and prints to the output, the following information as a function of run number:

Detector origin vector (X, Y, Z) and its uncertainty determined from spot uncertainty;
Unit cell lengths (a, b, c) and standard deviation of their distribution;
Number of reflections, number of experiments, and reflection-to-experiment ratios;
Weighted correlation between X, Y, Z, a, b, c, shown as a triangular correlation heatmap.
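
For readers who want a feel for the per-run quantities listed above, here is a minimal sketch of how the unit cell statistics and the reflection-to-experiment ratio might be computed for a single run. It is not the code in drifter.py: the function name `summarize_run`, its inputs, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize_run(cell_lengths, n_reflections):
    """Summarize one run (hypothetical helper, not part of cctbx).

    cell_lengths: list of (a, b, c) tuples, one per experiment.
    n_reflections: total number of reflections in the run.
    """
    n_experiments = len(cell_lengths)
    a, b, c = zip(*cell_lengths)  # transpose into three series
    return {
        # (mean, standard deviation) of each cell length distribution;
        # population std dev is an assumption, the tool may differ
        "a": (mean(a), pstdev(a)),
        "b": (mean(b), pstdev(b)),
        "c": (mean(c), pstdev(c)),
        "experiments": n_experiments,
        "reflections": n_reflections,
        "refl_per_expt": n_reflections / n_experiments,
    }


# Made-up numbers, for illustration only
summary = summarize_run([(78.9, 78.9, 37.1), (79.1, 79.0, 37.3)], 500)
print(summary)
```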

[Screenshot: drifter plot, 2022-12-22]

As always, please let me know if this functionality would be better suited for a separate branch, or if its addition should be postponed until it is 100% certain that it also works with directory structures that do not follow the GUI layout.

Baharis commented 1 year ago

Temporarily changing PR to draft in order to enhance, document, and re-test the correlation heatmap.

Baharis commented 1 year ago

Moved the correlation calculations to a separate CorrelationMatrix object and added explicit numerical correlation values to the output, for example:

Correl.       x       y       z       a       b       c
x       +1.0000 +0.6413 +0.1583 -0.6452 -0.5188 -0.5568
y       +0.6413 +1.0000 -0.6561 +0.1726 -0.9887 -0.9944
z       +0.1583 -0.6561 +1.0000 -0.8565 +0.7620 +0.7321
a       -0.6452 +0.1726 -0.8565 +1.0000 -0.3185 -0.2755
b       -0.5188 -0.9887 +0.7620 -0.3185 +1.0000 +0.9990
c       -0.5568 -0.9944 +0.7321 -0.2755 +0.9990 +1.0000

A comment is also due on the concern that the correlation values might be calculated incorrectly or might not correspond well to the drift curves. The current implementation calculates correlations weighted by the number of reflections, which prevents small, random batches from dominating the results. I believe this is better than weighting by the number of experiments, which could in turn be skewed by an abundance of inaccurate low-resolution experiments. The standard deviations of the individual series points (x, y, z, a, b, c) are not considered in the weighting.
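
To make the weighting concrete, below is a minimal standard-library sketch of a reflection-weighted Pearson correlation, together with a formatter that mimics the table layout shown above. It is not the CorrelationMatrix implementation from this PR: the function names `weighted_correlation` and `correlation_table` and all the sample numbers are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry of weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_correlation(x, y, weights):
    """Weighted Pearson correlation of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    mx = weighted_mean(x, weights)
    my = weighted_mean(y, weights)
    cov = sum(w * (a - mx) * (b - my) for a, b, w in zip(x, y, weights))
    var_x = sum(w * (a - mx) ** 2 for a, w in zip(x, weights))
    var_y = sum(w * (b - my) ** 2 for b, w in zip(y, weights))
    return cov / math.sqrt(var_x * var_y)


def correlation_table(series, weights):
    """Format all pairwise weighted correlations as a text table.

    series: dict mapping a short name to a list of per-run values.
    weights: per-run weights, here the number of reflections.
    """
    names = list(series)
    lines = ["Correl. " + " ".join(f"{n:>7}" for n in names)]
    for row in names:
        values = [weighted_correlation(series[row], series[col], weights)
                  for col in names]
        lines.append(f"{row:<7} " + " ".join(f"{v:+.4f}" for v in values))
    return "\n".join(lines)


# Made-up per-run values and reflection counts, for illustration only
runs = {
    "x": [0.10, 0.12, 0.15, 0.40],
    "a": [78.9, 78.8, 78.7, 79.5],
}
reflections = [52000, 61000, 58000, 300]  # the last run is a tiny batch
print(correlation_table(runs, reflections))
```

With weights like these, the tiny last batch contributes little to the result, which is the behaviour the weighting is meant to achieve.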

cctbx / cctbx_project

Add detector and unit cell drift plotting tool to xfel utils #835