cctbx / cctbx_project

Computational Crystallography Toolbox
https://cci.lbl.gov/docs/cctbx

Add detector and unit cell drift plotting tool to xfel utils #835

Closed Baharis closed 1 year ago

Baharis commented 1 year ago

This PR proposes adding detector and unit cell drift plotting to the main branch of cctbx. As with the weather plot, this functionality is available only via the libtbx.python `libtbx.find_in_repositories xfel`/util/drifter.py input.glob=batch*TDER/ command. It takes the input files of merging jobs as its input. For now, the directory structure must follow the one used by the cctbx.xfel GUI. The algorithm has not yet been tested against files created outside the GUI (e.g. by manual striping), but in its current state it may already be a valuable tool for anyone running time-dependent ensemble refinements.

The drifter plot shows, and prints to the output, the following information as a function of run number:

[Image: Screen Shot 2022-12-22 at 5 23 35 PM]

As always, please let me know if this functionality would be better suited to a separate branch, or if its addition should be postponed until it is certain that it also works with directory structures that do not follow the GUI layout.

Baharis commented 1 year ago

Temporarily changing PR to draft in order to enhance, document, and re-test the correlation heatmap.

Baharis commented 1 year ago

Moved the correlation calculations to a separate CorrelationMatrix object and added explicit numerical correlation values to the output, for example:

Correl.       x       y       z       a       b       c
x       +1.0000 +0.6413 +0.1583 -0.6452 -0.5188 -0.5568
y       +0.6413 +1.0000 -0.6561 +0.1726 -0.9887 -0.9944
z       +0.1583 -0.6561 +1.0000 -0.8565 +0.7620 +0.7321
a       -0.6452 +0.1726 -0.8565 +1.0000 -0.3185 -0.2755
b       -0.5188 -0.9887 +0.7620 -0.3185 +1.0000 +0.9990
c       -0.5568 -0.9944 +0.7321 -0.2755 +0.9990 +1.0000

A comment is also due regarding concerns that the correlation values might be calculated incorrectly or might not correspond well to the drift curves. The current implementation calculates correlations weighted by the number of reflections. This prevents small, random batches from dominating the results. I believe this is better than weighting by the number of experiments, which could in turn be skewed by an abundance of inaccurate low-resolution experiments. The standard deviations of the individual series points (x, y, z, a, b, c) are not considered in the weighting.
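
For readers who want to check the numbers independently, the reflection-weighted correlation described above can be sketched with NumPy as follows. This is only an illustration of a weighted Pearson correlation, not the code in drifter.py: the function name `weighted_correlation_matrix`, the variable names, and the sample values are all hypothetical.

```python
import numpy as np


def weighted_correlation_matrix(series, weights):
    """Weighted Pearson correlation between the rows of `series`.

    series  : shape (n_parameters, n_batches), one row per parameter
    weights : shape (n_batches,), e.g. number of reflections per batch
    (hypothetical helper, not part of cctbx)
    """
    x = np.asarray(series, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mean = x @ w                         # weighted mean of each parameter
    centered = x - mean[:, None]
    cov = (centered * w) @ centered.T    # weighted covariance matrix
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


# Made-up example data: three parameters over four batches.
labels = ["x", "y", "z"]
series = [
    [0.10, 0.12, 0.15, 0.11],
    [0.30, 0.28, 0.25, 0.31],
    [1.00, 1.05, 1.02, 0.90],
]
reflections = [100, 5000, 4000, 50]     # small batches get little weight

matrix = weighted_correlation_matrix(series, reflections)
print("Correl. " + " ".join(f"{name:>7}" for name in labels))
for name, row in zip(labels, matrix):
    print(f"{name:<7} " + " ".join(f"{value:+.4f}" for value in row))
```

With equal weights this reduces to the ordinary Pearson correlation (the same result as `np.corrcoef`), so the weighting only changes the outcome when batch sizes differ.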