cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License

Support for the Oxford Diffraction file format #11

Closed dagewa closed 1 year ago

dagewa commented 6 years ago

Format descriptions and source code received, with thanks to Dr. Mathias Meyer at Rigaku Oxford Diffraction.

Supporting information from Andreas Förster:

CrysAlisPro is considered by many the gold standard for small-molecule processing. Comparing DIALS against data collected with CAP (the software also controls ROD diffractometers) should help you improve the algorithms for small-molecule data. The compression algorithm that CAP uses is based on CCP4 bitwise, by the way.

dagewa commented 5 years ago

Here are example images provided by a user. Images collected by CrystalClear work (they are Rigaku SMV format), while images collected by CrysAlis(Pro) are in the unrecognised Rigaku Oxford Diffraction format. Attachment: Rigaku_Images.zip

graeme-winter commented 4 years ago

I have been asked by Rigaku folks to take a look at adding support for

https://www.rigaku.com/en/arc

which shares the same file format, so I will take a look.

dagewa commented 4 years ago

Great! The code should read the headers right. I got as far as the various compression schemes and then ran out of steam

dagewa commented 2 years ago

Here is another example frame, courtesy of John Basca. This is from a HyPix camera as part of the XtaLAB Synergy-S system. Attachment: HPX_B521-DNA-A1_1_1.rodhypix.zip

biochem-fan commented 1 year ago

Just to let you know I am working on this (very slowly).

  1. Decompression of the TY6 compression mode was implemented.
  2. I figured out how to calculate the detector vectors from the theta axis and the detector_rotns header.
  3. I do not know how to get the goniometer and the beam axes (especially the interpretation of the beam_rotn_around_e2 and beam_rotn_around_e3 headers).
  4. I also don't know how to handle the kappa and the phi axes.

3 is OK in practice ([0, 1, 0] and [0, 0, 1] are good enough) but I would like to figure out how the values in XDS.INP generated by CrysAlisPro are calculated.

4 is not problematic if you process each sweep independently.
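
To make point 2 concrete, here is a minimal sketch of rotating detector axes about a theta axis with Rodrigues' formula. It is not the code in the Gist or in dxtbx: the axis direction, the starting fast/slow vectors, the sign convention and the angle are all hypothetical values chosen for illustration, and the detector_rotns corrections are not modelled.

```python
import math


def rotate_about_axis(v, axis, angle_deg):
    """Rotate 3-vector v about `axis` by angle_deg (right-handed), Rodrigues' formula."""
    t = math.radians(angle_deg)
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit rotation axis
    cross = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    dot = sum(ki * vi for ki, vi in zip(k, v))
    c, s = math.cos(t), math.sin(t)
    # v' = v cos(t) + (k x v) sin(t) + k (k . v) (1 - cos(t))
    return [v[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3)]


# Hypothetical inputs (assumptions, not values read from a .rodhypix header)
theta_axis = [0.0, 1.0, 0.0]   # assumed direction of the detector theta axis
fast0 = [1.0, 0.0, 0.0]        # assumed fast axis at theta = 0
slow0 = [0.0, -1.0, 0.0]       # assumed slow axis at theta = 0
theta_deg = 30.0               # assumed detector theta angle

fast = rotate_about_axis(fast0, theta_axis, theta_deg)
slow = rotate_about_axis(slow0, theta_axis, theta_deg)
print(fast, slow)
```

Because a rotation preserves lengths and angles, the rotated fast and slow axes stay orthonormal, which is a quick sanity check before comparing against the values CrysAlisPro writes to XDS.INP.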

graeme-winter commented 1 year ago

@biochem-fan - thank you

Do you have a public branch on the go? I might be able to help, or find help, with (3, 4) above.

biochem-fan commented 1 year ago

Not yet. Most of my experiments are in my Jupyter Notebook at the moment.

When ready, I will push it to a new branch. I will also upload my test datasets (lysozyme and sodium glutamate monohydrate collected in house) to XRDa.

biochem-fan commented 1 year ago

I uploaded two test datasets from our in-house Rigaku HyPix 6000. Both consist of multiple sweeps with various kappa (= chi) and detector theta angles.

Not the "best" datasets, but the ice rings are actually useful for validating the geometry ;)

biochem-fan commented 1 year ago

I uploaded my (incomplete) code to my Gist https://gist.github.com/biochem-fan/5f76423120e92828f03a4763a622a1e9. (@dagewa's commit above is orphaned in the cctbx repository, so I could not continue from it.)

Status:

I noticed @graeme-winter is working on the miniCBF format https://github.com/cctbx/dxtbx/pull/443. I hope we can get more insights into the interpretation of native header fields by comparing JH's miniCBF files and native rodhypix files.

biochem-fan commented 1 year ago

Updated my code on my Gist.

I managed to index both JH's tetragonal lysozyme and my monoclinic lysozyme (XRD-00093, run1).

The next step is handling multiple sweeps.

biochem-fan commented 1 year ago

@graeme-winter

CrysAlisPro does not pad digits in file names (e.g. lysozyme_1_1.rodhypix instead of lysozyme_1_001.rodhypix). Is it possible to import all frames e.g. 1 to 360 without renaming them? dials.import template=lysozyme1_###.rodhypix imports only 100 to 360.

At the moment my workaround is:

find /path/to/dataset -name '*.rodhypix' | gawk 'match($0, /(.*)_([0-9]+).rodhypix/,m) {printf("ln -s %s `basename %s_%03d.rodhypix`\n", $0, m[1], m[2])}' > link.sh
bash link.sh
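
The same workaround can be sketched in Python, for readers without gawk. This is only an illustration of the idea (zero-padded symlinks pointing at the original frames); the function names `padded_name` and `link_padded` are made up here, and the output directory is assumed to lie outside the dataset directory so the new links are not picked up by the search.

```python
import re
from pathlib import Path

# <prefix>_<frame>.rodhypix, where <frame> is the unpadded frame number
FRAME_RE = re.compile(r"^(?P<prefix>.*)_(?P<frame>\d+)\.rodhypix$")


def padded_name(name, width=3):
    """Return the file name with its frame number zero-padded, or None if it does not match."""
    m = FRAME_RE.match(name)
    if m is None:
        return None
    return f"{m['prefix']}_{int(m['frame']):0{width}d}.rodhypix"


def link_padded(src_dir, dst_dir, width=3):
    """Create zero-padded symlinks in dst_dir for every .rodhypix file under src_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    made = []
    for src in sorted(Path(src_dir).rglob("*.rodhypix")):
        new = padded_name(src.name, width)
        if new is None:
            continue  # name does not follow the expected pattern
        target = dst / new
        if not (target.is_symlink() or target.exists()):
            target.symlink_to(src.resolve())
        made.append(target)
    return made


print(padded_name("lysozyme1_1_8.rodhypix"))
```

Calling `link_padded("/path/to/dataset", "links")` would then leave a `links` directory whose file names sort in frame order.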

biochem-fan commented 1 year ago

Is import from STDIN broken?

files.lst:

lysozyme1_1_8.rodhypix
lysozyme1_1_9.rodhypix
lysozyme1_1_10.rodhypix
lysozyme1_1_11.rodhypix

cat files.lst | dials.import shows the help message, which says find . -name "image_*.cbf" | dials.import is a valid usage.

dagewa commented 1 year ago

Thanks @biochem-fan for working on this. It seems your gist was overlooked. I'd be keen to get this added to dxtbx even with the current FIXMEs. Once we have some support for the format we can work on improving it as and when it comes up.

@graeme-winter did you have some data from Rigaku that they wanted you to look at?

biochem-fan commented 1 year ago

Remaining TODOs (in the order of priority):

biochem-fan commented 1 year ago

Today I collected some phi scans on our in-house HyPix for testing.

Unfortunately I don't have crystals with strong anomalous signals, so we cannot validate the handedness of the geometry. This is not a big concern for MX, but can be an issue for CX users trying to determine the absolute hand.

@graeme-winter @dagewa do you have such datasets? Otherwise I will ask my colleagues for metalloprotein crystals but this will take time.

dagewa commented 1 year ago

I don't have such datasets either.

biochem-fan commented 1 year ago

I managed to implement multi-axis support (https://github.com/dials/dxtbx/commits/FormatROD-multiaxis).

Using XRD-00093: Monoclinic lysozyme from a Rigaku HyPix 6000 detector:

$ dials.import ../images/lysozyme1_*_???.rodhypix
$ dials.find_spots nproc=10 imported.expt
$ dials.index -vvv imported.expt strong.refl

+------------+-------------+---------------+-------------+
|   Imageset |   # indexed |   # unindexed | % indexed   |
|------------+-------------+---------------+-------------|
|          0 |        4871 |           371 | 92.9%       |
|          1 |        4834 |           665 | 87.9%       |
|          2 |        5021 |           620 | 89.0%       |
|          3 |        4746 |           592 | 88.9%       |
|          4 |        5417 |           369 | 93.6%       |
|          5 |        8991 |           864 | 91.2%       |
|          6 |        5598 |           635 | 89.8%       |
|          7 |        7236 |           974 | 88.1%       |
|          8 |        7838 |           715 | 91.6%       |
|          9 |        3004 |           443 | 87.1%       |
|         10 |        7693 |          1225 | 86.3%       |
|         11 |        7829 |           895 | 89.7%       |
|         12 |        8004 |          2750 | 74.4%       |
|         13 |        6703 |           551 | 92.4%       |
+------------+-------------+---------------+-------------+

$ dials.refine scan_varying=True indexed.{expt,refl}

        RMSDs by experiment:
        +-------+--------+----------+----------+------------+
        |   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
        |    id |        |     (px) |     (px) |   (images) |
        |-------+--------+----------+----------+------------|
        |     0 |   3922 |  0.16554 |  0.26649 |    0.16257 |
        |     1 |   3102 |  0.195   |  0.39311 |    0.17853 |
        |     2 |   3688 |  0.20807 |  0.26579 |    0.15118 |
        |     3 |   3658 |  0.21677 |  0.28764 |    0.14084 |
        |     4 |   4188 |  0.1973  |  0.1407  |    0.16355 |
        |     5 |   6985 |  0.17374 |  0.15921 |    0.16824 |
        |     6 |   3639 |  0.21255 |  0.18737 |    0.13838 |
        |     7 |   5293 |  0.20708 |  0.18756 |    0.15465 |
        |     8 |   6282 |  0.17058 |  0.16768 |    0.14748 |
        |     9 |   2200 |  0.1325  |  0.15129 |    0.13542 |
        |    10 |   5977 |  0.26666 |  0.18915 |    0.17955 |
        |    11 |   5763 |  0.25494 |  0.17802 |    0.12616 |
        |    12 |   5879 |  0.27901 |  0.27849 |    0.24842 |
        |    13 |   4748 |  0.18019 |  0.22536 |    0.15581 |
        +-------+--------+----------+----------+------------+

I will test this on more datasets (including phi scans) next week.

biochem-fan commented 1 year ago

I confirmed phi scans can be processed.

I also confirmed that anomalous difference peaks on lysozyme sulfurs are positive. My understanding is that this validates the geometry's handedness. Of course ideally we should test on chiral small molecule crystals.

Note that these tests are based on https://github.com/dials/dxtbx/commits/FormatROD-multiaxis. The version already merged to dxtbx might have the wrong hand (untested). I will send a pull request once I port the decompression routine to C++.

biochem-fan commented 1 year ago

I ported the decompression routine to C++ and it is now about 50x faster! https://github.com/dials/dxtbx/commit/89fc12d6394d9db7432459a785665f90b0c5fd74

I will test this on more datasets; some phi scans give high RMSDs so I have to check whether the problem is in my geometry or the dataset.

@graeme-winter @dagewa What is DIALS's policy about the endianness and the negative number representation? May I assume little endian and two's complement representation?

graeme-winter commented 1 year ago

Absolutely - every platform we currently support (i.e. x86-64 / ARMv8) matches these - and I can't imagine that things would migrate cleanly to e.g. a novel big endian architecture.

I am not aware of any platform from the last ~20 years that has used anything but two's complement for storing negative integers.
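
As an illustration of what those two assumptions mean in practice, the sketch below decodes the same four bytes as a signed 32-bit integer under both byte orders. The byte string is invented for the example and is not taken from a real .rodhypix file.

```python
import struct
import sys

raw = bytes([0xFE, 0xFF, 0xFF, 0xFF])  # made-up 4-byte field

# "<i" = little-endian signed 32-bit, ">i" = big-endian signed 32-bit
(value_le,) = struct.unpack("<i", raw)
(value_be,) = struct.unpack(">i", raw)

# Two's complement by hand: read as unsigned, subtract 2**32 if the sign bit is set
unsigned = int.from_bytes(raw, "little", signed=False)
manual = unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned

print(sys.byteorder)        # byte order of the machine running this script
print(value_le, value_be)   # -2 -16777217
print(manual == value_le)   # True
```

Spelling out the byte order in the format string (or the equivalent explicit conversion in C++) keeps the decoder correct even on a host whose native order differs.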

biochem-fan commented 1 year ago

I confirmed that I can get the right hand for L-cysteine and can perform Fe-SAD phasing of a metalloprotein.

Mission completed.