cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License
2 stars 12 forks source link

Handle image templates that don't use zero padding #705

Closed dagewa closed 2 months ago

dagewa commented 4 months ago

Following the discussion in https://github.com/cctbx/dxtbx/issues/646, a template with a single # character is used to expand to all image files with non-zero-padded sequential numbers.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 86.11111% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 42.26%. Comparing base (6818056) to head (fc5f8f4). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #705 +/- ## ========================================== + Coverage 42.23% 42.26% +0.02% ========================================== Files 187 187 Lines 16752 16784 +32 Branches 3193 3201 +8 ========================================== + Hits 7075 7093 +18 - Misses 9031 9045 +14 Partials 646 646 ```
dagewa commented 3 months ago

Sorry, there's a bunch of extra commits in here. I think that's because I merged main from the wrong remote at some point :disappointed:

dagewa commented 3 months ago

Improved specificity further by ensuring the new pattern is only matched when the image serial number does not start with zero. All tests are passing for me, but this code is knotty because not only are we using regexes, but were using reversed regexes :scream:

dagewa commented 3 months ago

Finally got the CI all happy. Locally, dxtbx and DIALS tests all pass, as do xia2 tests run with pytest --regression -n auto.

Has anybody got any comments on this PR?

dagewa commented 3 months ago

I would ask that we comment in the change log how you would handle wanting to only open single digit files. I specifically wonder about wanting to dials.import template=foo_001#.cbf say which may not do what you expect (though, probably would)

I believe this case does do the right thing:

$ dials.import template=$(dials.data get -q insulin)/insulin_1_01#.img
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  template = "/home/fcx32934/sw/cctbx/build/dials_data/insulin/insulin_1_01#.img"
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: /home/fcx32934/sw/cctbx/build/dials_data/insulin/insulin_1_01#.img:0:9
  num images: 10
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------
Writing experiments to imported.expt

The dials.show imported.expt output shows that

Scan:
    number of images:   10
    image range:   {0,9}

Checking the image viewer shows that the first image is indeed the expected image. image

dagewa commented 3 months ago

Note to self: one thing to check is what happens with non-zero-padded series that do not start with a single digit. For example:

image_98.cbf
image_99.cbf
image_100.cbf
dagewa commented 3 months ago

Ok, seems to work fine. With files named insulin_1_{98..101}.img, implicitly:

$ dials.import insulin_1_*
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  experiments = <image files>
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: /tmp/insulin_1_##.img:98:101
  num images: 4
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------

and with an explicit template:

$ dials.import template=insulin_1_#.img
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  template = "insulin_1_#.img"
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: insulin_1_#.img:98:101
  num images: 4
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------

both produce a 4-image sweep.

dagewa commented 2 months ago

Added test was useful: unearthed an error on Windows, now fixed :+1: I believe this is ready to go in.