PyAbel / PyAbel

A python package for Abel and inverse Abel transforms
MIT License
95 stars 39 forks source link

code review `set_center()` in `abel/tools/center.py` #300

Closed stggh closed 3 years ago

stggh commented 4 years ago

296 correction to set_center() has exposed some odd, perhaps legacy, code lines in this function:

def set_center(data, origin, crop='maintain_size', axes=(0, 1), verbose=False, center=_deprecated):
  """
  :
   origin : tuple
        (row, column) coordinates of the image origin
  """
  :
    origin = list(origin)
    if origin[0] < 0:
        origin[0] += data.shape[0]
    if origin[1] < 0:
        origin[1] += data.shape[1]

Apart from the redundant list dtype change, origin values cannot be negative.

   :
        container = np.zeros((data.shape[0] + shift0, data.shape[1] + shift1), dtype=data.dtype)

        area = container[:, :]
        if shift0:
            if delta0 > 0:
                area = area[:-shift0, :]
            else:
                area = area[shift0:, :]
        if shift1:
            if delta1 > 0:
                area = area[:, :-shift1]
            else:
                area = area[:, shift1:]
        area[:, :] = data[:, :]
        data = container

The above code appears not to be used.

In the end set_center() simply calls scipy.ndimage.interpolation.shift(image, offset_from_geometric_center). I don't think the PyAbel code needs to be so complex.

This issue is a place holder to facilitate discussion, and the code change.

stggh commented 4 years ago

I think this code (minus header comments) does the job of image re-centering:

def set_center(data, origin, crop='maintain_size', axes=(0, 1), verbose=False,
               center=_deprecated):
    """
      :
    """
    if center is not _deprecated:
        _deprecate('abel.tools.center.set_center() '
                   'argument "center" is deprecated, use "origin" instead.')
        origin = center

    # [delrow, delcol] delta: geometric center - specified origin
    delta = np.array(data.shape) // 2 - np.array(origin)

    # apply recentering delta only to the specified axes
    if isinstance(axes, int):
        delta[1-axes] = 0

    # pad the image to prevent losing data when re-centered
    # round to nearest integer
    idelta = np.round(np.abs(delta)).astype(int)
    pad0 = (idelta[0], 0) if delta[0] < 0 else (0, idelta[0])
    pad1 = (idelta[1], 0) if delta[1] < 0 else (0, idelta[1])

    pad_data = np.pad(data, (pad0, pad1))

    # move geometric center to the specified origin
    centered_data = shift(pad_data, delta)

    if crop == 'valid_region':   # crop to region containing data
        r, c = centered_data.shape
        centered_data = centered_data[pad0[0]:r-pad0[1],
                                      pad1[0]:c-pad1[1]]

    return centered_data

It would be nice to be able to broadcast pad = (idelta[:], 0) if delta[:] < 0 else (0, idelta[:]) but the if test fails.

Still testing.

stggh commented 4 years ago

It turns out that the current center.py code is fine, it appears to do the correct thing.

code gist

image

My comment

The above code appears not to be used.

cannot be correct.

Hence. there is no need to delay merging PR #299. I will continue to examine my version. But for now I am fighting a cold. It's winter in OZ.

MikhailRyazanov commented 4 years ago

Apart from the redundant list dtype change, origin values cannot be negative.

Now they can:

Image coordinates are in the (row, column) format, consistent with NumPy array indexing, and negative values are interpreted as relative to the end of the corresponding axis. For example, (-1, 0) refers to the lower left corner (last row, 0th column).

The casting to list is thus needed/useful because if this origin is passed as a tuple (which looks more natural than a list), it's reduction to 0 ≤ y < height (origin[0] += data.shape[0]) would not work. Maybe I should have put a comment there...

stggh commented 4 years ago

Neat! Thanks, for the explanation.

MikhailRyazanov commented 4 years ago

(replying to this comment from #299)

Cropping is a real pain, in the sense that I do not fully comprehend the cropping options

As I understand, they are supposed to work like this: crop

Why not pad the image for all cases and then crop to a PyAbel friendly nearest odd-column size? In which case centered_data = centered_data[padr:-padr, padc:-padc] will do the right thing, no need for any crop-options.

I think, these different cropping options above make sense for different applications, so allowing the users to choose among them is a useful thing.

Regarding always padding and then cropping, the question is whether we should care about the efficiency (time and memory). For example, maintain_size can be done directly by scipy.ndimage.interpolation.shift() without any padding and cropping; valid_region does not need padding, and maintain_data does not need cropping. For whole-pixel shift, the latter two can be done, respectively, by pure cropping and padding, without scipy.ndimage.interpolation.shift() at all, but doing fractional shifts properly and efficiently is where it gets complicated.

stggh commented 4 years ago

A very nice clear diagram, which has sorted my confusion. It would be great to include the figure in the docs.

You have made good point about the execution time for scipy.ndimage.interpolation.shift(). E.g. a 4096x4096 images takes about 2 seconds to shift on my PC.

In [1]: import numpy as np                                                      

In [2]: from scipy.ndimage.interpolation import shift                           

In [3]: im = np.loadtxt('O2-ANU4096.txt.bz2')                                   

In [4]: %timeit shift(im, -0.5)                                                 
1.98 s ± 2.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I agree in principle with your comment, integer-pixel shifts could be handled via crop/pad, but in reality, how often would the data-center be whole pixel shifted from the geometric center?

Also: Q: What happens for input data has an even number of columns, for crop="maintain_size"?

MikhailRyazanov commented 4 years ago

takes about 2 seconds to shift on my PC

It's interesting that in Python 2 (and older SciPy) scipy.ndimage.shift was slightly faster. :–)

how often would the data-center be whole pixel shifted from the geometric center?

I would say that most experimental images must be shifted by >1 pixel, but subpixel shifts are important only for very-very high-resolution data. Which must have been also circularized with subpixel accuracy, and its center found with subpixel accuracy (taking into account that not all find_origin methods can do that)... Since scipy.ndimage.shift seems to be ~2 orders of magnitude slower (even with integer shifts) than whole-pixels shifts by direct NumPy array manipulations, we might consider whether it's worth to have “fast” and “slow” (but more accurate) centering methods. But how slow are find_origin methods?

Q: What happens for input data has an even number of columns, for crop="maintain_size"?

The high-level function center_image() does have the odd_size argument, which should take care of this, so it's probably not very necessary to do anything special in the low-level set_center() (those who call it directly must understand what they are doing).

DanHickstein commented 4 years ago

It would be great to include the figure in the docs.

I agree!

we might consider whether it's worth to have “fast” and “slow” (but more accurate) centering methods. But how slow are find_origin methods?

We can speed scipy.ndimage.interpolation.shift up a little bit by specifying order=1, which should be completely accurate for integer shifts. Still, it's not nearly as fast. Even if the find_origin methods are slow, there is probably still a case where you want really fast integer shifts. For example, back in grad school I would process huge folders full of photoelectron distributions. I would find the center for only one image (it was always the same for that instrument). Then, I would process the whole folder of hundreds of images, centering them and Abel transforming them. So, having a fast centering algorithm is important for the overall workflow.

I agree that in many datasets, sub-pixel shifts are unnecessarily small and that integer pixel shifts are more practical. I suppose that I'm arguing that we should have both a fast and a slow method, even though it's clunkier to implement.

stggh commented 4 years ago

It's interesting that in Python 2 (and older SciPy) scipy.ndimage.shift was slightly faster. :–)

Is that for the O2-ANU4096.txt image size, not 2048? I used a larger image for my test.

(those who call it directly must understand what they are doing)

We can issue a warning.

process huge folders full of photoelectron distributions

Same here, but with each center determined. However, a few seconds is not much out of the total time for a full spectral analysis.

should have both a fast and a slow method

Sounds fine. In addition, in principle, the core of set_center() need only operate on one axis, with an image transpose .T used to operate on the other (for a 2D image data). This may help when using set_center() for 1D-data.

MikhailRyazanov commented 4 years ago

In fact, there is a good use case that needs fast shifts without calling find_origin for each frame — video processing (since we claim that PyAbel can do this).

I played a little bit with scipy.ndimage.shift() parameters, and with order=0 it apparently does whole-pixel shift, although still slower than NumPy slicing/padding. So, I guess, we can expose the interpolation order parameter, setting it to 3 by default for high-quality results, mention that lower orders are faster, and maybe reimplement order=0 ourselves to make it even faster. By the way, scipy.ndimage.shift() currently has broken boundary treatment.

(about even-sized images)

We can issue a warning.

I don't know. Warning does not make much sense, since passing the result further will either be fine or produce an error. If I understand correctly, the only Abel-transform method that can work with even-sized images is rbasex, but it does not need preliminary centering. So there is probably no need for set_center() to return even-sized images at all. On the other hand, center_image() does have the odd_size argument, so if there is any use for it there, maybe we can also add it to set_center().

in principle, the core of set_center() need only operate on one axis, with an image transpose .T used to operate on the other (for a 2D image data)

The problem with this is that scipy.ndimage.shift() is no faster when shifting along one axis only, so separating axes shifts will make it twice slower.

It's interesting that in Python 2 (and older SciPy) scipy.ndimage.shift was slightly faster. :–)

Is that for the O2-ANU4096.txt image size, not 2048? I used a larger image for my test.

For any sizes. For 4096×4096 I get ≲2.2 s in Python 2 (SciPy 1.2.2) and ~2.3 s in Python 3 (SciPy 1.4.1). Not a big, but consistent difference.

MikhailRyazanov commented 4 years ago

Well, I've made a whole-pixel version that uses only NumPy array manipulations and is 1–2 orders of magnitude faster than scipy.ndimage.shift() (for example, it takes ≲2 ms for 1025×1025 images, whereas shift() takes 120 ms with order=3 (default) and 14 ms with order=0).

In principle, if we can tolerate losing fractional edge pixels (which currently occurs in scipy.ndimage.shift() anyway and in any case must have zero intensity for meaningful Abel transforms), this implementation can be easily augmented with subpixel (⩽0.5 px) shifts for order > 0. Otherwise, doing proper fractional shifts (with circumventing scipy.ndimage.shift() bugs) will take some more effort.

@stggh, your code above does not seem to treat the crop='maintain_size' case, but the examples below it look correct (except that the “maintain_data” result is even-sized). Is that code incomplete? Do you have the full version and tests?

stggh commented 4 years ago

Yes, sorry, the code is complete. I have written another version that includes integer shifts, but it is not fully working for all the crop options. I have been hung up with the treatment of odd vs even image centers and cropping.

It seems that your slick code could be wrapped with a fall through to scipy.ndimage.shift() for fractional shifts.

Edit: code gist is not working correctly. Ignore.

stggh commented 4 years ago

@MikhailRyazanov your integer shift code looks very nice, and it works(*):

image

test code

(*) There may be a 1-pixel offset for crop='maintain_shape', which remains an even-image (400, 400). However, this is likely my fault in the test-code. :-(

MikhailRyazanov commented 4 years ago

For even-sized images it assumes that the “center” is at n/2 - 1:

012345
  ^

but your test assumes it at n/2:

012345
   ^

I do not know which is better: n/2 looks “simpler”, but n/2 - 1 allows to crop the image to odd-sized (by removing the last row/column) without changing its “center”, as described in the docstring. If we want to use the n/2 convention for even sizes, then I just need to remove - 1 from one line.

Current PyAbel code apparently uses n/2: https://github.com/PyAbel/PyAbel/blob/f110ba34ad1e8e6bc920c40b402a3963527064b2/abel/tools/center.py#L205 but for odd_size=True it crops the last column (before centering): https://github.com/PyAbel/PyAbel/blob/f110ba34ad1e8e6bc920c40b402a3963527064b2/abel/tools/center.py#L127-L129

stggh commented 4 years ago

Hmm, I think old_center needs to be float if an even column size image.

Perhaps, add the parameter odd_size to set_center()?

MikhailRyazanov commented 4 years ago

Hmm, I think old_center needs to be float if an even column size image.

But this will make the output image centered “between pixels”, at least for crop='maintain_size', contrary to PyAbel conventions.

Perhaps, add the parameter odd_size to set_center()?

Why not just call center_image(IM, origin, odd_size=True)? :–)

stggh commented 4 years ago

This is the standard PyAbel grid image image

Either set_center() must always return an odd-column image, or the center must be between pixels (even-columns)?

Why not just call center_image(IM, origin, odd_size=True)? :–)

Agreed. I thought that @Mobbybobby had called the function set_center() directly, but that may not be the case.

Your nice code corrects the previous errors in this function. I am happy with it, once you include a default to scipy.ndimage.shift() for fractional pixel shifts (= my user case).

MikhailRyazanov commented 4 years ago

After looking at other PyAbel tools, it seems that the all use size // 2 for the center, so I've switched to that. I think, using the center “between pixels” doesn't make sense, since nothing else in PyAbel can work with that...

I have created PR #302 with the new, hopefully complete and correct, code.

MikhailRyazanov commented 3 years ago

@stggh, should we close this now?

stggh commented 3 years ago

Yes, issue addressed by #302.