astropy / imageutils

Image processing utilities for Astropy (No longer maintained)
https://imageutils.readthedocs.org/

Proposal for new scaling/stretching API #24

Closed · astrofrog closed this issue 10 years ago

astrofrog commented 10 years ago

About

@ejeschke @cdeil @larrybradley @ChrisBeaumont - here is a proposal for a new API for the image stretching/scaling. I tried to write something inspired by @ejeschke's approach, while also trying to make it compatible with @ChrisBeaumont's approach in glue and my approach in APLpy. This is by no means complete in terms of different stretch functions, but at this point I just wanted to start a discussion on the API based on some code.

Approach

Following @ejeschke's approach, I've separated the problem into (a) determining vmin/vmax intervals, and (b) stretching values in the range [0:1] to other values in the range [0:1].

Intervals

The interval classes can be initialized with zero or more options depending on the class, and for now have a single get_limits method that returns vmin and vmax:

In [1]: import numpy as np

In [2]: data = np.random.random((128))

In [3]: from imageutils.scaling import *

In [4]: i = ManualInterval(0.2, 0.8)

In [5]: i.get_limits(data)
Out[5]: (0.2, 0.8)

In [6]: i = MinMaxInterval()

In [7]: i.get_limits(data)
Out[7]: (0.0041587738556735365, 0.97469338355296631)

In [8]: i = PercentileInterval(95)

In [9]: i.get_limits(data)
Out[9]: (0.041784080796265728, 0.94854907673291367)

Of course, we could also define __call__ on the base class so as to be able to do e.g. i(data) and get back values in the range [0:1].
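
For illustration, a minimal sketch of what that __call__ might look like on a hypothetical base class (the class and method layout here is an assumption, not the PR's actual code):

import numpy as np

class BaseInterval:
    # hypothetical base class; subclasses implement get_limits(data)
    def __call__(self, values):
        # rescale so that [vmin, vmax] maps onto [0, 1]
        vmin, vmax = self.get_limits(values)
        return (np.asarray(values) - vmin) / (vmax - vmin)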

Stretching

Stretching is also defined with classes, which take zero or more arguments, and then are used as functions to transform the values. The inverted() method takes no arguments and returns the inverse of the transform:

In [11]: data = np.array([0., 0.33, 0.66, 1.])

In [12]: s = LinearStretch()

In [13]: s(data)
Out[13]: array([ 0.  ,  0.33,  0.66,  1.  ])

In [14]: s = SqrtStretch()

In [15]: s(data)
Out[15]: array([ 0.        ,  0.57445626,  0.81240384,  1.        ])

In [16]: si = s.inverted()

In [17]: si(s(data))
Out[17]: array([ 0.  ,  0.33,  0.66,  1.  ])

Transformations can be chained:

In [18]: c = s + s

In [19]: c
Out[19]: <imageutils.scaling.stretch.CompositeStretch at 0x1052c7668>

In [20]: c(data)
Out[20]: array([ 0.        ,  0.75792893,  0.90133448,  1.        ])

And we can check that this is equivalent to x**0.25:

In [21]: p = PowerStretch(0.25)

In [22]: p(data)
Out[22]: array([ 0.        ,  0.75792893,  0.90133448,  1.        ])

There is also a ContrastBiasStretch stretch (and its inverse) which can be used for ds9-like behavior, and this can of course be chained with another transformation.
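
For reference, the ds9-style contrast/bias mapping is just a linear remap around the bias point; a minimal sketch (parameter conventions assumed, not taken from the PR):

import numpy as np

def contrast_bias(values, contrast, bias):
    # pivot the normalized values around the bias point, scale by the
    # contrast, then clip back into [0, 1]
    return np.clip((values - bias) * contrast + 0.5, 0.0, 1.0)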

Normalization class

We can bring these two together to make a Normalize class for use with Matplotlib. At the moment, one first has to get vmin and vmax, then initialize the normalization class:

import numpy as np
import matplotlib.pyplot as plt
from imageutils.scaling import ImageNormalize, SqrtStretch, PercentileInterval

data = np.arange(900).reshape((30, 30))

p = PercentileInterval(90.)
vmin, vmax = p.get_limits(data)

normalizer = ImageNormalize(vmin=vmin, vmax=vmax, stretch=SqrtStretch())

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
im = ax.imshow(data, norm=normalizer, interpolation='nearest')
fig.colorbar(im)

which gives:

[figure: the 30x30 test image displayed with imshow, with a sqrt-stretched colorbar]

As you can see, the values are correctly placed on the colorbar, non-linearly.

We could also have the normalization class take the interval instance too, but some intervals depend on the data, so when passing it to imshow one would have to do norm=normalizer(data). The problem is that the user could then technically update the data without updating the normalization class, so things could end up in an inconsistent state. I think the current API is fine given that this is going to be used mostly in tools rather than directly by users.

Summary

This is far from complete and still needs tests, docs, more classes, etc., but at this point I would be interested to know:

I should say that we can also still provide a functional interface for users if we want, but the point here is to generalize the backend.

ejeschke commented 10 years ago

@astrofrog, I like this a lot. I'm still trying to figure out how to get the link to the PR to play around with it, but it looks great. I really appreciate that you didn't use the word "scaling" in the actual class names, since that word is overloaded and is often confused with image (pixel) scaling. (I called it "distribution", since you are essentially redistributing the values, but "stretch" is also reasonable.)

I need to check the performance, but I wouldn't hesitate to move to this framework if it covers the gamut of the distributions that I have coded for ginga. Have you thought about a histogram equalization stretch? I remember that being one of the slightly more convoluted ones, and it might pay to implement it in this new module before fixing the final API.

I'm doing contrast/bias in a different place in the image processing pipeline, but if this produces an identical result and performance is good I would also consider using this for that.

astrofrog commented 10 years ago

@ejeschke - you can install this version by cloning my repo (git@github.com:astrofrog/imageutils.git) and then going to the new-scaling branch.

I think having a histogram equalization stretch would be great, so I implemented one for fun - I think it does the right thing but feedback is welcome :) It did raise an interesting point that in this case the data needs to be passed in at initialization, and that's fair enough since the data itself determines the stretch.
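
For context, the core of such a stretch is just interpolation through the data's cumulative histogram; a minimal sketch, assuming values already normalized to [0, 1] (names illustrative, not the PR's implementation):

import numpy as np

def make_histeq_stretch(data, n_bins=256):
    # build the cumulative distribution of the (normalized) data once,
    # at initialization time
    hist, bin_edges = np.histogram(np.asarray(data).ravel(),
                                   bins=n_bins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]

    def stretch(values):
        # mapping values through the CDF makes the output ~uniform
        return np.interp(values, bin_edges[1:], cdf)

    return stretch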

ejeschke commented 10 years ago

@astrofrog, re: histogram equalization: right, that sounds right regarding needing the data handle. Regarding distributions like asinh and exponential/power, we should be sure to provide parameters like exponent and nonlinearity (see https://github.com/ejeschke/ginga/blob/master/ginga/ColorDist.py for examples). I remember talking to Robert Lupton about this being one of the disappointing limitations of ds9: you couldn't adjust the nonlinearity factor of the asinh distribution.

BTW, thanks for the link.

astrofrog commented 10 years ago

@ejeschke - I agree with including the exponent and non-linearity; I will use the equations that you define in ginga for consistency (and will check how they relate to those in APLpy, which also includes these extra parameters, though I may have assumed slightly different forms for the equations).

ChrisBeaumont commented 10 years ago

This looks like a nice interface that I'd be willing to use. I share @ejeschke's comments about performance. I found I could speed up the end-to-end normalization by a factor of ~2 by using in-place operators instead of typical arithmetic notation (see e.g. https://github.com/glue-viz/ds9norm/blob/master/ds9norm.py#L60).
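
As a sketch of the difference (not the ds9norm code itself):

import numpy as np

data = np.random.random((4096, 4096))
vmin, vmax = 0.01, 0.99

# typical notation: each operation allocates a fresh temporary array
out = (data - vmin) / (vmax - vmin)

# in-place notation: one allocation up front, then the buffer is reused
out = data - vmin
out /= (vmax - vmin)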

I'm also a little curious/worried about whether working with masked arrays hurts performance. I wrap results in MaskedArrays as a last step because of this, but I don't know if that's necessary.

Finally, I wonder if it's worth having a version of PercentileInterval that down-samples the input if it's really large. Otherwise, the sorting is really slow, and the end result is almost the same. Glue does this, and I think ds9 does too.

astrofrog commented 10 years ago

@ChrisBeaumont - for the Normalize class for Matplotlib, it seems Matplotlib requires a masked array, but I agree it might make sense to leave this until the very end. I'm 100% in favor of making sure performance is optimal, so I'll look into in-place vs. regular operations. This should probably be an option?

astrofrog commented 10 years ago

I like the idea of a PercentileInterval option or version that downsamples the input.

astrofrog commented 10 years ago

@ChrisBeaumont - when you want to take a subset of data, are you more likely to place a limit on the number of samples, or to request a certain fraction of pixels?

astrofrog commented 10 years ago

Wow, in-place operations can indeed be 2-3 times faster. I'm thinking of adding an out option to the __call__ functions on the stretches, to be consistent with NumPy's way of dealing with this.
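
A sketch of what that could look like, mirroring the NumPy ufunc convention (an assumption about the eventual API shape, not the final implementation):

import numpy as np

class SqrtStretch:
    def __call__(self, values, out=None):
        if out is None:
            # default path: allocate and return a new array
            return np.sqrt(values)
        # in-place path: write into the caller-supplied buffer
        np.sqrt(values, out=out)
        return out

With this shape, s(data, out=data) would stretch the array in place.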

astrofrog commented 10 years ago

@ChrisBeaumont - just for info, I've implemented one way to allow in-place stretching (using the out= keyword)

ChrisBeaumont commented 10 years ago

Looks good

For downsampling, I think a fixed number of pixels is best, since it keeps a bound on computation time.


astrofrog commented 10 years ago

@ChrisBeaumont - would something like I just added be useful?

In [1]: from imageutils.scaling import PercentileInterval

In [6]: import numpy as np

In [7]: x = np.random.random(1e7)

In [8]: p = PercentileInterval(99.5)

In [9]: %timeit p.get_limits(x)
1 loops, best of 3: 2.77 s per loop

In [10]: p = PercentileInterval(99.5, n_samples=1000)

In [11]: %timeit p.get_limits(x)
1000 loops, best of 3: 362 µs per loop

Alternatively, this could be something that exists in the base class for all interval classes so that all can be sub-sampled (but I'm not sure if it's useful for other classes)
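
The subsampling step itself could be as simple as a strided selection before the percentile computation (a sketch; the actual implementation may differ):

import numpy as np

def subsample(data, n_samples):
    data = np.asarray(data).ravel()
    if n_samples is not None and data.size > n_samples:
        # a strided subset keeps the percentile cost bounded
        # regardless of the image size
        step = data.size // n_samples
        data = data[::step]
    return data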

astrofrog commented 10 years ago

One option I've been pondering is to allow the interval classes to be chained to the stretch classes (without modifications to the current API, just in addition), so one could do:

f = PercentileInterval(99.5) + SqrtStretch()
f(data)

and get out data renormalized to 0 to 1 and then stretched. The stretches can already be chained, so it's just a matter of adding the interval classes to the same system.
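
Mechanically, this could be as little as giving the interval base class the same __add__ hook the stretches use; a minimal sketch, with CompositeStretch here a simplified stand-in for the class shown earlier:

class CompositeStretch:
    # simplified stand-in: just applies its two transforms in sequence
    def __init__(self, first, second):
        self.first, self.second = first, second

    def __call__(self, values):
        # apply the interval (normalization) first, then the stretch
        return self.second(self.first(values))

class BaseInterval:
    # the __add__ hook is all that is needed to join the chaining system
    def __add__(self, other):
        return CompositeStretch(self, other)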

astrofrog commented 10 years ago

In any case it seems that we agree on the approach here, and it's just a case of adding more stretches and interval classes?

@larrybradley - do you agree that this is a good way to go?

If we agree to move ahead, then I'll work on implementing more of the stretches that @ejeschke has in ginga.

cdeil commented 10 years ago

@astrofrog Nice!

@larrybradley should comment, but I think adding a functional interface on top would be useful, even if it only provides a subset of the functionality one can achieve by chaining the classes into a specific transformation.

Side note: when using this to plot images with matplotlib as shown above, I think for the colorbar we will often need better methods to locate ticks (I mean ticks on the colorbar, not the image axes) than what matplotlib provides. Do we need to consider this for the API here, or is this a completely independent feature that can be implemented later?

astrofrog commented 10 years ago

@cdeil - the normalize class here should place the ticks in the correct location on the colorbar (taking into account non-linear stretch, etc), and I think that the matplotlib colorbar command allows you to specify which ticks to show. If I misunderstood, could you show an example of a figure that is problematic?

EDIT: just checked and fig.colorbar takes a ticks argument in case the default choice is not suitable.

cdeil commented 10 years ago

@astrofrog Yes, ticks and labels are correct already. What I mean is that if you choose e.g. a sqrt stretch as you did in the original description in this issue, it would be better to place ticks at 100, 200, 400, 800 instead of 100, 200, 300, 400, 500, ... or there will often be cases where the labels overlap.

So I was asking about a method to compute a clever set of values at which to put ticks on the colorbar (as e.g. ds9 does). This would probably be useful to Ginga, APLpy, and Glue (or anyone using this stretching/scaling to make a matplotlib image with a colorbar).

I can make an IPython notebook with some examples that show problematic cases if it's still not clear what I mean.

astrofrog commented 10 years ago

I see what you mean now, and I agree this would be useful! :)

astrofrog commented 10 years ago

I think the cleanest way to do this is to give the stretches the option to return a tick locator that understands the stretch, so one would then do:

fig.colorbar(im, ticks=SqrtStretch().locator)
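
One way such a locator could work: space the ticks uniformly in stretched space and map them back through the inverse stretch (a sketch, using the inverted() relationship shown earlier; names illustrative):

import numpy as np

def sqrt_stretch_ticks(vmin, vmax, n_ticks=5):
    # uniform positions along the colorbar, i.e. in stretched [0, 1] space
    stretched = np.linspace(0.0, 1.0, n_ticks)
    # invert the sqrt stretch (y = sqrt(x), so x = y**2) ...
    unstretched = stretched ** 2
    # ... and map back to data values for the tick labels
    return vmin + unstretched * (vmax - vmin)

The resulting values could equally be passed directly as fig.colorbar(im, ticks=...).
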
astrofrog commented 10 years ago

@larrybradley - before I go ahead and work any more on this, could you review this so far and let me know what you think?

larrybradley commented 10 years ago

@astrofrog Yep, will let you know soon.

larrybradley commented 10 years ago

@astrofrog This looks nice!

larrybradley commented 10 years ago

It would also be nice to have some convenience functions like this: ndata = scale_image(data, scale='sqrt', percent=90.) (currently available)

which effectively does this:

p = PercentileInterval(90.)
vmin, vmax = p.get_limits(data)
normalizer = ImageNormalize(vmin=vmin, vmax=vmax, stretch=SqrtStretch(), clip=True)
ndata2 = normalizer(data)

np.array_equal(ndata, ndata2)   # True

cdeil commented 10 years ago

@larrybradley If you implement the scale_image convenience function using the ImageNormalize class, then the user needs matplotlib. Are the use cases for scale_image outside plotting important enough that it should work without the matplotlib dependency?

astrofrog commented 10 years ago

@cdeil it should be easy to implement scale_image without using the ImageNormalize class - one simply needs to do e.g.

norm = PercentileInterval(90.) + SqrtStretch()
ndata2 = norm(data)

so I think we can make it work without matplotlib and without duplicating much.

astrofrog commented 10 years ago

@larrybradley - I've copied your scale_image function here and made it work with this infrastructure, apart from the arcsinh stretch, which hasn't been implemented yet.

I'm not going to have much time to work on this in the next couple of weeks. I'm going to try and add some tests now, but @ejeschke, would you be interested in opening a pull request to my branch and implementing more stretches based on what you have in ginga?

ejeschke commented 10 years ago

@astrofrog, OK, I've written the other stretches, but I'm having some trouble running the tests from the local directory. Here's the error I'm getting. How do I set my path to allow me to run these tests locally?

$ py.test
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 543, in importconftest
    mod = conftestpath.pyimport()
  File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 608, in pyimport
    __import__(pkgpath.basename)
  File "/home/eric/Git/imageutils/imageutils/__init__.py", line 16, in <module>
    from .sampling import *
ImportError: No module named sampling
ERROR: could not load /home/eric/Git/imageutils/imageutils/conftest.py

larrybradley commented 10 years ago

@ejeschke sampling contains Cython code that needs to be compiled before it can be imported from within the source directory. You can do this with an in-place build: ./setup.py build_ext --inplace.

But, the best way to run the tests is simply ./setup.py test.

astrofrog commented 10 years ago

@ejeschke - thanks! Indeed, the easiest way to run the tests is to simply do:

python setup.py test

If you want to restrict yourself to a certain test file, then:

python setup.py test -t imageutils/scaling/tests/test_my_file.py

ejeschke commented 10 years ago

Hmmm, neither of these works. "python setup.py test" runs tests, but not the new stretch ones in @astrofrog's branch (which is checked out). "python setup.py test -t imageutils/scaling/tests/test_stretch.py" gives:

$ python setup.py test -t imageutils/scaling/tests/test_stretch.py
running test
running build
running build_py
running build_ext
skipping 'imageutils/sampling.c' Cython extension (up-to-date)
running build_scripts
============================= test session starts ==============================
platform linux2 -- Python 2.7.6 -- pytest-2.5.1
ERROR: file not found: /tmp/imageutils-test-wP2gnX/lib.linux-x86_64-2.7/imageutils/scaling/tests/test_stretch.py

=============================== in 0.00 seconds ===============================

I'll figure it out by manually manipulating the PYTHONPATH.

astrofrog commented 10 years ago

@ejeschke what is the latest commit if you do git log? Does it match what's in this PR?

ejeschke commented 10 years ago

@astrofrog, no worries, I just temporarily changed the import statement to import from the package to finish my tests. @astrofrog, can you take a look at these new stretches from the "new-scaling" branch here (https://github.com/ejeschke/imageutils.git)? Edit and merge as you see fit.

astrofrog commented 10 years ago

@ejeschke - I've added your commit here, thanks!

astrofrog commented 10 years ago

@ejeschke - just a quick question about the log stretch - at the moment:

y = log(a*x + 1) / log(a)

doesn't actually return a value in the range 0 to 1, but 0 to a little over 1. Should the equation be:

y = log((a-1)*x + 1) / log(a)

?
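
To see the discrepancy numerically, taking a = 1000 as an example value:

import numpy as np

a = 1000.
print(np.log(a * 1. + 1) / np.log(a))        # ~1.000145 - slightly over 1
print(np.log((a - 1) * 1. + 1) / np.log(a))  # exactly 1.0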

astrofrog commented 10 years ago

Apart from the question of the log scale above being slightly wrong, this should be ready for a final review. I removed normalize_image and did not add an equivalent in scaling/ui.py because it's not clear that it adds much beyond scale_image, but I can easily add it back if needed.

Otherwise, does this solution work for everyone?

@ChrisBeaumont - can this be used in glue in future? (once stable)

@ejeschke - is this ok to use in ginga?

From my side, this can be used in APLpy, so I'm happy with it.

cdeil commented 10 years ago

Travis CI currently fails because of the matplotlib imports.

astrofrog commented 10 years ago

@cdeil - ok, I'll try and make it an optional dependency.

ejeschke commented 10 years ago

@astrofrog, the +1 is to avoid a domain error with an x=0 input. You could either clip the result, or alter the equation slightly to avoid the issue (as you suggested). I think as long as the function has a range "very similar" in characteristic it should be fine.

astrofrog commented 10 years ago

@ejeschke - ok, I've adjusted a few of the stretches so they always return values in the range [0:1] exactly. In particular, the sinh stretch now only takes one argument, because the denominator is completely determined by the factor used in the numerator if we want to guarantee the mapping from/to [0:1]. Does this seem ok?

larrybradley commented 10 years ago

@astrofrog +1 for the modified stretches that always return [0:1] ranges.

I think the composite transforms, e.g.

norm = PercentileInterval(90.) + SqrtStretch()
ndata2 = norm(data)

really need a "Normalizer()" (to put the image in the [0, 1] range) in between the *Interval classes and the *Stretch classes. Something like this:

norm = PercentileInterval(90.) + Normalize() + SqrtStretch()
ndata2 = norm(data)

Otherwise, for example, SqrtStretch() could get negative values. Perhaps the Normalize() part can be done automatically between transform_1 and transform_2 in CompositeTransform().

larrybradley commented 10 years ago

I use the asinh stretch often, but I am puzzled about how AsinhStretch() is implemented here (and in ds9, Ginga, and ds9norm). Basically they all do this:

res = np.arcsinh(factor * values) / nonlinearity
result = np.clip(res, 0, 1)

with the default of factor=10, nonlinearity=3 (in ds9 these are fixed). This form doesn't make any sense to me, and the subsequent clipping effectively changes the defined min/max interval, because res is not in the [0, 1] range.

I think the correct form should require one parameter, called the "softening parameter" b, and be of this form:

bnorm = (b - mincut) / (maxcut - mincut)
result = np.arcsinh(values / bnorm) / np.arcsinh(1. / bnorm)

where values are the normalized image values in the range [0, 1], and b is the "softening parameter" described in Lupton's paper. Note that because of the denominator, the result here is also always in the range [0, 1].

b must be normalized to bnorm in the same way as the data were put in the [0, 1] range. The meaning of b is that it is the data value at which the arcsinh curve transitions from a linear form to a log form. Generally one should choose a b around the 1-sigma noise level in the data.

nonlinearity doesn't appear in Lupton's paper, but I think it comes from an IDL implementation (http://www.idlcoyote.com/programs/asinhscl.pro; which I think is correct), where it means nonlinearity = 1 / scaled_beta and scaled_beta is the same as bnorm above.

So in terms of nonlinearity, it should be:

bnorm = (b - mincut) / (maxcut - mincut)
nonlinearity = 1. / bnorm
result = np.arcsinh(values * nonlinearity) / np.arcsinh(nonlinearity)

Can someone explain why this form isn't used?

larrybradley commented 10 years ago

For reference, this is what I currently have for asinh in imageutils: https://github.com/astropy/imageutils/blob/master/imageutils/scale_image.py#L207

astrofrog commented 10 years ago

@larrybradley - I agree with you, and in fact I already changed this for sinh in my last commit but just forgot to do it for asinh. Regarding the PercentileInterval(90.) + SqrtStretch() transforms, the interval class already deals with the normalization to the [0:1] range (though if you find cases where it doesn't, let me know!). I'll work on implementing your comments tomorrow.

larrybradley commented 10 years ago

@astrofrog OK, great!

Regarding the PercentileInterval(90.) + SqrtStretch() transforms, the interval class already deals with the normalization to the [0:1] range (though if you find cases where it doesn't let me know!).

It's not currently working:

>>> import numpy as np
>>> from imageutils.scaling import *
>>> data = np.arange(5) - 2.
>>> norm = PercentileInterval(90.) + SqrtStretch()
>>> ndata2 = norm(data)
>>> ndata2
imageutils/scaling/stretch.py:44: RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(values)
array([        nan,         nan, -0.07856742,  0.70710678,  1.03254369])

BaseInterval.__call__() has a clip parameter, but it's not the default:

>>> PercentileInterval(90.)(data)
array([-0.05555556,  0.22222222,  0.5       ,  0.77777778,  1.05555556])
>>> PercentileInterval(90.)(data, clip=True)
array([ 0.        ,  0.22222222,  0.5       ,  0.77777778,  1.        ])

and it doesn't work with the composite class:

ndata2 = norm(data, clip=True)
TypeError: __call__() got an unexpected keyword argument 'clip'

astrofrog commented 10 years ago

@larrybradley - good point regarding the clipping, I'll see what I can do. In the meantime, asinh and sinh have been fixed. @ejeschke - are you ok with this change? (changing factor to 1/a - completely equivalent, just a different formulation).
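
(A quick check of the equivalence, assuming the revised form is arcsinh(x/a) / arcsinh(1/a):

import numpy as np

x = np.linspace(0., 1., 5)
factor = 10.
a = 1. / factor
old = np.arcsinh(factor * x) / np.arcsinh(factor)
new = np.arcsinh(x / a) / np.arcsinh(1. / a)
print(np.allclose(old, new))  # True - same curve, different parameterization
)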

ejeschke commented 10 years ago

@astrofrog, again, as long as the function has a range "very similar" in characteristic it should be fine. However, since the effect of this change is to radically change the input parameters, I'd like to check with Robert about what he had in mind. I know in previous conversations he has lamented that ds9 had fixed the parameters and that the nonlinearity could not be adjusted.

At the very least, I think we now need to remove the comment that "This is the same as ds9's ASINH stretch, described at ...", because without the same number and domain of input parameters it is not the same, although the range is similar. Ditto for the SINH one.

larrybradley commented 10 years ago

@ejeschke In the revised version, the a parameter is the normalized softening parameter, or 1. / nonlinearity. In Ginga and elsewhere, I really think that the factor and nonlinearity parameters got swapped, and factor is unnecessary (nonlinearity = arcsinh(factor); note that arcsinh(10) = 2.998 ~ 3). I agree that the ds9 version is bad because there is not any adjustable parameter.

@astrofrog I would prefer that the parameter be the un-normalized softening parameter, but I can probably live with this. Here a is the fraction of the normalized image range at which the linear-log transition occurs (instead of the un-normalized image value). We should probably constrain a to the range 0 < a <= 1.

ejeschke commented 10 years ago

@larrybradley, I'm ok with the new version. Your explanation makes sense and it seems clearer to me. However, I think we need to be careful about advertising that something is "the same as" (IRAF, ds9, etc.) when the effect is nearly identical but the API is quite different. Or at least we need to be quite clear in our documentation about how it is different; otherwise users perusing the code and comparing it with the advertised ds9 or IRAF APIs will not understand these subtleties and will be confused.

larrybradley commented 10 years ago

@ejeschke Yep, I agree the docs shouldn't say "the same as ds9, etc."

astrofrog commented 10 years ago

@ejeschke @larrybradley - I've tidied up the docstrings, and put in LaTeX equations for all of these. I plan to add a section to the docs with a high-level overview of this, and as part of that I can have a paragraph relating these to ds9 and pointing out where they are the same and where they are different.