astropy / imageutils

Image processing utilities for Astropy (No longer maintained)
https://imageutils.readthedocs.org/
9 stars 17 forks source link

NDData objects #13

Open crawfordsm opened 9 years ago

crawfordsm commented 9 years ago

Will imageutils tasks work on astropy.NDData objects (ie meaning that they will be able to handle bad pixel masks and uncertainty frames as well as just WCS) and not just ndarray objects?

I think this is a pretty critical aspect to make this package useful for other affliated packages. For example, we would have to re-wrap everything in here to work in the ccdproc affliated package.

Obviously, error handling can be difficult, but this aspect also would be what would make it truly useful for research.

cdeil commented 9 years ago

That's a tough question ... @keflavich, @larrybradley, @eteq and @astrofrog what do you think?

Personally I feel that it looks like the answer should be "yes, obviously imageutils should work for NDData". On the other hand it looks to me like NDData has been stuck in a not very useful state since it was created over a year ago ... I've never used it and don't see how using NDData would help with any of the analysis code I've written in Gammapy.

What would this mean in practice ... is there some boilerplate code we have to run on input / before output for every function that works with NDData? Or would each function be different and inspect the content of the NDData object before deciding what to do? Can someone point to a good example?

keflavich commented 9 years ago

@cdeil - I think imageutils should work on nddata objects. But, as you said, I have no idea what nddata obejcts are. Until they exist as objects we actually use - with a stable API - I don't think we can do anything about them.

crawfordsm commented 9 years ago

Background on NDData: Documentation for NDData is available here: http://docs.astropy.org/en/stable/nddata/index.html

It was actually created during the original planning meeting for astropy and included in the first release. The whole motivation for setting it up was so for an object describing astronomical data sets that could be shared between different afflicated packages. There are some developments with it that have stalled out such as WCS (was waiting on the astropy.WCS package to become a bit more stable) and uncertainty handling (examples exist, but correlated errors are a real headache), but both are still useufl in their present forms. There is also some question about whether quantity should be used, but I believe those are more development issue which shouldn't effect the API at all. There has been recent updates and inclusions in it driven recently by the ccdproc development.

There are pro's and con's to using it, but for at least data analysis, I think it is absolutely critical to have bad pixel maps and error analysis. By having those built in at the ground level, it makes it much easier to propagate them through your code and simpler to build up pipelines. I find it is a lot easier to work on a CCDData object where the data quality has already been checked in creating the object, then including those checks in every bit of code. On the other hand, it does mean including error propagation into your code which can be difficult to do right.

Examples: The ccdproc.ccddata class inherits from NDData object and the tasks in ccdproc were set up to all operate on a ccddata object, so if you want to see some examples of something using it, check out over there. For example: https://github.com/astropy/ccdproc/blob/master/ccdproc/core.py#L45

I also think the sunpy developers have also made use of NDData as well. So there are at least a couple of examples.

astrofrog commented 9 years ago

I think the reason that NDData has stagnated somewhat is because it turns out that it's really hard to generalize some of the things we've been trying to do, such as arithmetic and uncertainty propagation. I feel as though it would be easier if the base NDData class was simpler than it currently is, and actually didn't include arithmetic and uncertainty propagation, then it would have a simple API and would be easy to support.

Another difficulty with NDData and the generalized WCS is going to be - what if imageutils needs to change the WCS? How do we tweak an an arbitrary WCS? It's easy to say that the routines here should accept NDData, but at the same time I think it's going to be very challenging to deal with the WCS aspect properly.

Anyway, I'm +1 on trying to accept NDData, but I think that we still haven't properly 'solved' NDData itself. But now is the right time to be thinking about this! Maybe we can restrict imageutils to a subset of NDData, for instance a 2-d NDData sub-class that only takes astropy.wcs.WCS instances as the WCS object, or things like that. Trying to deal with the fully general NDData is going to be a nightmare.

astrofrog commented 9 years ago

In terms of sub-classes, we could also restrict ourselves to n-dimensional NDData objects that have astropy.wcs.WCS (the current FITS-WCS) instances as the .wcs property. When the new WCS comes along, we can then re-assess how easy it is to support that too.

TL;DR: I think we should not support NDData, but a more specific sub-class so that we can make assumptions about the WCS.

cdeil commented 9 years ago

At the top of the NDData docs is a note:

The NDData class is still under development, and support for WCS and units is not yet implemented.

As far as I can see there's not even an issue on Github that someone plans to do this for Astropy 1.0.

Until this changes I'd suggest we keep using ndarray and wcs objects directly in imageutils ... it should be easy to wrap any functionality we have to work with NDData or a sub-class once it contains WCS.

As a little anecdote, I've considered using NDData once when I tried to write a Maps class for gamma-ray astronomy, but then decided to subclass HDUList (which was also a mistake and this class is currently broken): https://github.com/gammapy/gammapy/blob/master/gammapy/background/maps.py#L20

The reason I didn't use NDData is because it doesn't support the data model for this application (multiple "data" images, Poisson statistics for the n_on and n_off images) ... I second @astrofrog's comments above that NDData shouldn't do arithmetic and error propagation .. the only reason it works well for CCDData is because it was written with the CCD data application in mind.

cdeil commented 9 years ago

If someone thinks using NDData inputs is feasible / helpful for image processing functions, you could try and use @astrofrog's reproject function as an example ... or @larrybradley's downsample and upsample functions that were merged into imageutils today (but don't show up in the online docs yet because of issue #14).

larrybradley commented 9 years ago

I think it would nice if imageutils functions could accept NDData as input, but it should not be required, especially since NDData is still incomplete/under development. The functions should also work with ndarrays, i.e. image inputs can be either ndarray or NDData.

crawfordsm commented 9 years ago

For example, the ccdproc.rebin similar to upsample works on both ndarrays and ccddata objects.
https://github.com/astropy/ccdproc/blob/master/ccdproc/core.py#L701

astrofrog commented 9 years ago

I have a suggestion which could work for now - how about 'loosely' supporting NDData via duck-typing, i.e. check if the data object passed has data and wcs attributes (and maybe mask) and if so, using those? Then it won't strictly require subclasses of NDData, but NDData-like objects.

In fact, maybe NDData should be an interface specification, not a base class, detailing what attributes should be available?

astrofrog commented 9 years ago

@crawfordsm - I don't understand the line rebin(nccd.uncertainty.array, newshape) in your example - surely this will only work if the uncertainty is a variance, not a standard deviation?

crawfordsm commented 9 years ago

Well, @astrofrog, this raises a point that yes, correct error propogation is hard. For example, the errors on those pixels is also correlated, and in reality, we need a way to have correlated error arrays. However, we have tried to take a stance of adding notes where the error propogation may not be correct (perhaps it should be warnings), but that could certainly be updated in the future for better error handling. The error handling is handling more correctly in some of the other tasks but I was just giving an example of something that works on both ndarray and ccddata. However, please do open an issue about it over at ccdproc.

larrybradley commented 9 years ago

@astrofrog I think they would need attributes for both a mask and an error image. I like the idea, but one question is what to call these attributes. In my functions, I've been explicitly clear and called them mask_image and error_image. NDData calls these mask and uncertainty. Are those stable names? (fwiw, I'm not sure I like uncertainty). I recall some discussion last year about using standard deviation or variance arrays.

perrygreenfield commented 9 years ago

My view, at least for now, is that NDData is useful as a standard container, but we shouldn't load it too much implied functionality. E.g., no arithmetic, no automatic error propagation, no automatic data quality propagation, etc. Slicing with an updated WCS would be a nice thing (and should be generally possible with gwcs I believe). There are just too many special cases to handle errors correctly. But providing error and data quality information is useful for those tasks that know what to do with it. How they use it certainly will depend on context.

astrofrog commented 9 years ago

@perrygreenfield - since there is already arithmetic in NDData, would you be in favor of moving it to a special sub-class?

@larrybradley - regarding uncertainty, the issue is that variance is only one simple way of describing errors, but the ideas is to have classes for more complex error models.

perrygreenfield commented 9 years ago

@astrofrog - yes, given that it is so hard to know what to do with errors, any arithmetic you do is not going to correctly deal with it in a significant fraction of cases, making it much less useful for fully general use.

crawfordsm commented 9 years ago

@cdeil I think the units have been integrated, so that should probably be updated in the docs, but it should be checked. The original hope was that the developers of affiliated packages would feed information back into NDData (different uncertainty types, error handling, wcs, etc) and about what might need to change which was done early on through specutils and then later through ccdproc.

@perrygreenfield @astrofrog Should we start a separate thread under NDData on astropy? I know that we changed some of the uncertainties behaviour for CCDData. In generally though, I think that NDData can still be used as a general container, but there probably should be certain functionality that should not be considered core to the object.

eteq commented 9 years ago

I agree with @perrygreenfield that we should treat NDData as a standard container. So basically the same as @astrofrog's suggestion of duck-typing, where we use meta, mask, and flags (if necessary). In cases where it makes sense, imageutils can always just check to see if uncertainty is something it understands, and if so, can update appropriately. But it's pretty clear now that that'll have to be sort of case-by-case.

@crawfordsm is probably right that there should be a separate discussion on how to change NDData. @astrofrog, perhaps you should just write a PR to do the sub-classing you're talking about? That should be pretty much just copy-and-paste, and will get the discussion going with something concrete. (FWIW, I like the idea of moving all the implementation of the error/operations stuff to a sub-class, but leaving the names like uncertainty defined in the base class.)

mwcraig commented 9 years ago

TL;DR : from the ccdprocangle it would be really nice if imageutils worked with CCDData objects, which are a very restricted subset of NDData objects. @eteq and @astrofrog's suggestion to duck type works.

I think NDData is long overdue for some attention and decisions about its future; an APE might generate a broader discussion than a PR that makes API changes.

Long version:

Sorry I'm coming late to the party...a few thoughts:

mwcraig commented 9 years ago

A doodle poll for a proposed google hangout next week to discuss NDData/imageutils, particular as they apply to ccdproc and photutils is below; github might be an usual way to get the link out but it seemed the easiest way to target those involved in the discussion already:

http://doodle.com/sznzmpui7fsxkwhb

@crawfordsm @cdeil @eteq @astrofrog @larrybradley @keflavich @perrygreenfield @ejeschke

Also relevant on these issues: #12 #17

ejeschke commented 9 years ago

I will second (third?) @astrofrog's suggestion for a duck-type like interface. I would love to be able to support nddata and ccdproc type objects directly in ginga, and it would not be hard at all if there is a minimal standard interface. Basically we need something that gives you a numpy like interface to the data, and a wcs-like interface to the wcs component--that's about it. I would suggest adding a couple of other useful things, including the ability to ask the object for a cutout of the data, etc. The object can encapsulate the details of errors, masks and propagation.

cdeil commented 9 years ago

OK ... so everyone's agreed that imageutils should support NDData objects eventually. And there are concrete proposals (and maybe even agreement) on several fundamental NDData changes.

Can we try to transform what is discussed here into a few new Github issues or pull requests? I think it would actually be useful to create those before the hangout next week so that we have a concrete list of proposals to talk about (and hopefully agree / decide on).

@mwcraig Could you create a Google docs document that links to all relevant issues and share a link here?

cdeil commented 9 years ago

I've started a API spec document for imageutils in #18 ... please comment and contribute.

eteq commented 9 years ago

@cdeil - are you thinking we should close this discussion and instead move it to #18 ? I'd prefer to avoid mixing the discussion across two threads...

mwcraig commented 9 years ago

Please see #18 for a link to the relevant issues; also please continue discussion there.

mwcraig commented 9 years ago

Hangout is set for Thu, August 14, 11AM CDT; invites sent to everyone who responded to the doodle poll (I hope). If you want an invite, let me know.

A very rough agenda is now part of https://docs.google.com/document/d/1yscBWegyQbG2fxmZFg5CAtnhS3Hoyl6VTSSApWAsKX8/edit?usp=sharing

cdeil commented 9 years ago

FYI: there's currently an interesting discussion on the scikit-image mailing list whether an Image class should be introduced that handles metadata and masks (see here).

mwcraig commented 9 years ago

@cdeil -- thanks for the link to the discussion. A couple reactions...

  1. Agreed that it would be a hassle (or impossible) for an imageutils function to handle arbitrary metadata or even, in all cases, a mask or WCS. The proposed decorator actually helps with that, I think, by essentially saying to users: "You have two options: 1) Pass in numpy arrays, then you (the user) are responsible for making sure the array you pass in is appropriate for the operation being performed, or 2) Pass in an NDData object and the folks who maintain NDData (not the folks who maintain imageutils) will insert an automatic check that the data you pass in is consistent with what the imageutils function can handle.".

As an example of this, the block_reduce in imageutils currently accepts as input either a numpy array or a numpy masked array and gives different results in the two cases (which makes sense to me). imageutils could catch that by explicitly checking its input to see if it is numpy.ma.array and raising an error. That check would presumably have to be included in each imageutils function that can handle an array but not a masked array.

The proposed decorator does that checking for you. To be explicit, there would be no expectation that any imageutils routine has to be able to handle masked data, wcs or any other property, just that if it does, it indicates that it does by having a named argument mask= or wcs= or whatever. The decorator would take care of checking an NDData object.

  1. I think this is the case of "higher-level libraries...to handle metadata correctly within their domain" (from the discussion you linked to, a couple posts down from yours). That is, what makes these tools imageutils as part of astropy is the domain-specific metadata we can expect to come with images (i.e. FITS headers). If NDData does nothing else (and it may well end up doing nothing else), it offers a container for holding data, metadata (which is often, but not always, a FITS header), and other domain-specific properties like WCS and units.

In ccdproc we were recently reminded of the need for something like NDData, whose metadata may be a FITS header but is not required to be. We had tried to force the CCDData to be an OrderedDict, which blew up the first time we converted a fits header with COMMENT cards and HIERARCH cards to a dictionary then back to a FIRS header. It broke the header badly, but we also didn't want to force the metadata to be a FITS header. It also didn't make sense to keep the metadata completely separate.

I've got some thoughts about changes to the decorator too but I'll make those on that issue.