dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/
Other
37 stars 38 forks source link

Fixing literal edge cases #124

Closed tomr-stargazer closed 10 years ago

tomr-stargazer commented 10 years ago

I added a complicated line of logic to SpatialBase._world_pos that tries to modulo the mom1 values by the data shape tuple (minus 1), and if it can't, it falls back to the original behavior and signature. I piggybacked shape_tuple into metadata when _make_catalog is called, which might be a little too clever of me since metadata is otherwise only used for user-defined parameters.

Those caveats aside, this completely fixes #122 for me -- all of my catalog properties are no longer nan and everything seems well-behaved, including the test case that I had to update (because it is now properly wrapping the x_cen values, as opposed to before, when it would let the x_cen value "spill over" to values higher than the data indices).

Is this acceptable to merge? I could write more test cases, especially since @ChrisBeaumont has suggested I focus more on unit tests rather than integration tests, in our conversation at #123.

tomr-stargazer commented 10 years ago

In writing unit tests, I realized that there's an off-by-one error in the (pixel-space) modulo after all. So this branch is not ready to merge.

ChrisBeaumont commented 10 years ago

Any chance #125 fixes the off-by-one error?

tomr-stargazer commented 10 years ago

The off-by-one error was introduced by me in commit a06578be4af37ff2f45ba2600894a4cfc064a366 and I am currently writing code to address it.

The problem lies in the tricky interaction between pixel-world and WCS-coordinates-world: I need wraparound structures to have x_cen values that are both mathematically correct (under some convention) and are accepted by WCS pix2world functions.

A structure like this [1 1 0 0 1] has a well-defined center at index 0, which is "well-behaved" in the sense of "we expect data to exist at that pixel".

But the following structure [1 1 0 1 1]

when properly wrapped its x_cen would either be -0.5 (half a pixel to the left of the bottom index) or 4.5 (half a pixel to the right of the top index). More broadly, there is a family of structures whose centers lie in the one-pixel "gap" above the rightmost index but below the leftmost index, and I found that it's these structures who have the most trouble in the WCS conversion, since pix2world throws "nan" when the indices exceed the index of the final "expected" pixel, but modulo-ing by the number of pixels also doesn't solve it because (if the last pixel is 1439, and the size is 1440, then mod 1440 doesn't catch a struct with center 1439.5, for example, but wcs still throws up).

So my current solution is to mod by the length of the data (e.g. 1440) and then crudely catch anything in the "gap", subtracting the data size and letting the x_cen be negative. WCS seems to prefer that.

ChrisBeaumont commented 10 years ago

Yeah, this issue is making me more nervous, since I think it's very specific to the particular WCS you're using. In many cases out-of-bounds pixel values are handled fine by WCS, but your particular files don't like them. (https://github.com/astrofrog/wcsaxes/issues/39 seems like a similar situation)

I worry about adding lots of special-case logic to astrodendro -- it almost seems like the solution is to modify your WCS metadata (or at least use a custom WCS subclass for your files, and stuff the wrapping logic there)

tomr-stargazer commented 10 years ago

That's a very good point. I had written a little function a few weeks ago that converted the out-of-the-box Dame header into something that played nicely with wcsaxes, so I'll use that header here, too, and hope that the WCS interaction gets solved. This issue has been tricky for me to work around because the whole domain of WCS is completely magic to me, but I hope this resolves it.

tomr-stargazer commented 10 years ago

Well, as it turns out, I have already been using the "improved and recentered" WCS header for these purposes, and when I go back to using the "out of the box" WCS header, I get hundreds of nan'd structures instead of just 3. The last thing I can think of is whether my function that "recenters" WCS headers might itself be suffering from an off-by-one pixel error. It's located here: https://github.com/tomr-stargazer/astrodendro_analysis/blob/master/demo.py#L56 and I'm currently investigating.

tomr-stargazer commented 10 years ago

After some informal testing I believe I am not suffering from an off-by-one error in my recentering code.

The custom WCS subclass with wrapping logic is looking more appealing the more I bash on this.

ChrisBeaumont commented 10 years ago

I think thats the way to go, to resolve any issues related to NaNs coming back from pix/world conversions. It isn't really astrodendro's concern

tomr-stargazer commented 10 years ago

I discovered the following method call today:

wcs_object.wcs.bounds_check(False)

If I call this on my wcs object before passing it to ppv_catalog, then I don't have to do any of the modulo subtraction magic and I get well-behaved, apparently correct coordinates in the catalog.

Given how much pain this caused me, I wonder if it's worth calling this line in astrodendro code - or perhaps at least adding a warning if it appears that the pix2world conversion throws nan? (To clarify, this warning would mention the bounds_check toggle as a possible solution)

If not, this pull request should be closed without merging.

ChrisBeaumont commented 10 years ago

For now, I think its too specific to worry about in the astrodendro code (most people will not compute dendrograms of periodic images). Let's see if it comes up again for someone else first. I'll close this for now