brendanjmeade / celeri

Next generation earthquake cycle kinematics
BSD 3-Clause "New" or "Revised" License
24 stars 6 forks source link

Stations being assigned to the wrong blocks!!! #84

Closed brendanjmeade closed 2 years ago

brendanjmeade commented 2 years ago

For the Japan model, GPS stations are being assigned to the wrong blocks in some cases. This is visually documented in the file: notebooks/celeri_direct_dense_japan_debug.ipynb in the commit: https://github.com/brendanjmeade/celeri/commit/e4bd6427544280a8b27ac721772b1835e057f1a8

Run this notebook and the last two cells will produce figures that show the station misassignment.

Some notes/observations:

@tbenthompson @jploveless This seems to indicate that there is an issue related to assigning stations to the correct blocks after they have been properly closed. Perhaps in: https://github.com/brendanjmeade/celeri/blob/e4bd6427544280a8b27ac721772b1835e057f1a8/celeri/celeri_closure.py#L241

Maybe the "rest of the world" block is causing this problem and we should just get rid of it by introducing some segments that make a complete circuit around the globe (e.g. near the equator)?

tbenthompson commented 2 years ago

I'll take a look this week. Probably tomorrow if I have time. I'll aim to reduce it down to a reproducible bug with a single polygon and single point and see if that helps clarify!

On Mon, Feb 14, 2022 at 10:11 PM Brendan Meade @.***> wrote:

Assigned #84 https://github.com/brendanjmeade/celeri/issues/84 to @tbenthompson https://github.com/tbenthompson.

— Reply to this email directly, view it on GitHub https://github.com/brendanjmeade/celeri/issues/84#event-6071672196, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABALTE7KW655PTISJIHVG3DU3G75FANCNFSM5ONHZWFQ . You are receiving this because you were assigned.Message ID: @.***>

brendanjmeade commented 2 years ago

Thanks @tbenthompson! If you like, I'm available on Zoom tomorrow after 1:30. Also, if the best answer is to get rid of the "rest of the world"/"exterior" block that would be a great solution too!

jploveless commented 2 years ago

This does indeed seem like an issue with the exterior block. In the Japan model, it is labeled as block 6, and when it is circulated to determine which points (stations) lie within, it is overprinting the block labels of stations assigned to blocks with labels ≤ 5. In Blocks, we somehow had a way of circulating around the exterior block in the opposite direction, so that it truly encompassed the rest of the world, rather than encompassing all of the block geometry that lies within.

One simple solution could be to just rearrange the block labels so that the largest-area block gets placed first in the list and therefore can't overprint any previous block labels for the stations. If there were any real stations on that block, they'd have to be assigned a label differently than using the current "in polygon" function (i.e. any stations lacking a label after all blocks are circulated are assumed to be on the exterior block). I think that should be okay in practice, and I think that's how old versions of Blocks handled it.

brendanjmeade commented 2 years ago

Note's from text conversation with @jploveless

Jack Loveless: I think you’re right. It’s block 6, and it seems to be overwriting the station labels for all blocks < 6 So I think it’s that the exterior block is getting circulated in the opposite direction when determining points inside of it (i.e. the inside is truly the rest of the world, but currently, it is treating the detailed block geometry as inside) I can’t recall how we forced circulation in the opposite direction in Blocks

Brendan Meade: Thanks for this insight. That’s starting to make sense. Should we just abandon the rest of the world block and make it a rule that we have to have a block boundary that goes around the world? That might eliminate this problem?

Jack Loveless: I’m actually a bit confused about how the segments can be correctly labeled yet the stations are not. This makes me think that there could be a straightforward solution that would allow more flexibility. Just to maximize compatibility with existing Blocks geometries, I’d still like to find a solution that doesn’t require a world-spanning segment Specifically, we could ask whether a station has already been labeled. If it has, we shouldn’t relabel it when we are traversing the exterior block

brendanjmeade commented 2 years ago

@jploveless and @tbenthompson I brought this up with Jack but I'm wondering if we should just take the step to eliminate the "rest of the world"/"exterior" block? If we do this the entire problem should go away and we've eliminated an awkward concept that's been troubling (one way or another) for nearly 20 years. Thoughts?

tbenthompson commented 2 years ago

Sorry I didn’t get around to this today. But I just wanted to weigh in and say that I don’t think there is an exterior block. I think I got rid of that concept months ago when I wrote the new block closure. On Feb 15, 2022, 9:54 PM -0500, Brendan Meade @.***>, wrote:

@jploveless and @tbenthompson I brought this up with Jack but I'm wondering if we should just take the step to eliminate the "rest of the world"/"exterior" block? If we do this the entire problem should go away and we've eliminated an awkward concept that's been troubling (one way or another) for nearly 20 years. Thoughts? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

tbenthompson commented 2 years ago

Oh and the implication of this comment was just that I think there’s probably a simple bug.  Restructuring the conceptual workings of the closure and in polygon tests seems unnecessary. We just need to do a little bit of debugging. On Feb 15, 2022, 9:57 PM -0500, Ben Thompson @.***>, wrote:

Sorry I didn’t get around to this today. But I just wanted to weigh in and say that I don’t think there is an exterior block. I think I got rid of that concept months ago when I wrote the new block closure. On Feb 15, 2022, 9:54 PM -0500, Brendan Meade @.***>, wrote:

@jploveless and @tbenthompson I brought this up with Jack but I'm wondering if we should just take the step to eliminate the "rest of the world"/"exterior" block? If we do this the entire problem should go away and we've eliminated an awkward concept that's been troubling (one way or another) for nearly 20 years. Thoughts? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

brendanjmeade commented 2 years ago

I thought it was gone for the case when we had segments that reached around the earth. In this case, it looks like block 6 is what I would have called the "exterior block" previously. Definitely great if this is debuggable at the code end but if there is a user solution (always have some set of segments that spans 0-360) that's great too!

block_6_exterior_maybe
jploveless commented 2 years ago

I actually really like the idea of the "rest of the world" block because it has always reminded me that every model is a global model!

Ben's block closure and segment labeling algorithm is fantastic, but there is still an issue with assigning labels to stations and interior points. I think what's happening is that what should be an in-polygon test (in = truly the rest of the world) is actually an out-polygon test (in that the detailed block geometry/great majority of stations lie "outside" the rest of the world).

I think this can be solved with bookkeeping similar to how we do it in Blocks: never actually circulate the exterior block when testing which interior points/stations lie inside blocks, and then just assign the single unassigned interior point to have the exterior block as its label, and any unassigned stations to be on the exterior block. I will try to work on this tomorrow.

brendanjmeade commented 2 years ago

I actually really like the idea of the "rest of the world" block because it has always reminded me that every model is a global model!

Let me offer a counter suggestion: If we required each model (local as some may be) to have a set of segments that extended away from the local region and spanned the globe, that would help make explicit that every model is global and (possibly/probably) eliminate this issue and allow for the current implementation to work.

My current summary of our positions is:

@tbenthompson There is no exterior block
@jploveless There is an exterior block and it is good
@brendanjmeade There is an exterior block and we should get rid of it
tbenthompson commented 2 years ago

Haha, I love your table Brendan!

I'm up early so I figured I'd take a look at this. Jack: yell at me if I'm stepping on your toes!

I also should clarify my position: currently, there is no special case handling for an exterior block, but there is an exterior block. And I think this is actually why this bug is happening. We need some code in the contains_point/in_polygon handling that detects the exterior block and flips the results of the in-polygon test. One way to do this that requires very little manual intervention would be to just detect whether the segments circulate the block in a clockwise or counterclockwise direction.

The very simple code to replicate this bug is:

closure.polygons[6].contains_point(station.lon, station.lat)

That function call returns an array with True in every value. It should return False for every value.

tbenthompson commented 2 years ago

Two more notes:

  1. An even simpler way to check this is to test whether the interior_pt we compute in Polygon.__init__ is actually within the polygon according to the contains_point function.
  2. I'm actually suspicious that this is just a bug in the definition and creation of that interior_pt because that's already being used inside contains_point and should be defining whether we want to detect the inside or outside of the polygon.
tbenthompson commented 2 years ago

I'm getting a warning about a divide by zero issue in the contains_point algorithm. That seems like it might be pointing towards the problem:

[/Users/tbent/Dropbox/active/eq/celeri/celeri/celeri_closure.py:319](): RuntimeWarning: invalid value encountered in true_divide
  T = np.linalg.norm(T, axis=2)[:, :, None]
tbenthompson commented 2 years ago

There are lots of nan values in the T variable inside contains_point. Yikes!

jploveless commented 2 years ago

I'm up early so I figured I'd take a look at this. Jack: yell at me if I'm stepping on your toes!

Definitely not stepping on my toes if you have a good sense for how to reverse the circulation! I think I've read through your code enough to be able to hack together some exceptions for the exterior block but the reversed circulation seems more elegant/general. I think we can ID the exterior block either using an inpolygon test (contains all but one block.interior_lon, block.interior_lat) or by area, though I'm confused about some of the reported areas (2 are ≥11 steradians, and block 6 is not the largest).

tbenthompson commented 2 years ago

This is actually way simpler. My last hypothesis that interior_xyz is not being set correctly is the right one. See below for where I manually set interior_xyz to something more obviously in the exterior:

image
tbenthompson commented 2 years ago

I just pushed a fix for this issue. Only one character is meaningfully different. I changed 0.01 to 0.05 here: https://github.com/brendanjmeade/celeri/blob/5845504cf472fafac56b04ea6b5d9cd522a04d12/celeri/celeri_closure.py#L228

This isn't the best fix possible but it gets this particular example working. So, the fundamental problem here is that the contains_point requires a point that is inside the polygon. This helps to determine the difference between "inside" and "outside". The original implementation here was determining an interior point by finding any segment of the block and offsetting the midpoint along the normal vector into the interior of the polygon by a small amount. That small offset was not enough to make the contains_point algorithm sufficiently robust. Basically, the interior point was too close to the edge of the polygon for polygon 6 in this Japan example. So, by setting the offset multiplier to 0.05 instead of 0.01, I moved the point further into the polygon.

The problem with this is that is you offset too far along the normal vector, you run the risk of accidentally stepping across another edge into another polygon. I think we can robustly test for this possibility but it's going to be more than a one line fix. Basically, I'll draw a test-segment from the midpoint of the original edge to the proposed interior point and then check if that test-segment intersects with any other edges of the polygon.

jploveless commented 2 years ago

Thank you, Ben! This is funny: this seems very similar to some fixes/tests I was making to the inpolygonsphere code of Blocks to correctly work with some of Eileen's funky block geometries (move the test point a little farther, but not so far that it moves into the next block).

Looks like the block and station labeling are being done correctly from the visualization tests Brendan had set up, though the actual model parameter solution looks even worse than before! I will start taking a look for the culprit elsewhere…

tbenthompson commented 2 years ago

Thank you, Ben! This is funny: this seems very similar to some fixes/tests I was making to the inpolygonsphere code of Blocks to correctly work with some of Eileen's funky block geometries (move the test point a little farther, but not so far that it moves into the next block).

Uh oh, we're reliving the past!

I just pushed a PR that does something a bit smarter for choosing this interior point: https://github.com/brendanjmeade/celeri/pull/86 - I have a feeling that for all but the most insane block geometries, this new code will find good interior points.

jploveless commented 2 years ago

I like this approach of perturbing the interior point by an amount that scales with segment length. Seems like it should handle the big coarse blocks and the detailed blocks well.

I still have a question about a single station label. In the figures highlighting station labels, there is one station (~130ºE, ~70ºN) that is not identified as being on the exterior block or on block 4 (Eurasia). I remember this boundary from old Japan models causing headaches because it of course looks really different as a great circle than it does as a straight line. So I don't know which block this station lies on, but I feel like it should be either 4 or 6. This could be one of the cases where, after labeling stations, we assign any unlabeled stations to be on the exterior block, but I guess I'd like to understand why this happens in the first place?

Also, how nice that I can view a notebook directly in Github now!!

tbenthompson commented 2 years ago

Ooh, good detective-ing. I''ll look at that station. I agree that the algorithm we're currently using seems like it should always have exactly block containing a given point. So, if a point is not assigned to any block, that is an error!!

tbenthompson commented 2 years ago

I looked into the point that you mentioned above is failing to be included in any polygon. I understand why now and I have a fix I'm working on. The full in polygon test is working just fine and assigns that point to block #4. Which is correct because of the curvature of the great circle. But, the issue is in the bounding box check here: https://github.com/brendanjmeade/celeri/blob/5845504cf472fafac56b04ea6b5d9cd522a04d12/celeri/celeri_closure.py#L292-L301 This bounding box check is operating in lon-lat space and is not respecting the great circle shape of segments. So, in this particular case, it is incorrectly rejecting the point from block #4.

I think the quick fix is to use a 3D cartesian bounding box instead of lon-lat. This might be a lot simpler. We could turn off the bounding box test, but it makes large models much more pleasant to work with by making the contains_point function way faster.

tbenthompson commented 2 years ago

An easier fix might be to modify the bounding box construction to: 1) interpolate a few extra vertices along each "long" segment. These vertices would only live inside the BB construction step and wouldn't change the actual block geometry. 2) find the bounding box for the block with extra vertices. 3) expand that bounding box by a small percentage to account for any remaining great circle error.

But I'm actually just going to leave that as a note for later and turn off the bounding box check for now. We can fix it and re-enable it later on, but for now it's no problem to have slightly slower block closure and in polygon tests.

brendanjmeade commented 2 years ago

Closing for now.