Global block model characterization test.

tbenthompson commented 2 years ago

Now that we have a seemingly functioning block closure and in-polygon test, it would be nice to save the correct polygons and station-to-block assignments to disk. Then we can add an automated test that reloads those values and checks that we can reproduce them.

This is the kind of test where we just check that the outputs aren't changing, rather than asserting that those outputs are correct. This type of test is generally called a "characterization test", "golden master test", or "freeze test".
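As a rough illustration of the idea, here is a minimal sketch of a characterization test. The function `assign_stations_to_blocks` is a hypothetical stand-in (not celeri's actual code), and writing the golden file on the first run is just one possible convention:

```python
import os
import tempfile

import numpy as np


def assign_stations_to_blocks(station_x, block_edges):
    """Hypothetical stand-in for the real station-to-block assignment."""
    # np.digitize returns, for each station, the index of the bin it falls in.
    return np.digitize(station_x, block_edges)


def check_against_golden(result, golden_path):
    """Save the result on the first run; afterwards compare to the saved copy."""
    if not os.path.exists(golden_path):
        np.save(golden_path, result)
        return True
    golden = np.load(golden_path)
    # Integer assignments should match exactly, so no tolerance is needed.
    return np.array_equal(result, golden)


# Toy 1D "model": four stations and three blocks (assumed values).
station_x = np.array([0.5, 1.5, 2.5, 0.1])
block_edges = np.array([0.0, 1.0, 2.0, 3.0])
golden_path = os.path.join(tempfile.mkdtemp(), "station_to_block.npy")

first = check_against_golden(
    assign_stations_to_blocks(station_x, block_edges), golden_path
)
second = check_against_golden(
    assign_stations_to_blocks(station_x, block_edges), golden_path
)
```

In a real repository the golden file would probably be committed alongside the tests rather than created in a temporary directory, so that any later change in the output shows up as a failure.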

brendanjmeade commented 2 years ago

I like this idea. Should I create a subset of the global model so that it runs faster for this? Also, what are your thoughts on the best way to store these results: a CSV file? Something else?

tbenthompson commented 2 years ago

I think having two separate tests would be good:

- Global model running just closure: It seems helpful to run the global block closure because it tests a lot more edge cases than a smaller model. Also, the global block closure runs in under thirty seconds, so that seems fine for automating the test as part of the CI pipeline. Here, I think just saving both the segment-to-polygon and the station-to-polygon assignments would be sufficient to maintain working code.
- North America running the full pipeline: In addition to the global closure results, it seems smart to have a test for the full pipeline constructing the full matrix of partials. Using a smaller model for this seems okay since, unlike block closure, none of the steps are fundamentally different for a global vs. local model. I think just saving the full matrix would be the solution here. We might need to figure out something tricky to get the matrix small enough that it's easy to set up the automated CI.
  - If that does end up being a problem, one idea might be to only save a random 5% of the entries. It's unlikely that a bug would change just a single matrix entry. It's much more likely that it would cause an entire subblock of the matrix to change, say, the block partials. So a 5% subsample of entries would still catch that.
  - A different idea is to just use an even smaller problem with maybe 5-10 blocks and one subduction zone with TDEs. That'll test most of the things we care about while being nice and small and fast.
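The 5% subsampling idea could look something like the sketch below. The matrix here is random stand-in data, and `subsample_entries` is a hypothetical helper, not part of celeri. Saving the sampled indices together with the values would avoid relying on the random number generator reproducing the same sample later:

```python
import numpy as np


def subsample_entries(matrix, fraction=0.05, seed=0):
    """Return (flat_indices, values) for a reproducible random subset of entries."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(fraction * matrix.size)))
    # Sample without replacement so each stored entry is distinct.
    flat_idx = np.sort(rng.choice(matrix.size, size=n_keep, replace=False))
    return flat_idx, matrix.ravel()[flat_idx]


# Stand-in for the full matrix of partials (assumed shape and values).
rng = np.random.default_rng(42)
operator = rng.normal(size=(200, 100))
idx, golden_values = subsample_entries(operator)

# Simulate a bug that perturbs an entire sub-block of the matrix.
broken = operator.copy()
broken[:50, :30] += 1.0

unchanged_ok = np.allclose(operator.ravel()[idx], golden_values)
bug_detected = not np.allclose(broken.ravel()[idx], golden_values)
```

With 1000 of 20000 entries sampled and a perturbed sub-block covering 1500 entries, the chance that no sampled entry lands inside the sub-block is negligible, which is the reasoning behind the 5% suggestion.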

The storage format doesn't really matter since we're just loading back up to compare. I'd probably do it just with numpy: np.save and np.load. Then it gets straight into a numpy array without any extra headache. np.savez_compressed could be useful if you want to reduce the size of the matrix.
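For example, a minimal round trip with `np.savez_compressed` and `np.load` might look like this. The file name, the array key `operator`, and the matrix contents are placeholders chosen for illustration:

```python
import os
import tempfile

import numpy as np

# Stand-in matrix; zeroing small entries makes it compress better.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(50, 20))
matrix[np.abs(matrix) < 1.0] = 0.0

path = os.path.join(tempfile.mkdtemp(), "golden_operator.npz")
np.savez_compressed(path, operator=matrix)

# np.load on an .npz file returns a dict-like object keyed by array name.
with np.load(path) as data:
    reloaded = data["operator"]

# Raises AssertionError if any entry differs beyond the relative tolerance.
np.testing.assert_allclose(reloaded, matrix, rtol=1e-12, atol=0.0)
```

For floating-point matrices, a small tolerance in the comparison is probably safer than exact equality, since results can differ slightly across platforms and library versions.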

brendanjmeade commented 2 years ago

I like this a lot and will move towards it. The sampling approach for the matrices is a good idea too. Right now GitHub is telling me two things: 1) the Windows builds aren't working, and 2) I've used up my 3000 minutes of automated checking for the month and don't get any more until September 22nd. I'm not sure how to fix the first (do we even need to test on Windows? What do you think?), and I think the second might be fixable with a credit card!

tbenthompson commented 2 years ago

> 1) The Windows builds aren't working.

It looks like the Windows problem is that the okada_wrapper installation is failing. I can try to fix that sometime soon.

> 2) that I've used up my 3000 minutes of automated checking for the month and I don't get anymore till September 22nd. I'm not sure how to fix the first (I'm not sure that we need to test on Windows? What do you think?) and I think the second might be fixable with a credit card!

This second problem is also fixable by making the celeri repository public. Any publicly accessible GitHub repo gets essentially unlimited free usage of GitHub Actions.

brendanjmeade commented 2 years ago

Love the public repo fix. Done! Thanks for the suggestion!

brendanjmeade commented 2 years ago

As of commit https://github.com/brendanjmeade/celeri/commit/da58fb1326d2ed7ce67e944eee1154e18984851e, there is now a successful global block closure test. It tests against a concatenated list of all edge_idx indices.

tbenthompson commented 2 years ago

Yay!! Exciting.

brendanjmeade / celeri

Global block model characterization test. #16