Closed TheSylex closed 1 month ago
Thank you. This seems very useful. The landmask is derived from the GSSHG shapes, with the OSM shapes the mask probably no longer matches the shapes. So if a point is in the ocean in the mask, but not in OSM, it may not be picked up. I think we have to generate a separate mask, or maybe just the difference to save space (there is a limit on the size of distributed packages to pypi/conda/crates).
How did you derive the mask? I'm unsure on both how it works and how it's generated. I grabbed the shapefile straight out of the OSM website and made an script to convert it from a shapefile into a single multipolygon .wkb.
Would you be able to kindly explain how it works in order for me to understand it and be able finish this PR? Thanks!
The mask was derived from a Python version, so it is not super easy: https://github.com/gauteh/roaring-landmask/tree/main/src/devel, I will give you a more detailed answer tomorrow.
Roaring landmask is fast because it can first exclude a great number of points that are certain to be in the ocean. The mask is generated with a set resolution (https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L24). To generate the mask you need to check whether any of the pixels in the mask has any land in it: either just intersecting, completely covered, or contains an island that does not intersect. The mask is then converted to integers which are stored in the a roaring bitmap / treemap: https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L137.
When roaring landmask checks a point it first checks the mask, if the test is negative it is certain that the point is in the ocean and can return. If not, it needs to consult the shapes since the points may be on land or not.
I was thinking earlier that this project may be of use for that: https://github.com/regionmask/regionmask (#18 )
@gauteh I've already resolved the conflicts with your latest changes. I think I've figured out how to correctly make the bitmask, and have correctly esparated both shapes and masks. I've added the scripts that are needed to generate both the landmasks and the bitmasks into the roaring_scripts.py file. Would you kindly check the PR out again? Thanks :)
Awesome! Looking forward to seeing how you solved this, will take a look tomorrow.
This looks very promising! I am missing a python test and benchmark: https://github.com/gauteh/roaring-landmask/blob/main/tests/test_roaring.py to see how the provider should be specified from python (or are you planning to do this in the next round?).
It would be useful to plot the two masks + difference between them to see how big the difference is, and it would also be a good test to see if anything stands out.
There are a few errors in the the tests related to the provider argument: https://github.com/gauteh/roaring-landmask/actions/runs/10789004492/job/29921006432?pr=30#step:7:448. To run the tests requiring nightly rust you do:
cargo test -r --features nightly,static --verbose
I've fixed the tests, although I couldn't run them locally. I've also added some images depicting both landmasks better. And finally, I'll leave here an image with the difference between both landmasks:
That looks very promising, the difference looks as expected (no shift in coordinates). Some new errors in CI. You may need rust nightly to test.
🎉 tests are passing! Any next steps before merging?
Great! Are any of the commits duplicating the datafiles? They should be squashed, to prevent the repo from becoming too big. I can try later if you are unsure how to do this. I think the Python bindings + tests need some updates, but we can do that in the next step.
Merged! I fixed the python tests to now also testing OSM.
any estimate about when we'll get this into a pypi release?
If all tests go through it should be on its way: https://github.com/gauteh/roaring-landmask/actions/runs/10808495500
Looks like there's two problems...
The first one with memory in the x86 version:
rustc-LLVM ERROR: out of memory
and then something seemingly related to compiling geos and numpy deps:
gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/tmp/pip-build-env-1ll5ewre/overlay/lib/python3.12/site-packages/numpy/_core/include -I/io/.venv/include -I/usr/include/python3.12 -c src/c_api.c -o build/temp.linux-x86_64-cpython-312/src/c_api.o
error: command 'gcc' failed: No such file or directory
Not really sure why any of these are happening now but didn't happen before. Any insights on what you changed regarding the CI on this PR?
Memory happens now and then. I had accidentally disabled Python test. The fails are not related to your code changes. I will try to fix.
Left the GSHHG landmask as the default, but I've added a new method in order to be able to choose between both GSHHG and OSM. It could also be possible to add any arbitrary number of providers but I didn't want to change the project too much, so I kept the changes minimal.