gauteh / roaring-landmask

A fast and memory-limited landmask based on GSHHG and OpenStreetMap for determing whether a point on Earth is on land or in the ocean.
22 stars 4 forks source link

Added OSM landmask as a new source #30

Closed TheSylex closed 1 month ago

TheSylex commented 2 months ago

Left the GSHHG landmask as the default, but I've added a new method in order to be able to choose between both GSHHG and OSM. It could also be possible to add any arbitrary number of providers but I didn't want to change the project too much, so I kept the changes minimal.

gauteh commented 2 months ago

Thank you. This seems very useful. The landmask is derived from the GSSHG shapes, with the OSM shapes the mask probably no longer matches the shapes. So if a point is in the ocean in the mask, but not in OSM, it may not be picked up. I think we have to generate a separate mask, or maybe just the difference to save space (there is a limit on the size of distributed packages to pypi/conda/crates).

TheSylex commented 2 months ago

How did you derive the mask? I'm unsure on both how it works and how it's generated. I grabbed the shapefile straight out of the OSM website and made an script to convert it from a shapefile into a single multipolygon .wkb.

Would you be able to kindly explain how it works in order for me to understand it and be able finish this PR? Thanks!

gauteh commented 2 months ago

The mask was derived from a Python version, so it is not super easy: https://github.com/gauteh/roaring-landmask/tree/main/src/devel, I will give you a more detailed answer tomorrow.

gauteh commented 2 months ago

Roaring landmask is fast because it can first exclude a great number of points that are certain to be in the ocean. The mask is generated with a set resolution (https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L24). To generate the mask you need to check whether any of the pixels in the mask has any land in it: either just intersecting, completely covered, or contains an island that does not intersect. The mask is then converted to integers which are stored in the a roaring bitmap / treemap: https://github.com/gauteh/roaring-landmask/blob/main/src/mask.rs#L137.

When roaring landmask checks a point it first checks the mask, if the test is negative it is certain that the point is in the ocean and can return. If not, it needs to consult the shapes since the points may be on land or not.

gauteh commented 2 months ago

I was thinking earlier that this project may be of use for that: https://github.com/regionmask/regionmask (#18 )

TheSylex commented 2 months ago

@gauteh I've already resolved the conflicts with your latest changes. I think I've figured out how to correctly make the bitmask, and have correctly esparated both shapes and masks. I've added the scripts that are needed to generate both the landmasks and the bitmasks into the roaring_scripts.py file. Would you kindly check the PR out again? Thanks :)

gauteh commented 2 months ago

Awesome! Looking forward to seeing how you solved this, will take a look tomorrow.

gauteh commented 2 months ago

This looks very promising! I am missing a python test and benchmark: https://github.com/gauteh/roaring-landmask/blob/main/tests/test_roaring.py to see how the provider should be specified from python (or are you planning to do this in the next round?).

It would be useful to plot the two masks + difference between them to see how big the difference is, and it would also be a good test to see if anything stands out.

gauteh commented 2 months ago

There are a few errors in the the tests related to the provider argument: https://github.com/gauteh/roaring-landmask/actions/runs/10789004492/job/29921006432?pr=30#step:7:448. To run the tests requiring nightly rust you do:

cargo test -r --features nightly,static --verbose
TheSylex commented 2 months ago

I've fixed the tests, although I couldn't run them locally. I've also added some images depicting both landmasks better. And finally, I'll leave here an image with the difference between both landmasks:

difference

gauteh commented 2 months ago

That looks very promising, the difference looks as expected (no shift in coordinates). Some new errors in CI. You may need rust nightly to test.

TheSylex commented 2 months ago

🎉 tests are passing! Any next steps before merging?

gauteh commented 2 months ago

Great! Are any of the commits duplicating the datafiles? They should be squashed, to prevent the repo from becoming too big. I can try later if you are unsure how to do this. I think the Python bindings + tests need some updates, but we can do that in the next step.

gauteh commented 1 month ago

Merged! I fixed the python tests to now also testing OSM.

TheSylex commented 1 month ago

any estimate about when we'll get this into a pypi release?

gauteh commented 1 month ago

If all tests go through it should be on its way: https://github.com/gauteh/roaring-landmask/actions/runs/10808495500

TheSylex commented 1 month ago

Looks like there's two problems...

The first one with memory in the x86 version: rustc-LLVM ERROR: out of memory

and then something seemingly related to compiling geos and numpy deps:

      gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/tmp/pip-build-env-1ll5ewre/overlay/lib/python3.12/site-packages/numpy/_core/include -I/io/.venv/include -I/usr/include/python3.12 -c src/c_api.c -o build/temp.linux-x86_64-cpython-312/src/c_api.o
      error: command 'gcc' failed: No such file or directory

Not really sure why any of these are happening now but didn't happen before. Any insights on what you changed regarding the CI on this PR?

gauteh commented 1 month ago

Memory happens now and then. I had accidentally disabled Python test. The fails are not related to your code changes. I will try to fix.