CosmiQ / solaris

CosmiQ Works Geospatial Machine Learning Analysis Toolkit
https://solaris.readthedocs.io
Apache License 2.0
414 stars 112 forks source link

Bug fixes, geojson2coco improvements, restrict_by_aoi, and nodata filling #331

Closed rbavery closed 4 years ago

rbavery commented 4 years ago

Description

This PR addresses the following issues:

https://github.com/CosmiQ/solaris/issues/327 by allowing an option to set areas outside an aoi_boundary to the src image's nodata value when tiling.

https://github.com/CosmiQ/solaris/issues/328 by adding a method to the RasterTiler class that fills tiles with the mean of the src image. If this is run after restrict_by_aoi, all nodata values outside of the aoi are filled, along with other nodata values in the image.

geojson2coco would fail if the geodataframe input had any MultiPolygon or GeometryCollection types. I added an optional argument to ignore these when converting the geodataframe to COCO format

There was a bug in the raster_tiler's use of the split_geom function, where the aoi boundary was not intersected with the src_image boundary prior to tiling, causing many nodata areas outside the src_img boundary to be tiled.

EDIT: Here's a list of all changes, from the CHANGELOG:

Changed

Fixed

Please delete options that are not relevant.

Checklist:

If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.

nrweir commented 4 years ago

Hi @rbavery,

Thanks for working on all of this stuff.

Just as a note, the CosmiQ team is currently working with counsel to ensure that solaris doesn't conflict with this new US export control guideline. I won't be able to merge PRs until we get the all-clear (hopefully in the next week or so). I'll post another comment here once that's clarified on our end and we can resume open source work.

nrweir commented 4 years ago

Another thought: since geopandas will be switching to pyproj.CRS objects for tracking CRSs in the 0.7.0 release (see here), I'm thinking these may end up being a better object to track CRS with through solaris rather than the rasterio ones. This is a pain given all the hard work you just put into implementing the rasterio ones though. Thoughts?

rbavery commented 4 years ago

Oof (wrt to that interim rule). No problem though.

I saw that update from geopandas, it's good news overall. I think it makes sense to for vectors to internally use pyproj.CRS. But rasterio uses it's own rasterio.crs.CRS class. I'm not sure if we can consistently use the same CRS object for both rasters and vectors at all points in the code. And it's unclear to me if reprojecting geopandas gdf's using a rasterio.crs.CRS will still work, so I've put out a question on twitter. But in any case where it makes sense to revamp the CRS handling again I can probably get to it.

nrweir commented 4 years ago

Yeah, that makes sense. As far as I can tell (from trying myself), geopandas gdfs can't handle rasterio CRS objects.

I'm about to start some work that will require me to get a bunch of stuff resolved (including CRSs in vector objects), and word on the street is that we'll be able to start pushing code up here again very soon. I'll create an issue for vector CRSs and assign myself to it if I start working on it.

rbavery commented 4 years ago

Sounds good. Joris, one of the geopandas maintainers, says that geopandas gdfs should be able to handle the rasterio crs class since it has a to_wkt method. https://twitter.com/jorisvdbossche/status/1227492555811504129

nrweir commented 4 years ago

Hey @rbavery. I spent a bunch of time digging into what works well and what doesn't for interconverting between CRSs, and I think I've decided that passing around pyproj CRS objects is going to work better. Check out this gist for an example. I'm going to start working towards passing everything around as pyproj objects.

rbavery commented 4 years ago

@nrweir Got it sounds good. I read the rasterio groups thread, it make sense to standardize on the pyproj class. I'll have time next week to work on wrapping up this PR with some tests. Would you prefer I wait and rebase off of your pyproj CRS updates? In any case, I'll remove this change I made in this PR since I think the missing CRS issue I was having might be fixed by your update.

... I wasn't able to save a geopandas geodataframe with a CRS that didn't have an EPSG code, which caused it to default to 4326. I added an optional argument to override the CRS in the geojson_to_px function with the CRS of the src image.

nrweir commented 4 years ago

@rbavery that'd be great (rebasing off the new version). I'm going to do a 0.2.2 release today.

codecov[bot] commented 4 years ago

Codecov Report

Merging #331 into dev will decrease coverage by 2.61%. The diff coverage is 80.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #331      +/-   ##
==========================================
- Coverage   67.17%   64.56%   -2.62%     
==========================================
  Files          72       72              
  Lines        5204     5328     +124     
==========================================
- Hits         3496     3440      -56     
- Misses       1708     1888     +180     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
solaris/raster/image.py 53.00% <ø> (+0.05%) :arrow_up:
tests/test_nets/test_datagen.py 100.00% <ø> (ø)
solaris/vector/polygon.py 79.35% <45.45%> (-3.76%) :arrow_down:
solaris/eval/challenges.py 84.74% <50.00%> (-2.98%) :arrow_down:
solaris/data/coco.py 73.70% <62.50%> (-0.69%) :arrow_down:
solaris/tile/raster_tile.py 67.96% <65.62%> (-0.83%) :arrow_down:
solaris/tile/vector_tile.py 77.66% <75.00%> (-0.55%) :arrow_down:
solaris/utils/core.py 76.62% <88.88%> (+3.06%) :arrow_up:
solaris/utils/geo.py 70.19% <89.47%> (+3.53%) :arrow_up:
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 94ac29c...b64a097. Read the comment docs.

rbavery commented 4 years ago

Hey @nrweir, thanks for the patience, I think this PR is finally ready for review! Instead of converting pyproj.CRs to rasterio.crs.CRS using epsg codes I followed @snowman2's advice and used this code snippet he suggested:

import rasterio
if LooseVersion(rasterio.__gdal_version__) >= LooseVersion("3.0.0"):
    rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt())
else:
    rio_crs = rasterio.crs.CRS.from_wkt(pyproj_crs.to_wkt("WKT1_GDAL"))

When I ran your gist with this if/else, it worked to properly convert where it failed before.

nrweir commented 4 years ago

@rbavery thanks, this looks awesome - I'll review today.

rbavery commented 4 years ago

Can you hold off on merging @nrweir ? I found that their is still an issue when using geojson_to_px_gdf, where if the geojsons were saved with a CRS with no EPSG code, the function fails to load any projection information. I need to add the override_crs option back in. I thought this would be solved by the other CRS updates we made, but it looks like geojsons may only be able to store EPSG code CRS information?

The vector_tiler maintains this CRS info right up until the gdfs are saved, so I think it is an issue with the geojson format.

rbavery commented 4 years ago

Ok, I think this should be good to go, if the way I addressed the reviews makes sense.