Closed cholmes closed 4 weeks ago
This should have readme updates before merging... but I may just continue to work on it adding in more improvements from the list in #27
Curious when that would happen, since all the input will be sentinel 2
Yeah it won't unless someone reprojects into EPSG:4326 (maybe downloading from GEE or something). I think it is safe to remove! (the check is an instinct from other projects where sometimes our inputs are EPSG:4326 and sometimes local UTM)
I haven't done extensive testing on the performance of the area calculation, but when I did first add it in I definitely noticed. But overall it's not egregiously long, so may be worth it.
If it gets excessive then I think we can switch to a pyproj based approach-- I think I benchmarked fiona's transform and it was way slower but I can't find that.
Update: I remade it https://gist.github.com/calebrob6/2a2705027764d17fdbeb2b7a4cf4887c -- pyproj.Transformer is atleast ~2x faster.
But also, if we're already assuming that the inputs are going to be in the correct UTM projection, then it is probably fine to not reproject to 6933 (or anything else) to compute areas.
@cholmes these changes LGTM! I agree it would be nice to find a way to fill the holes - though sometimes there are real holes in a field, it would be nice to have the option (just like the area filtering).
I agree it would be nice to find a way to fill the holes - though sometimes there are real holes in a field, it would be nice to have the option (just like the area filtering).
Oh, didn't see this before my comment. Didn't realize there are valid 'holes in fields' use cases, but actually I guess that does make sense. I think I lean towards the on/off option that does Caleb's operation for now, and then ticket that we'd ideally like control just like in the 'simplify' case.
Update: I remade it https://gist.github.com/calebrob6/2a2705027764d17fdbeb2b7a4cf4887c -- pyproj.Transformer is atleast ~2x faster.
Oh cool, I'll try to incorporate the code.
But also, if we're already assuming that the inputs are going to be in the correct UTM projection, then it is probably fine to not reproject to 6933 (or anything else) to compute areas.
Ah, good to know - I wasn't sure, and this bit me in the past. I do think eventually we may want to handle other projections, so might be good to keep the code around for our future selves or future others.
You want to review this @m-mohr? I think we should just get it merged and not wait on the holes stuff - I made an issue (#39) for it.
Then you could likely take on #27 and that one.
Merging for now, this part of the CLI needs further work anyway. O(n³) complexity is not ideal... :-)
A few improvements mentioned in #27:
min_size
parameter to filter out all the little slivers that usually aren't real fields. Default value of 500, which is about 5 pixels of sentinel 2. Requires an area calculation with reprojection which does slow things down a good bit, but seems like it may be worth it as the results seem a good bit better.Can see the difference, some places there's lots of these little ones that are just like a pixel big.
Would be good to also figure out the places where there's a one or two pixel hole in the polygon, seems like there should be something in shapely for it but I don't know it off hand. It does feel like a more sophisticated polygonization algorithm could do better, but it's pretty cool that we can get some fairly reasonable results just from rasterio.
I haven't done extensive testing on the performance of the area calculation, but when I did first add it in I definitely noticed. But overall it's not egregiously long, so may be worth it. If it was just to add an attribute that not everyone cares about we could make it an option. But filtering out these tiny polygons seems pretty useful and for that I'm pretty sure you need the area calculation w/ reprojection (maybe could do a faster filtering out pass without projecting to 6933? But probably not worth the effort), so I lean towards just doing it for everything.
@calebrob6 - I still have your warning about values greater than 1 for simplification in (but commented out). Curious when that would happen, since all the input will be sentinel 2, in UTM, so seems like for now we could count on the units always being meters? I do think it could be interesting to evolve this to be a more general 'simplification of satellite imagery inference' thing, but that feels not immediate, and seems reasonable to just assume sentinel 2, and then if we start trying other imagery sources we can adjust as needed.