agrc / palletjack

A library for updating AGOL data from various external sources
MIT License

Feat: replace geojson uploads with filegdb uploads #54

Closed jacobdadams closed 9 months ago

jacobdadams commented 9 months ago

I discovered a relatively new package called pyogrio that was built for the geopandas team to solve some of the dependency problems with fiona/gdal. It turns out the version available on PyPI includes the new OpenFileGDB driver with write support built in, so we now have all the pieces to upload gdbs to AGOL without arcpy.
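As a rough sketch of the pieces involved (not the code in this PR): pyogrio's `write_dataframe` can write a GeoDataFrame into a file geodatabase through the OpenFileGDB driver, and the resulting `.gdb` folder can then be zipped with the standard library for upload. The function names, the `upload` layer name, and the `upload.gdb` file name are illustrative placeholders, and write support assumes the installed pyogrio bundles GDAL 3.6 or later.

```python
import shutil
from pathlib import Path


def zip_gdb(gdb_path):
    """Zip a .gdb directory so the archive contains the .gdb folder itself."""
    gdb_path = Path(gdb_path).resolve()
    archive = shutil.make_archive(
        str(gdb_path.with_suffix("")),  # archive base name; ".zip" is appended
        "zip",
        root_dir=gdb_path.parent,  # zip relative to the parent folder...
        base_dir=gdb_path.name,  # ...so entries start with "<name>.gdb/"
    )
    return Path(archive)


def write_zipped_gdb(gdf, out_dir, layer="upload", gdb_name="upload.gdb"):
    """Write a GeoDataFrame to a file geodatabase, then zip it (hypothetical helper)."""
    # Imported lazily so zip_gdb can be used without pyogrio installed.
    import pyogrio

    gdb_path = Path(out_dir) / gdb_name
    # Assumption: this pyogrio build ships GDAL >= 3.6 (OpenFileGDB write support).
    pyogrio.write_dataframe(gdf, str(gdb_path), layer=layer, driver="OpenFileGDB")
    return zip_gdb(gdb_path)
```

Calling `write_zipped_gdb(gdf, "/some/working/dir")` would return the path to the zip archive, which is the file that would then be handed to AGOL.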

GDB uploads are more efficient, and I've not found any documentation on file size limits. This avoids the geojson chunking requirements along with the projection limitations. Resulting operations should be much, much faster for large datasets.
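To illustrate the chunking problem, here is a minimal sketch of size-based chunking, not the code this PR removes. It groups GeoJSON features so each chunk's serialized size stays under a limit. `chunk_features` and `max_bytes` are hypothetical names, the actual AGOL limit is not given here, and the size estimate ignores the FeatureCollection wrapper.

```python
import json


def chunk_features(features, max_bytes):
    """Group GeoJSON features into chunks whose serialized size fits max_bytes."""
    chunks, current, current_size = [], [], 0
    for feature in features:
        size = len(json.dumps(feature).encode("utf-8"))
        if size > max_bytes:
            # Chunking cannot help here: a single feature is already too large.
            raise ValueError(
                f"single feature is {size} bytes, over the {max_bytes} byte limit"
            )
        if current and current_size + size > max_bytes:
            # Adding this feature would overflow the chunk, so start a new one.
            chunks.append(current)
            current, current_size = [], 0
        current.append(feature)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

The `ValueError` branch is the case chunking cannot work around: when one feature alone exceeds the limit, no split of the dataset will fit it.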

I've waffled over a couple of releases on whether the FeatureServiceUpdater methods should be class methods or regular methods, but it now makes much more sense for them to be regular methods that require instantiating the class first. This, plus the gdb upload change, plus the REST loader, will bump the version a full major number.
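A minimal sketch of the regular-method shape, using a hypothetical stand-in class rather than palletjack's actual FeatureServiceUpdater. The constructor arguments, method names, and return value are all assumptions for illustration. The point is that shared state (connection, target item, working directory for the gdb) is supplied once, and each operation then only needs its own data.

```python
class UpdaterSketch:
    """Hypothetical stand-in, not palletjack's actual FeatureServiceUpdater."""

    def __init__(self, gis, item_id, working_dir):
        # Shared state is supplied once, at instantiation.
        self.gis = gis
        self.item_id = item_id
        self.working_dir = working_dir

    def add_features(self, dataframe):
        # Each operation only needs the data it acts on.
        return self._upload("add", dataframe)

    def truncate_and_load(self, dataframe):
        return self._upload("truncate_and_load", dataframe)

    def _upload(self, operation, dataframe):
        # Placeholder for the shared "write gdb, zip, upload, append" steps.
        return {
            "operation": operation,
            "item_id": self.item_id,
            "rows": len(dataframe),
        }
```

With class methods, every call would have to repeat the connection, item id, and working directory; with an instance, `UpdaterSketch(gis, item_id, working_dir).add_features(df)` carries them implicitly.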

This PR strips out all the geojson upload cruft and replaces it with file gdb uploads. There may still be some residual code elsewhere that supported the geojson uploads, but I think I got the main parts.

codecov[bot] commented 9 months ago

Codecov Report

All modified lines are covered by tests :white_check_mark:

Comparison is base (96fab9b) 91.85% compared to head (32c5ade) 93.28%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
+ Coverage   91.85%   93.28%   +1.43%
==========================================
  Files           7        7
  Lines        1031     1058      +27
  Branches      146      143       -3
==========================================
+ Hits          947      987      +40
+ Misses         74       62      -12
+ Partials       10        9       -1
```

| [Files](https://app.codecov.io/gh/agrc/palletjack/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=agrc) | Coverage Δ | |
|---|---|---|
| [src/palletjack/load.py](https://app.codecov.io/gh/agrc/palletjack/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=agrc#diff-c3JjL3BhbGxldGphY2svbG9hZC5weQ==) | `91.36% <100.00%> (+5.38%)` | :arrow_up: |
| [src/palletjack/utils.py](https://app.codecov.io/gh/agrc/palletjack/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=agrc#diff-c3JjL3BhbGxldGphY2svdXRpbHMucHk=) | `95.04% <100.00%> (+0.19%)` | :arrow_up: |


jacobdadams commented 9 months ago

Ok, I think I've addressed and improved everything. I acknowledge the potential API instability, and given that I've run into a project where a single feature exceeds the GeoJSON upload limit, it's something we're going to have to live with and work around. I think the benefits outweigh the upkeep cost.