agrc / palletjack

A library for updating AGOL data from various external sources
https://agrc.github.io/palletjack/palletjack/
MIT License
12 stars 0 forks source link

Implement ServiceUpdater #101

Closed stdavis closed 2 months ago

stdavis commented 2 months ago

There is still a lot of duplicate code that could be extracted to a base class for both FeatureServiceUpdater and TableServiceUpdater. But this does work as is for the deq-eid skid.

@jacobadams - I'm interested to see what you think and your thoughts on where this needs to go from here.

Also, I made some black/ruff changes to get things working correctly in VSCode on my machine. I believe that they are in-line with the UGRC python project template. But feel free to set me straight if they are incorrect.

Closes #92

stdavis commented 2 months ago

Thanks for the review. I think that I'll try the base class solution first and see how it goes.

stdavis commented 2 months ago

@jacobdadams I'd love another general look at this. After DRYing up the differences between FeatureServiceUpdater and TableServiceUpdater, there was almost nothing left in the subclasses. So I just got rid of them in favor of a generic ServiceUpdater class that has a slightly different api. This would be a breaking change. If you are good with this direction, I'll finish updating the docs and make sure that the test converage is good.

I hope that I didn't go too far off of the rails with this. I realize that this is a major change and that you are the lead on this project. I'm totally open to any and all feedback.

In addition to making all of the existing unit tests work, I've also successfully tested all of the public methods via the deq-eid skid.

jacobdadams commented 2 months ago

Impressive! I'm all for it. I think it makes a ton of sense to generalize it. palletjack has already got a history of breaking changes, so I'm not too worried and this should be a relatively simple fix for scripts we want to update.

And no worries about "going off the rails"- I'm grateful you took this improvement on and are making it better. I think this is becoming the standard development model for palletjack- desired features are added when we've got funding in a project that needs them. I'm all for a standard, unified codebase that we can all use.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.70%. Comparing base (f87f4c5) to head (d223b13). Report is 29 commits behind head on main.

Files Patch % Lines
src/palletjack/load.py 97.72% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #101 +/- ## ========================================== + Coverage 94.52% 94.70% +0.17% ========================================== Files 7 7 Lines 1133 1151 +18 Branches 148 177 +29 ========================================== + Hits 1071 1090 +19 Misses 52 52 + Partials 10 9 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

stdavis commented 2 months ago

@jacobdadams This is ready for a serious look now. I went back and forth on the method renames. If you want to minimize the breaking changes, I'm happy to switch them back. The _features suffix seemed weird to me if you are working with a table. However, I noticed that the arcgis package does it so maybe it's not a big deal.

Here's what my functional test code looks like:

incidents_sdf = self._get_incidents()
incidents_loader = load.ServiceUpdater(
    self.gis,
    config.INCIDENTS_ITEMID,
    working_dir=self.tempdir_path,
)

#: testing only
incidents_add_count = incidents_loader.add(incidents_sdf[0:10])
incidents_remove_count = incidents_loader.remove([1, 2, 3])
incidents_live_sdf = (
    arcgis.features.FeatureLayer.fromitem(self.gis.content.get(config.INCIDENTS_ITEMID)).query().sdf
)
incidents_live_sdf = incidents_live_sdf[-10:]
incidents_live_sdf["Title_EventName"] = "Test Title Change"
incidents_live_sdf.drop(columns=["Date_Discovered_For_Filter"], inplace=True)
incidents_update_count = incidents_loader.update(incidents_live_sdf)

incidents_count = incidents_loader.truncate_and_load(incidents_sdf)

chemical_df = self._get_chemical()
chemical_loader = load.ServiceUpdater(
    self.gis,
    config.CHEMICAL_ITEMID,
    service_type="table",
    working_dir=self.tempdir_path,
)

#: testing only
chemical_add_count = chemical_loader.add(chemical_df[0:10])
chemical_remove_count = chemical_loader.remove([1, 2, 3])
chemical_live_df = arcgis.features.Table.fromitem(self.gis.content.get(config.CHEMICAL_ITEMID)).query().sdf
chemical_live_df = chemical_live_df[-10:]
chemical_live_df["Chemical_Name"] = "Test Name Change"
chemical_update_count = chemical_loader.update(chemical_live_df)

chemical_count = chemical_loader.truncate_and_load(chemical_df)

media_df = self._get_media()
media_loader = load.ServiceUpdater(
    self.gis,
    config.MEDIA_ITEMID,
    service_type="table",
    working_dir=self.tempdir_path,
)
media_count = media_loader.truncate_and_load(media_df)

I also cleaned up the commit history and tried to clean up the diff for load.py by preserving the order of the methods.

It might be cool to add some functional tests to this project that actually update dedicated test items in AGOL. Maybe not great to run in an automated way, but it might be a nice sanity check to see if everything works together. Just an idea.

stdavis commented 2 months ago

One request: I noticed you moved the types from the docstrings to the signatures for full type hinting. I love the type hinting, but pdoc3 relies on having the type info as part of the docstring, and all the other classes/modules follow this pattern. Could you please restore the types to the docstrings in addition to the method signatures?

Ah, that's why you had them duplicated. It's a shame that their can't be a single source of truth for them. I did find one instance where there was a discrepancy between the docstrings and the type hinting. I'll add them back...

stdavis commented 2 months ago

@jacobdadams I added the types back to the docstrings. Let me know if there is anything else that I can do. Thanks for the review!