cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services

[DevOps] Incremental updates in Moped > AGOL ETL #14479

Closed Charlie-Henry closed 2 months ago

Charlie-Henry commented 11 months ago

Currently, the Moped components ETL replaces and re-uploads all of the Moped components on every run. There are now 11,000+ components, so it would be good to update this ETL to upload only the changed/new components to AGOL.

Resources https://github.com/cityofaustin/atd-knack-services/blob/production/README.md

johnclary commented 11 months ago

Thanks @Charlie-Henry for making this issue. Agree 100%.

I am setting this as blocked by https://github.com/cityofaustin/atd-data-tech/issues/14242. The current problem is that we don't reliably know when a Moped project (or any of its many related records) was edited.

mddilley commented 4 months ago

check Knack services ETL

chiaberry commented 3 months ago

https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#featurelayer

johnclary commented 3 months ago

Here's what I think needs to be done to the current ETL:

  1. Enable the script to accept a last_run_date argument, and pass this argument to the Hasura query we use to fetch components.
  2. Modify the delete_features utility so that it can receive an array of component IDs that need to be deleted, and pass those component IDs into the where statement of the AGOL API call.
  3. Update the Airflow DAG to pass the last_run_date value from its context.
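
To make step 1 concrete, here is a minimal sketch of how the script could accept the date and turn it into Hasura query variables. Everything named here is an assumption for illustration: the `--date` flag, the `components` table and `updated_at` column in the query, and the helper names are hypothetical and not taken from the actual ETL or the Moped schema.

```python
import argparse
from datetime import datetime, timezone

# Hypothetical GraphQL query. The table name (components) and the
# timestamp column (updated_at) are placeholders, not the real Moped schema.
COMPONENTS_QUERY = """
query ComponentsSince($last_run_date: timestamptz!) {
  components(where: {updated_at: {_gte: $last_run_date}}) {
    id
  }
}
"""

# Fallback when no date is given: early enough to select every component.
DEFAULT_LAST_RUN_DATE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_args(argv=None):
    """Parse an optional ISO 8601 date; without it, everything is processed."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d",
        "--date",
        dest="last_run_date",
        type=datetime.fromisoformat,
        default=DEFAULT_LAST_RUN_DATE,
        help="Only process components modified on or after this ISO 8601 date",
    )
    return parser.parse_args(argv)


def build_query_variables(last_run_date):
    """Return the variables dict to send alongside COMPONENTS_QUERY."""
    # Treat naive datetimes as UTC so the timestamptz comparison is unambiguous.
    if last_run_date.tzinfo is None:
        last_run_date = last_run_date.replace(tzinfo=timezone.utc)
    return {"last_run_date": last_run_date.isoformat()}
```

Defaulting to a very old date means that running the script with no argument still behaves like a full refresh, which seems like a useful escape hatch if the AGOL layer ever drifts out of sync.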

One consideration is that I think we'll want to delete features in batches, because there must be some upper limit to how many component IDs we can pass in the where clause. I believe that's exactly what we do in atd-knack-services, even though that code uses the Esri Python library, whereas in this ETL we are happily interacting directly with Esri's REST API.
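
As a rough sketch of the batching idea, the component IDs could be split into fixed-size chunks, with one where clause built per chunk and each clause sent in its own delete request. The field name `component_id`, the batch size of 100, and the function names are assumptions for illustration; the actual AGOL limit and the layer's field name would need to be confirmed.

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start : start + size])


def build_where_clauses(
    component_ids: Sequence[int],
    batch_size: int = 100,  # assumed value, not a documented AGOL limit
    field: str = "component_id",  # hypothetical field name
) -> List[str]:
    """Build one SQL-style where clause per batch of component IDs."""
    clauses = []
    for batch in chunks(component_ids, batch_size):
        # int() guards against anything non-numeric ending up in the clause.
        id_list = ", ".join(str(int(component_id)) for component_id in batch)
        clauses.append(f"{field} IN ({id_list})")
    return clauses


if __name__ == "__main__":
    for where in build_where_clauses([1, 2, 3, 4, 5], batch_size=2):
        print(where)
```

Each clause would then be passed as the where parameter of a separate delete call, so no single request grows with the total number of changed components.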