NASA-IMPACT / veda-backend

Backend services for VEDA

Ingest ArcGIS collection into VEDA dev catalog #359

Closed: smohiudd closed this 3 months ago

smohiudd commented 5 months ago

What

Instead of using a proxy, an alternative is to ingest ArcGIS collections directly into our STAC catalog. We're already doing something similar with existing collections:

The collection would include metadata extracted from the ArcGIS server, plus a link to a WMS or WMTS service that the front end could use for visualization.
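Below is a minimal sketch of that mapping in Python, using only the standard library. It assumes the ArcGIS service description has already been fetched as JSON (for example by requesting the service URL with `?f=json`) and that its `extent` is already in WGS 84. The helper name `arcgis_to_stac_collection`, the placeholder license, and the WMS URL are hypothetical. The `wms` link follows the STAC web-map-links extension (verify the schema version against that extension's README). pyarc2stac's actual output may differ.

```python
from datetime import datetime, timezone


def _ms_to_iso(ms):
    """Convert an ArcGIS epoch-milliseconds timestamp to an ISO 8601 UTC string."""
    if ms is None:
        return None  # open-ended interval in STAC
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def arcgis_to_stac_collection(service, collection_id, wms_url):
    """Hypothetical helper: map ArcGIS service JSON to a STAC Collection dict."""
    ext = service["extent"]
    # Assumption: the extent is already in WGS 84 (wkid 4326); reproject otherwise.
    bbox = [ext["xmin"], ext["ymin"], ext["xmax"], ext["ymax"]]
    # ArcGIS reports time as [start_ms, end_ms] under timeInfo.timeExtent.
    time_extent = service.get("timeInfo", {}).get("timeExtent", [None, None])
    interval = [_ms_to_iso(time_extent[0]), _ms_to_iso(time_extent[1])]
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "stac_extensions": [
            # Assumed schema version for the web-map-links extension.
            "https://stac-extensions.github.io/web-map-links/v1.2.0/schema.json"
        ],
        "id": collection_id,
        "description": service.get("description") or service.get("name", collection_id),
        "license": "proprietary",  # placeholder; use the source's real license
        "extent": {
            "spatial": {"bbox": [bbox]},
            "temporal": {"interval": [interval]},
        },
        "links": [
            {
                # The front end would render from this link instead of from items.
                "rel": "wms",
                "href": wms_url,
                "type": "image/png",
                "wms:layers": [service.get("name", collection_id)],
            }
        ],
    }
```

In practice the dashboard metadata that VEDA UI depends on would also have to be added to this dictionary. It is left out here because its fields are not spelled out in this thread.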

pyarc2stac may be the best way to do this; however, there are other libraries with similar functionality that we should look at:

PI Objective

https://github.com/NASA-IMPACT/veda-architecture/issues/405

Success Criteria

smohiudd commented 5 months ago

I used the example notebook in pyarc2stac to create a collection ARCGIS_POWER_901_MONTHLY_RADIATION_UTC in dev: https://dev.openveda.cloud/api/stac/collections/ARCGIS_POWER_901_MONTHLY_RADIATION_UTC

@j08lue can you let me know if the UI team can work with this collection?

cc: @slesaad, @amarouane-ABDELHAK, @anayeaye

j08lue commented 5 months ago

I see the collection has the dashboard metadata that VEDA UI depends on ✅


It also has an items link, but the item list is empty.


I guess we will identify this data source type somehow and then the UI uses the WMS link instead.
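One possible way for the UI to distinguish the two source types is to look for a web-map link in the collection's `links`. The sketch below is written in Python to match the other examples, but the real check would live in the front-end code. The function name and the WMTS-before-WMS preference are assumptions.

```python
def choose_render_source(collection):
    """Hypothetical UI helper: pick what to render a collection from.

    Prefers a web-map link (WMTS, then WMS) when one exists, and otherwise
    falls back to the STAC items link used for regular, item-backed collections.
    Returns a (kind, href) tuple.
    """
    links = collection.get("links", [])
    for rel in ("wmts", "wms"):  # assumption: WMTS preferred when both are present
        for link in links:
            if link.get("rel") == rel:
                return rel, link["href"]
    for link in links:
        if link.get("rel") == "items":
            return "items", link["href"]
    return "none", None
```

With an ArcGIS-backed collection like the one above, this returns the WMS link even though an (empty) items link is also present.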


I am sure we could easily adapt @slesaad's POC to try this out.

Would that be the next step, @smohiudd, or how do we evaluate this approach against the proxy one and others?

smohiudd commented 4 months ago

Here are some notes for discussion on the differences between the proxy and direct-ingestion approaches:

| Using Arc2Stac Proxy | Directly Ingest Collections |
| --- | --- |
| Collections served by the proxy are always in sync with ArcGIS, so summaries are always current. | A collection may need to be re-ingested, depending on its update frequency, to keep its summaries in sync. |
| To avoid heavy request traffic to ArcGIS servers, we can employ a caching strategy for collections. | No caching strategy is needed, since the collections are stored in our STAC catalog. |
| More infrastructure to deploy and manage with Arc2stac. | Simpler infrastructure, since we use the existing STAC catalog and database. |
| The /statistics endpoint can retrieve timeseries data, so statistics are served to the front end in a consistent way. | No timeseries capabilities unless the front end calls the ArcGIS getSamples endpoint. |
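For the proxy column, the caching strategy could be as simple as a time-to-live cache placed in front of ArcGIS requests. The sketch below is hypothetical and is not part of Arc2stac. The class name, the TTL value, and the injectable clock are illustrative only, and a deployed proxy would more likely use a shared cache. Direct ingestion trades this for scheduled re-ingestion.

```python
import time


class TTLCache:
    """Minimal time-to-live cache for proxied ArcGIS responses (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` only if it is missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no request hits ArcGIS
        value = fetch()
        self._store[key] = (now + self.ttl, value)
        return value
```

The TTL is the knob that trades freshness (the proxy's main advantage in the table) against request volume.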
j08lue commented 4 months ago

Great overview! We also discussed the option of using a proxy for /statistics (to harmonize syntax), but not for /collections, right?

smohiudd commented 4 months ago

Yes, I believe that was an option as well: ingest collections into the STAC catalog, along with a proxy just for statistics.
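A statistics-only proxy would need to translate a VEDA-style /statistics request into an ArcGIS call. The following is a rough sketch that assumes the source is an ImageServer supporting `getSamples`. The first helper builds a point query, and the second turns the response's `samples` array into sorted (time, value) pairs. The helper names are hypothetical, and so is the time attribute name, which is passed in as `time_field` because it varies by service. Check the parameter and response field names against the ArcGIS REST API reference before relying on them.

```python
import json
from urllib.parse import urlencode


def getsamples_url(image_server_url, lon, lat):
    """Build a getSamples query URL for a single WGS 84 point (hypothetical helper)."""
    geometry = {"x": lon, "y": lat, "spatialReference": {"wkid": 4326}}
    params = {
        "geometry": json.dumps(geometry),
        "geometryType": "esriGeometryPoint",
        # Return a sample from every overlapping raster, not just the first one,
        # so that each time slice can contribute a value.
        "returnFirstValueOnly": "false",
        "outFields": "*",
        "f": "json",
    }
    return f"{image_server_url.rstrip('/')}/getSamples?{urlencode(params)}"


def samples_to_timeseries(response, time_field):
    """Convert a getSamples JSON response into sorted (time, value) pairs."""
    series = []
    for sample in response.get("samples", []):
        t = sample.get("attributes", {}).get(time_field)
        try:
            value = float(sample.get("value"))
        except (TypeError, ValueError):
            continue  # skip NoData or missing values
        if t is not None:
            series.append((t, value))
    return sorted(series)
```

The proxy would then reshape these pairs into whatever response format /statistics already returns, so that the front end sees one consistent syntax.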

smohiudd commented 4 months ago

Notes from the May 13 meeting between the VEDA Data Services and UI teams: