Automate Data Access in the Platform

I-GUIDE / CI_Platform

iGUIDE CI Platform Deployment

Apache License 2.0


Automate Data Access in the Platform #2

Open fbaig opened 4 months ago

fbaig commented 4 months ago

Problem

One of the major goals of the platform is to streamline data search, access, exploration, and analysis for geospatial workflows. Given a dataset that is available via the catalog, design mechanisms to import it into the platform for further analysis. From the user's perspective, the import process itself should be as transparent as possible and require minimal interaction.

Depending on its size, a dataset can either be made available directly in the Jupyter environment or be imported indirectly into the processing environment (HPC).
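A minimal sketch of the size-based decision. The three-way split (stage small data in Jupyter, open large cloud-optimized data lazily from Jupyter, hand everything else to HPC), the `DatasetInfo` fields, and the 5 GiB limit are hypothetical assumptions for illustration, not the platform's current behavior.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    STAGE_IN_JUPYTER = "stage a local copy in the Jupyter environment"
    STREAM_IN_JUPYTER = "open lazily from object storage in Jupyter"
    IMPORT_TO_HPC = "import into the HPC processing environment"


@dataclass
class DatasetInfo:
    # Hypothetical catalog fields; real catalog records may differ.
    name: str
    size_bytes: int
    cloud_optimized: bool  # e.g. Zarr, or NetCDF with kerchunk indices


# Hypothetical threshold: what a Jupyter user volume can comfortably hold.
JUPYTER_STAGING_LIMIT = 5 * 1024**3  # 5 GiB


def choose_access_mode(ds: DatasetInfo, limit: int = JUPYTER_STAGING_LIMIT) -> AccessMode:
    """Pick how a catalog dataset is made available, based on its size."""
    if ds.size_bytes <= limit:
        # Small enough to copy next to the notebook.
        return AccessMode.STAGE_IN_JUPYTER
    if ds.cloud_optimized:
        # Too large to copy, but chunked access lets Jupyter read only what it needs.
        return AccessMode.STREAM_IN_JUPYTER
    # Large and not cloud-optimized: hand off to the HPC environment.
    return AccessMode.IMPORT_TO_HPC


if __name__ == "__main__":
    sample = DatasetInfo("wrfhydro-sample", 200 * 1024**2, cloud_optimized=False)
    print(choose_access_mode(sample).name)  # STAGE_IN_JUPYTER
```

Keeping the decision in one function means the threshold can later be tuned per deployment without changing how the catalog launches datasets.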

Potential Solution

GeoEDF Connectors

Pull Request(s)

ToDo ...

Data Catalog Components

[Figure: iguide-workshop-data-catalog-diagram]

rkalyanapurdue commented 4 months ago

ToDo:

Identify representative examples of a large, cloud-optimized dataset that cannot be staged locally and a smaller dataset that can be staged locally in Jupyter.
Demonstrate how these two datasets can be discovered in the catalog and launched in the Jupyter environment.

rkalyanapurdue commented 4 months ago

Representative Examples:

Register WRFHydro merged monthly NetCDF output files, kerchunk indices, and metadata in the catalog; evaluate the metadata extractor on Anvil HPC
Register a notebook for working with the registered sample output data
Demonstrate running an Argo workflow to extract data for a given spatial region and variable selection
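As a starting point for the Argo demonstration, a sketch that builds an Argo `Workflow` manifest in Python and prints it as JSON (JSON is valid YAML, so the output can be saved and submitted, e.g. with `argo submit`). The container image, its `subset` command and flags, the bbox convention, and the example source URL are hypothetical placeholders; only the manifest structure (`apiVersion`, `entrypoint`, workflow `arguments.parameters`, `{{workflow.parameters.*}}` references) follows Argo's Workflow format.

```python
import json


def build_subset_workflow(bbox, variables, source_url,
                          image="example.org/iguide/subset:latest"):
    """Return an Argo Workflow manifest (as a dict) for one subsetting run.

    bbox is (min_lon, min_lat, max_lon, max_lat). The image and its
    command-line interface are hypothetical placeholders.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bbox must be (min_lon, min_lat, max_lon, max_lat)")
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "subset-"},
        "spec": {
            "entrypoint": "subset",
            # Workflow-level parameters; can be overridden at submit time.
            "arguments": {
                "parameters": [
                    {"name": "source", "value": source_url},
                    {"name": "bbox", "value": ",".join(str(v) for v in bbox)},
                    {"name": "variables", "value": ",".join(variables)},
                ]
            },
            "templates": [
                {
                    "name": "subset",
                    "container": {
                        "image": image,
                        # Hypothetical CLI of the subsetting container.
                        "command": ["subset"],
                        "args": [
                            "--source", "{{workflow.parameters.source}}",
                            "--bbox", "{{workflow.parameters.bbox}}",
                            "--variables", "{{workflow.parameters.variables}}",
                        ],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_subset_workflow(
        bbox=(-87.5, 39.5, -86.5, 40.5),
        variables=["streamflow"],
        source_url="s3://example-bucket/wrfhydro/refs.json",  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

Keeping the region and variables as workflow parameters means the same manifest can be resubmitted for another region with `argo submit -p bbox=...` instead of being regenerated.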

Longer term:

Extend to execution on HPC
Extend to datasets in OSN S3
Extend metadata extraction to Zarr datasets
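For the Zarr extension, a Zarr v2 store already carries its array metadata as small JSON documents (`.zarray` for shape, chunks, and dtype; `.zattrs` for attributes), so an extractor can read those without touching chunk data. A minimal stdlib sketch, assuming a local directory store in Zarr v2 layout; object-store access (e.g. OSN S3), consolidated `.zmetadata`, and Zarr v3 (`zarr.json`) are not handled, and the output record shape is a hypothetical choice.

```python
import json
import os


def extract_zarr_v2_metadata(store_path):
    """Collect per-array metadata from a Zarr v2 directory store.

    Reads only the JSON metadata documents (.zarray/.zattrs), never chunk data.
    """
    records = {}
    for dirpath, _dirnames, filenames in os.walk(store_path):
        if ".zarray" not in filenames:
            continue  # groups and the store root have no .zarray
        with open(os.path.join(dirpath, ".zarray")) as f:
            zarray = json.load(f)
        attrs = {}
        if ".zattrs" in filenames:
            with open(os.path.join(dirpath, ".zattrs")) as f:
                attrs = json.load(f)
        # Array name is its path relative to the store root, "/"-separated.
        name = os.path.relpath(dirpath, store_path).replace(os.sep, "/")
        records[name] = {
            "shape": zarray["shape"],
            "chunks": zarray["chunks"],
            "dtype": zarray["dtype"],
            "attributes": attrs,
        }
    return records
```

Datasets written by xarray also record dimension names in `.zattrs` under `_ARRAY_DIMENSIONS`, so the `attributes` field would already carry them for catalog indexing.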

rkalyanapurdue commented 4 months ago

Extending the CUAHSI/Hydroshare catalog for I-GUIDE:

Set up a separate S3 store for I-GUIDE instead of using the CUAHSI S3
- Figure out how to deploy MinIO on JS2
Deploy Argo on JS2 for running simple subsetting operations
Develop security policies for I-GUIDE S3 buckets
- Currently, we envision two kinds of buckets: private per-user buckets and public buckets
Federate CUAHSI KeyCloak to use I-GUIDE credentials for I-GUIDE-specific security policies
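The two bucket kinds could be expressed as S3-style JSON policies, which MinIO also accepts. A sketch, assuming a hypothetical `iguide-user-<username>` naming scheme and that the username comes from the federated I-GUIDE identity: a per-user identity policy scoped to that user's own bucket, and an anonymous read-only bucket policy for public buckets. The action lists are a conservative guess, not a vetted policy.

```python
import json
import re

# Hypothetical naming scheme for private per-user buckets.
USER_BUCKET_PREFIX = "iguide-user-"
# S3 bucket names: 3-63 chars, lowercase letters, digits, dots, hyphens.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def _check_bucket(bucket):
    if not BUCKET_NAME_RE.match(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return bucket


def user_bucket_name(username):
    """Derive the private bucket name for a (federated) username."""
    return _check_bucket(USER_BUCKET_PREFIX + username.lower())


def private_user_policy(username):
    """Identity policy (no Principal) attached to one user: own bucket only."""
    bucket = user_bucket_name(username)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level: list the user's own bucket.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {   # Object-level: read, write, and delete objects in it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def public_read_bucket_policy(bucket):
    """Bucket policy letting anyone, including anonymous clients, list and read."""
    bucket = _check_bucket(bucket)
    anyone = {"AWS": ["*"]}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": anyone,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Principal": anyone,
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(private_user_policy("alice"), indent=2))
    print(json.dumps(public_read_bucket_policy("iguide-public"), indent=2))
```

Granting only object-level writes on private buckets (rather than `s3:*`) keeps users from changing their own bucket policy and accidentally making private data public.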