Closed by @ranchodeluxe 2 months ago
@sunu: nice work on this. Saw that you had another repository up, but I think we can just add this as another helm chart in the /helm-chart
folder, similar to how 2i2c does it for things folks can install with JupyterHub: https://github.com/2i2c-org/infrastructure/tree/master/helm-charts/support
If that seems gross let me know and I can do it in the future 😉
@ranchodeluxe I put the ingestion pipeline stuff in a separate repository because I don't have a clear idea about how I want it to be packaged. Also, I'm not sure if it's good enough to be public and in the "official" repo yet.
There are 3 parts to the ingestion pipeline repo:
1. Argo running in a k8s cluster - we can definitely include that here as a helm chart
2. A python cli to generate and submit argo workflows from a minimal workflow definition (https://github.com/developmentseed/eoapi-ingestion-argo/tree/main/ingest) through a command like: `eoapi-ingest workflow submit workflows/maxar_opendata/workflow.yaml`.
   * I am not sure whether this python tool belongs in this repo. If we include it here, it should probably live in a separate folder, not in `helm-chart/`. Ideally, the python cli tool should be available for `pip install` too.
3. Dataset-specific workflow definitions - eg: https://github.com/developmentseed/eoapi-ingestion-argo/tree/main/workflows/maxar_opendata
   * this has 2 components:
     * the workflow definition: https://github.com/developmentseed/eoapi-ingestion-argo/blob/main/workflows/maxar_opendata/workflow.yaml
     * and optionally, custom dataset-specific processing code to be injected into the pipeline: https://github.com/developmentseed/eoapi-ingestion-argo/tree/main/workflows/maxar_opendata/src
   * We can probably add some of these examples wherever we put the source code for the python tool?
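To make part 2 concrete: a minimal workflow definition that the cli expands into a full Argo workflow might look roughly like this. This is a sketch only - the field names and step names are assumptions for illustration, not the actual schema used in eoapi-ingestion-argo:

```yaml
# Hypothetical minimal workflow definition consumed by the eoapi-ingest CLI.
# All keys and step names here are illustrative guesses, not the real schema.
name: maxar-opendata
collection: maxar-opendata
source:
  type: s3
  url: s3://maxar-opendata/events/
steps:
  - discover-items     # list the assets to ingest
  - build-stac-items   # create STAC items (optionally via custom code in src/)
  - load-pgstac        # insert the items into pgSTAC
```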
As a side note, @sharkinsspatial shared a recording of a previous eoapi pipeline discussion with me today. Watching it, I learned quite a bit more about the specific pain points we are trying to solve and the current state of things. So I would love to discuss some of that and plan our next moves when we get the chance to catch up next time.
Sounds good, I'll let you decide how you want to structure things.
But I guess I don't see any technical limitations to why all the things you mention above couldn't just live in a new chart (something like /helm-chart/eoapi-ingest-argo/ in this repo) next to a Chart.yaml file to support those requirements. The cli scripts could just be read from source and mounted as ConfigMaps if needed, or just live there to be executed. The 2i2c example does quite a bit of acrobatics with its dependencies if you take a look at it
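For reference, reading scripts from the chart source and mounting them as a ConfigMap could look something like the sketch below. This is a hedged example of a standard helm pattern, not code from this repo - the chart layout, ConfigMap name, and `scripts/` path are all hypothetical:

```yaml
# Sketch: package CLI scripts shipped inside the chart as a ConfigMap,
# so workflow pods can mount and execute them. Names are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: eoapi-ingest-scripts
data:
{{ (.Files.Glob "scripts/*").AsConfig | indent 2 }}
```

Pods would then mount this ConfigMap as a volume to get the scripts on disk.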
@ranchodeluxe ah, I think we both have a different deployment flow in mind for these ingestion jobs. Do you imagine each dataset will have its own helm chart and the ingestion jobs will be deployed through helm?
The deployment model I have in mind is somewhat different where each dataset has a custom docker image with all the scripts needed and the deployment is done through argo cli (or the python wrapper around it).
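Under that model, each dataset's workflow would point at its own image and be submitted with the argo cli (or the python wrapper). A sketch of what such a Workflow could look like - the image name, entrypoint, and command are assumptions for illustration:

```yaml
# Sketch: an Argo Workflow for one dataset, baked into a dataset-specific
# image and submitted via `argo submit`. Image and command are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: maxar-opendata-ingest-
spec:
  entrypoint: ingest
  templates:
    - name: ingest
      container:
        image: ghcr.io/developmentseed/maxar-opendata-ingest:latest  # hypothetical
        command: ["python", "/app/ingest.py"]
```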
@ranchodeluxe I made a draft PR to test submitting ingestion jobs through helm: https://github.com/developmentseed/eoapi-k8s/pull/48. It works, but to me, submitting jobs through argo-cli feels a bit more natural than managing them via helm.
Sorry for the confusion @sunu. I wasn't talking about "submitting jobs" per se, but more about the deployment stuff your last update is referencing. Do what feels best to you for submission
I'm fine following your proposed way @sunu. I'm just trying to keep a thousand repos from blooming 👍
Background
Async containerized job platforms offer decent UI/UX for jobs, log access, and IDP auth. The goal of this ticket is to use Argo Workflows to develop a couple of easy ingestion workflows and documentation.
pypgstac CLI
AC:
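Since the background calls out pypgstac, the pgSTAC loading step of a workflow could invoke the pypgstac CLI directly from a container. A hedged sketch - the image, file path, and DSN are placeholders, and the exact flags should be checked against the pypgstac docs:

```yaml
# Sketch: a workflow template step that loads STAC items into pgSTAC
# using the pypgstac CLI. Image, path, and DSN are placeholders.
- name: load-pgstac
  container:
    image: ghcr.io/stac-utils/pgstac:latest  # placeholder image choice
    command:
      - pypgstac
      - load
      - items
      - /data/items.ndjson
      - --dsn
      - postgresql://user:pass@pgstac:5432/postgis
      - --method
      - insert_ignore
```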