azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Provide a system for executing one-off jobs #109

Closed jpolchlo closed 1 year ago

jpolchlo commented 2 years ago

It would be useful to have a facility that lets a user execute jobs on the cluster in ad hoc fashion. That is, we should be able to submit a job request without manual intervention, and these jobs may be arbitrarily complex, with multiple dependent stages passing results to one another.

The usual solution to this set of demands in the Kubernetes world is Argo Workflows. It allows jobs to be specified as YAML and submitted via a web frontend (the Argo Server), with each step triggering the creation of containers running Docker images. This frontend should be accessible via a convenient subdomain (in this case https://argo.noaa.azavea.com) and permit OAuth login by any Azavea user.
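
As a rough illustration of what "dependent stages passing results" looks like in Argo, here is a minimal sketch of a two-stage workflow. It is not taken from this repo: the step names (`produce`, `consume`), the parameter name `message`, and the container images are placeholders chosen for the example.

```yaml
# Hypothetical example: two stages, where the second depends on the first
# and receives its output as a parameter.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: two-stage-example-   # Argo appends a random suffix
spec:
  entrypoint: main
  templates:
    # The DAG template declares the stages and their dependencies.
    - name: main
      dag:
        tasks:
          - name: produce
            template: produce
          - name: consume
            template: consume
            dependencies: [produce]   # runs only after "produce" succeeds
            arguments:
              parameters:
                - name: message
                  # stdout of the "produce" step is exposed as outputs.result
                  value: "{{tasks.produce.outputs.result}}"
    # Stage 1: a script step whose stdout becomes its result.
    - name: produce
      script:
        image: python:3.11-alpine    # placeholder image
        command: [python]
        source: |
          print("hello from stage one")
    # Stage 2: a plain container step that consumes the parameter.
    - name: consume
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:3.19           # placeholder image
        command: [echo]
        args: ["{{inputs.parameters.message}}"]
```

A manifest like this could be submitted through the Argo Server UI or, assuming the `argo` CLI is installed and pointed at the cluster, with `argo submit`. Larger results would normally be passed as artifacts rather than parameters.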

This issue covers the installation and configuration of this service.

jpolchlo commented 2 years ago

I've been working through this installation in the Kubernetes repo. The Helm chart installs, but it is not yet properly configured. TLS works for now with a temporary, self-signed cert. SSO is configured to use Google, but Argo requests a groups scope via OAuth, which Google does not appear to support. I'm currently looking for a way to either disable this requirement or provide it by other means.

I've also run into cases where the Helm chart does not seem to update properly after changes managed by Terraform. Still investigating that.

jpolchlo commented 2 years ago

The basic configuration is done (SSO works over HTTPS). The Argo Workflows server is (intermittently, for now) available at https://argo.noaa.azavea.com, and can be accessed using the temporary self-signed certificate. Once I confirm that the system works, I'll generate an official cert for this subdomain. Before merging these changes, I also need to make sure that the Cognito changes, which were made so that multiple services can share the same SSO facilities, still work for JupyterHub.