Workflows for Flood Inundation Mapping

jpolchlo commented 1 year ago

Overview

Following on from #101, we need the ability to run the flood inundation mapping (FIM) code from NOAA OWP. This PR takes a first swing at that. I'm providing some Argo Workflow examples that use an EFS volume to mount the required data directly into the container filesystem. This works provisionally: the mounting mechanism works, even though the code doesn't yet run all the way to completion as of filing this PR.
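For readers unfamiliar with how an Argo Workflow mounts shared storage, here is a minimal sketch, assuming the EFS volume is exposed to the cluster through a PersistentVolumeClaim. It builds the manifest as a Python dict so the shape is easy to check. The claim name "fim-efs-data", the volume name "fim-data", the image name, and the /data mount path are all hypothetical placeholders, not the values used in this PR's YAML.

```python
"""Sketch: an Argo Workflow manifest that mounts FIM input data from EFS.

Hypothetical names (not from the PR): claim "fim-efs-data", volume
"fim-data", mount path "/data", and the image reference.
"""
import json


def fim_workflow(image: str, claim_name: str, mount_path: str = "/data") -> dict:
    """Build a Workflow dict whose single step sees the EFS data at mount_path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "fim-inundation-"},
        "spec": {
            "entrypoint": "inundation",
            # Workflow-level volume backed by an existing PVC (the EFS share).
            "volumes": [
                {"name": "fim-data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
            "templates": [
                {
                    "name": "inundation",
                    "container": {
                        "image": image,
                        # Mount the shared volume into the container filesystem.
                        "volumeMounts": [{"name": "fim-data", "mountPath": mount_path}],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    wf = fim_workflow("example.com/fim:latest", "fim-efs-data")
    # JSON is valid YAML, so this output can be pasted where YAML is expected.
    print(json.dumps(wf, indent=2))
```

Because every step reads from the same claim, the data never has to be copied into the image or downloaded at the start of each run.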

I've tested this with the FR dataset. The GMS dataset requires different steps and may need a more complex workflow.

Checklist

[ ] Documentation updated if needed
[x] PR has a name that won't get you publicly shamed for vagueness

Notes

This has been tested on top of azavea/kubernetes-deployment#38
This has to run on a worker node; I ran it without the node-type: worker node selector and it took down a core node (which was surprisingly painful to restore)
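Since omitting the node selector took down a core node, it may be worth guarding against that mistake programmatically. The sketch below pins a workflow to worker nodes via a workflow-level nodeSelector and lists any pod-creating templates that would still escape it. The "node-type: worker" label comes from the note above; the helper names are hypothetical, and the sketch assumes (as the Argo docs describe) that a template-level nodeSelector overrides the workflow-level one rather than merging with it.

```python
import copy

# Label from the PR notes: pods must land on nodes labelled node-type=worker.
WORKER_SELECTOR = {"node-type": "worker"}


def pin_to_workers(workflow: dict) -> dict:
    """Return a copy of an Argo Workflow dict with a workflow-level nodeSelector."""
    pinned = copy.deepcopy(workflow)
    pinned.setdefault("spec", {}).setdefault("nodeSelector", {}).update(WORKER_SELECTOR)
    return pinned


def unpinned_templates(workflow: dict) -> list[str]:
    """Names of container/script templates that would not land on a worker node.

    Assumes a template-level nodeSelector replaces the workflow-level one.
    """
    spec = workflow.get("spec", {})
    default = spec.get("nodeSelector") or {}
    offenders = []
    for tmpl in spec.get("templates", []):
        # Only check container and script templates; steps/dag templates
        # orchestrate other templates rather than running work themselves.
        if "container" not in tmpl and "script" not in tmpl:
            continue
        effective = tmpl.get("nodeSelector") or default
        if any(effective.get(k) != v for k, v in WORKER_SELECTOR.items()):
            offenders.append(tmpl.get("name", "<unnamed>"))
    return offenders
```

Running a check like this before submission would have flagged the inundation step as unpinned.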

Testing Instructions

Log into Argo Workflows
Start a new workflow using the contents of fim-inundation.yaml
Adjust parameters if desired (and possibly adjust the workflow itself)
Create the workflow

jpolchlo commented 1 year ago

I have added a version of the inundation workflow that runs to completion. It reads input data from EFS, uses EBS scratch space to perform the work, and syncs the result up to S3 after completion. The most recent iteration also provides a Dockerfile, a modification of this one, that adds the s3fs-fuse utility. This FUSE plugin does not work here because it doesn't understand IRSA (IAM Roles for Service Accounts) and instead requires an access key and secret. There is some recent effort to fix this, but it isn't ready, nor is it necessary: it's sufficient to chain a few commands together to do the required transfer to S3 after the process completes.
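The "chain a few commands together" approach, plus an EBS-backed scratch volume, can be sketched as follows. This is an illustration under assumptions, not the workflow in this PR: the FIM entry point "run_fim.sh", the output directory, the bucket URI, the size, and the "gp2" storage class are hypothetical. It relies on the container image having the AWS CLI installed and on IRSA supplying credentials, which the CLI picks up without access keys.

```python
import shlex


def run_then_sync(run_cmd: list[str], output_dir: str, s3_uri: str) -> list[str]:
    """Container command: run the FIM step, then copy its outputs to S3.

    The `&&` means the sync only runs if the FIM step exits successfully.
    """
    script = (
        f"{shlex.join(run_cmd)} && "
        f"aws s3 sync {shlex.quote(output_dir)} {shlex.quote(s3_uri)}"
    )
    return ["sh", "-c", script]


def scratch_claim(name: str, size: str, storage_class: str) -> dict:
    """A `volumeClaimTemplates` entry: a per-workflow scratch volume.

    With an EBS-backed storage class, each workflow run gets its own disk.
    """
    return {
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(run_then_sync(["run_fim.sh", "--huc", "12090301"], "/scratch/out", "s3://my-bucket/fim"))
    print(scratch_claim("scratch", "100Gi", "gp2"))
```

Quoting the paths with `shlex` keeps the `sh -c` string safe if an output directory ever contains spaces.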

The potential benefit of the FUSE plugin is that it removes the need for an EFS volume, which is an additional cost on top of S3 storage and can fall out of sync with it. I'm contributing the s3fs-enabled Docker image as a historical artifact that could prove useful, possibly soon. The downside of this route, however, is that containers using FUSE need to run in privileged mode, which may open vectors for misbehavior. A topic for later debate, perhaps.
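To make the privileged-mode trade-off concrete, here is a small sketch of what an s3fs-enabled step would need in its container spec, plus a helper that lists privileged templates so they stand out in review. The helper names and the image reference are hypothetical; the only grounded fact is the PR's statement that FUSE containers must run privileged.

```python
def fuse_container(image: str) -> dict:
    """Container spec for an s3fs-enabled image, which needs privileged mode."""
    return {
        "image": image,
        # Privileged mode grants broad access to the host's devices
        # (including /dev/fuse); this is the security cost of the FUSE route.
        "securityContext": {"privileged": True},
    }


def privileged_templates(workflow: dict) -> list[str]:
    """Names of templates whose container runs privileged, to flag in review."""
    names = []
    for tmpl in workflow.get("spec", {}).get("templates", []):
        ctx = tmpl.get("container", {}).get("securityContext", {})
        if ctx.get("privileged") is True:
            names.append(tmpl.get("name", "<unnamed>"))
    return names
```

A check like this could gate submission if the FUSE approach is ever adopted, keeping privileged steps an explicit choice.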

jpolchlo commented 1 year ago

Opening this up for review. I do need to document how to use this in a README, but the content is as good as it's going to get.

azavea / noaa-hydro-data