It may be possible to mount S3 buckets as a volume and pass them into the pod that will run the FIM code. It's not exactly the recommended best practice, and if we're planning on doing additional development to the FIM code, we should just fix the file access to use the AWS SDK; but if the intent here is just to get a "proof of life" w.r.t. the provided code, then this might be an OK approach. Ultimately, this could be run through an Argo workflow with the required volume mounts.
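For the AWS SDK route, a minimal sketch of what the file-access change could look like with boto3; the object key and local path below are placeholders, since the actual bucket layout isn't settled in this thread:

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "noaa-nws-owp-fim"            # ESIP bucket referenced in this issue
KEY = "example/prefix/hydrofabric.7z"  # placeholder key; real layout TBD

# Option 1: stage the object at the local path the existing scripts expect.
s3.download_file(BUCKET, KEY, "/data/hydrofabric.7z")

# Option 2: stream the object without writing it to local disk first.
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
```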
After talking about this problem, it is likely that the best solution for making these data available to the execution environment will be to create a persistent volume claim that can be mounted into whichever pod runs the FIM workflow via Argo. To start this process, there are two primary steps:

1. Create the PVC in the `argo` namespace, since it doesn't seem like Dask is required to execute this job (see the sketch after this list).
2. Once the PVC exists, mount it in the Argo workflow that will execute the job.

We may have to figure out how to ensure that the worker pod is placed in the same AZ as the EBS volume. We can work together to write the workflow so that the volume is placed where it needs to be and there are enough compute resources available to the pod.
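As a rough sketch of the first step, assuming the Kubernetes Python client, an EBS-backed `gp2` storage class, and a placeholder claim name and size (none of these are settled here):

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when run inside the cluster
core = client.CoreV1Api()

# Placeholder claim name, size, and storage class; real values depend on the
# dataset (FR vs. GMS 7z) and on the storage classes available in this cluster.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="fim-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],  # an EBS volume attaches to one node in one AZ
        storage_class_name="gp2",
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="argo", body=pvc)
```

Because the EBS volume lives in a single availability zone, the workflow pod that mounts the claim will need to land in that same zone, e.g. via node affinity on the workflow or a storage class that uses `WaitForFirstConsumer` volume binding.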
This sounds great! We don't need both the FR and GMS 7z archives on the EBS volume while we figure this out. I believe it might be easier to begin with the FR dataset (the smaller of the two 7z archives) to work out how to run this as an Argo workflow, and then try the bigger GMS 7z later. And we'll definitely need the BLE (Base Level/Flood Elevation) CSVs to be included on the EBS volume regardless of whether we use the FR or GMS dataset.
This appears to be settled. I've used the EFS volume strategy in #124 and we can access the files we need. Not elegant, but it is a solution. Efforts on the more elegant approach via FSx for Lustre (see azavea/kubernetes#40) did not go to plan. Closing, but feel free to reopen.
Perusing the shell scripts in the inundation-mapping repo to see how FIMs are generated, we see that the Python scripts expect the data to be located in the `/data/` folder and the repo source to be located in the `/foss_fim/src/` folder. Getting the Python repo source into the k8s cluster appears to be straightforward (it can be done with a custom Dockerfile), but how do we get the data into the k8s cluster? The data we need is hosted on the ESIP S3 bucket (`s3://noaa-nws-owp-fim/`). Additionally, we also need `BLE` forecast files for generating FIMs.

For our trial run of generating a FIM we were provided the `BLE` forecast files. And, from what I understand, these `BLE` forecast files are created by some proprietary procedure which is unavailable to us. So, in addition to using the cloud-hosted hydrofabric (from ESIP), we will also have to store these BLE CSV forecast files, possibly in another S3 bucket, and modify the Python (and/or bash) scripts so that they can use the hydrofabric and the BLE CSV files together to generate the FIM.

Therefore, we have to figure out how to restructure the Python scripts so that they can use data directly from S3 buckets (this was item 5 in Fernando's email from 2022.08.30!). Once we have the modified Python scripts, I believe running them on our k8s cluster ought to be straightforward.

So, as I understand it, the question is basically: how do we get this to work without using data stored locally?
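One possible shape for "use data directly from S3" on the BLE side, assuming pandas plus s3fs and a hypothetical bucket/key for the forecast CSVs:

```python
import pandas as pd

# Hypothetical location; where the BLE forecast CSVs end up is still undecided.
BLE_URL = "s3://some-owp-ble-bucket/forecasts/ble_flows.csv"

# With s3fs installed, pandas resolves s3:// URLs directly, so the scripts no
# longer need the CSVs staged under /data/ on local disk.
ble_forecast = pd.read_csv(BLE_URL)
print(ble_forecast.head())
```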