RobHanna-NOAA opened 1 year ago

@BradfordBates-NOAA
I created a directory on our on-prem machine called `sample_inputs` with the bare minimum (744 MB) needed for a pipeline run on 12090301. There's a README explaining that the data is incomplete and that users can only use `-u 12090301` as a HUC arg. Please take a look at the README file and do another test to make sure it runs successfully.
FYI, I was able to run using this directory by simply adding another volume mount to my docker container, like so: `-v /home/user/sample_inputs/:/data/inputs`. I don't know whether that's easier than renaming the directory to `/inputs`, but it's another option.
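If it helps, a full `docker run` line with that extra mount might look like the following. This is only a sketch: the image name/tag, the base `/data` mount path, and the `/foss_fim` code mount are illustrative assumptions, not taken from this thread.

```bash
# A sketch: local paths, image name, and tag are assumptions.
# Docker orders bind mounts by destination depth, so the nested
# /data/inputs mount overlays the /data mount as intended.
docker run --rm -it --name fim_sample \
    -v /home/user/data/:/data \
    -v /home/user/sample_inputs/:/data/inputs \
    -v /home/user/inundation-mapping/:/foss_fim \
    fim_4:dev /bin/bash
```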
Oh yeah, that seems easier. I forgot about that one.
sample_inputs uploaded to ESIP
Another update: we have a small correction to make on one of the URLs.

Is:

```bash
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --request-payer requester
```

Should be (remove the `/outputs`):

```bash
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --request-payer requester
```
One other update: Annie at ESIP changed the permissions on our bucket so absolutely no credentials are required anymore. However, the syntax changes a little (notice the ending argument):

```bash
aws s3 ls s3://noaa-nws-owp-fim/hand_fim/ --no-sign-request
```

And make sure we have that new flag on the end of the copy commands as well.
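Putting the two updates together, the copy command would presumably become the following, with `--no-sign-request` replacing `--request-payer requester` (unsigned requests can't be requester-pays); this combination hasn't been verified against the bucket here:

```bash
# Corrected path plus the new unsigned-access flag
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --no-sign-request
```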
Can I now recommend that we tell our users to get all inputs from us? I think we can still provide links to the original sources, but we want to make sure they know that those are raw, not pre-processed, and may not be directly compatible with running against FIM code.
I think this is done. I finished it a month or more ago. I will double check it.
FYI, the `sample_inputs` directory does not work because it does not contain the `included_huc8_withAlaska.lst` file, which has been required since https://github.com/NOAA-OWP/inundation-mapping/blame/82b5e422da3e59c988899994ac97fcfbcb4f4b63/src/check_huc_inputs.py#L18, and there is no error if the glob matches no files.
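For context, the silent-failure mode looks roughly like this. This is a minimal sketch, assuming `check_huc_inputs.py` globs a lists directory for `included_huc*.lst` files; the function and pattern names are approximations, not the actual code:

```python
from glob import glob

def read_included_hucs(parent_path: str) -> set[str]:
    """Collect HUC codes from included_huc*.lst files under parent_path."""
    matches = glob(f"{parent_path}/included_huc*.lst")
    # glob() returns an empty list when nothing matches, so without a
    # guard like this a missing included_huc8_withAlaska.lst silently
    # yields an empty HUC set instead of a useful error.
    if not matches:
        raise FileNotFoundError(f"no included_huc*.lst files in {parent_path}")
    hucs: set[str] = set()
    for filename in matches:
        with open(filename) as f:
            hucs.update(line.strip() for line in f if line.strip())
    return hucs
```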
Yes, the sample inputs directory does not work, as we have not been able to update it for a while. I will bring it up with my tech lead and see what we can do. It might take a while, but I will keep you posted.
There is some code to generate sample data in the `dev-sample-data` branch, but it hasn't been modified to work for Alaska. See step 1 in https://github.com/NOAA-OWP/inundation-mapping/pull/1178.
Can I recommend we make a new issue card and get it assigned to someone? @mluck: do you think you have time to look into it? If not, I can teach someone else how to do it. Thoughts?
In ESIP, we have a full `inputs` directory copy. However, we also need a smaller "sample" version of the inputs folder, which will give a user just the files they need to process a couple of pre-defined HUCs in a demo mode. The folder structure of the new sample input directory will need to be identical so that no code changes are required.

We need to review the code and isolate which files are needed for those two HUCs. The current readme.md does not appear to define which two HUCs, but they might be in other wiki pages. The readme.md will also need to be updated to reflect the two options of a full download or the smaller sample download.
During some testing of this concept, we found we can keep just a couple of HUC6 DEM files in `inputs/3dep_dems/10m_5070`, versus keeping all of the DEMs. By keeping just a couple of HUC6 DEM files, we got the total input folder down from 400 GB to 50 GB. The sample input folder needs to keep most of the other folders. Our code looks for a specific VRT, `fim_seamless_3dep_dem_10m_5070.vrt`, for the DEMs. The default VRT will fail if not edited, as it looks for the DEM defined in each "ComplexSource" node, and there is one "ComplexSource" node per DEM. Just delete all of the "ComplexSource" nodes in that file except the ones that match the selected HUC6 (for the selected HUC8) and it works fine.
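A small script along these lines could do that pruning automatically. A sketch only: the `ComplexSource`/`SourceFilename` elements are standard GDAL VRT layout, but the assumption that each source DEM's file name contains its HUC6 code is mine, and the HUC6 values below are illustrative:

```python
import xml.etree.ElementTree as ET

def prune_vrt(vrt_path: str, out_path: str, keep_huc6s: set[str]) -> None:
    """Drop every ComplexSource whose source DEM doesn't match a kept HUC6."""
    tree = ET.parse(vrt_path)
    for band in tree.getroot().iter("VRTRasterBand"):
        for source in band.findall("ComplexSource"):
            filename = source.findtext("SourceFilename", default="")
            # Keep the source only if its file name mentions one of the
            # HUC6 codes retained in inputs/3dep_dems/10m_5070.
            if not any(huc in filename for huc in keep_huc6s):
                band.remove(source)
    tree.write(out_path)

# Edit in place (keep a backup), since the code looks for this exact name.
# HUC6 codes here assume sample HUCs 12090301 and 05030104.
prune_vrt(
    "inputs/3dep_dems/10m_5070/fim_seamless_3dep_dem_10m_5070.vrt",
    "inputs/3dep_dems/10m_5070/fim_seamless_3dep_dem_10m_5070.vrt",
    keep_huc6s={"120903", "050301"},
)
```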
Suggested overall output: the readme gives two options:

1) Download the full ESIP `inputs` folder to work with the full CONUS domain (and Hawaii, PR).
2) Download the ESIP `sample_inputs` folder to work with HUC8 x and y (we will pre-define those HUCs; maybe 05030104 (the smallest) and maybe our favorite, 12090301). When they download the sample inputs, make sure it lands in their "data" directory and is renamed to "inputs". Yes, yuck, but our code mandates that pathing (see the sketch below).

Considering volume stuff, we might want to consider pulling some of the README.md into INSTALL.md (or maybe a new .md), and then have README.md reference the other .md files as needed. The README.md is getting pretty big and will grow again with these notes.
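On the rename point in option 2, the sample download could look something like the following. The `sample_inputs` S3 prefix is a guess on my part (the exact path in the bucket isn't stated in this thread), and downloading straight into `data/inputs` sidesteps a separate rename step:

```bash
# The sample_inputs prefix in the bucket is assumed, not confirmed.
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/sample_inputs \
    /home/user/data/inputs --no-sign-request
```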
Also, we might want to add notes in the .md to talk about `inundation.py` and `mosaic_inundation.py`, as an external user did ask about them recently.
Update: also talk about how to mosaic (inundation_nation?) as well.
Also, we need clearer information about ESIP credentials, and we should not send users to the "Configure AWS CLI tools" link; we can tell them specifically what is required (if anything). Update: we might need instructions on how they can create their own AWS account, IAM user, and creds, then run their own `aws configure` (maybe with a profile name for safety, to avoid conflicts if they are using other AWS accounts somewhere). We have to be careful to talk about AWS profiles so we don't accidentally blow away their ability to use other AWS sites / S3 buckets now or later.
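For instance, a named profile keeps any existing `[default]` credentials untouched (the profile name `esip` is just an example):

```bash
# Writes to a separate profile instead of overwriting [default]
aws configure --profile esip

# Then reference the profile explicitly on each call
aws s3 ls s3://noaa-nws-owp-fim/hand_fim/ --profile esip
```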
Also, tell them about the `aws s3 cp --dryrun` flag to help validate the target location before doing the actual download: start with the `--dryrun` flag, hit CTRL-C to stop it, look to ensure the target pathing is good, then take off the `--dryrun` flag and run again to actually download.
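Concretely, reusing the copy command from above:

```bash
# Prints what would be copied without transferring anything
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --no-sign-request --dryrun
```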
Maybe we need three patterns:

1) Just look at some recent outputs for a HUC (or HUCs).
2) Run a small sample processing for a single predefined HUC (12090301) using the small "sample inputs" folder, then show them how to mount "sample_inputs" instead of "inputs" (careful, as we don't mount against "inputs" but against "data"; see the sketch after this list).
3) Load the full inputs dir to process any HUCs you want.
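For pattern 2, once the container is started with the `sample_inputs` mount shown earlier in this thread, the run itself could be as simple as the following. The `fim_pipeline.sh` flags are an assumption based on the `-u 12090301` note above and a hypothetical run name, not verified here:

```bash
# Inside the container: process the one supported sample HUC
fim_pipeline.sh -u 12090301 -n sample_test
```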
We need notes about Brad's new tool for inundation, named `tools/inundate_mosaic_wrapper.py` (temporarily at https://github.com/NOAA-OWP/inundation-mapping/blob/dev-add-inundate-mosaic-wrapper/tools/inundate_mosaic_wrapper.py).