RobHanna-NOAA opened 1 year ago

@BradfordBates-NOAA
I created a directory on our on-prem machine called `sample_inputs` with the bare minimum (744 MB) needed for a pipeline run on 12090301. There's a README explaining that the data is incomplete and that users can only use `-u 12090301` as a HUC arg. Please take a look at the README file and do another test to make sure it runs successfully.
FYI, I was able to run using this directory by simply adding another volume mount to my docker container, like so: `-v /home/user/sample_inputs/:/data/inputs`. I don't know whether that's easier than renaming the directory to `/inputs`, but it's another option.
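If it helps, a full `docker run` line with that extra mount might look like the following. This is only a sketch: the image name/tag, the base `/data` mount path, and the `/foss_fim` code mount are illustrative assumptions, not taken from this thread.

```bash
# A sketch: local paths, image name, and tag are assumptions.
# Docker orders bind mounts by destination depth, so the nested
# /data/inputs mount overlays the /data mount as intended.
docker run --rm -it --name fim_sample \
    -v /home/user/data/:/data \
    -v /home/user/sample_inputs/:/data/inputs \
    -v /home/user/inundation-mapping/:/foss_fim \
    fim_4:dev /bin/bash
```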
Oh yeah, that seems easier. I forgot about that one.
sample_inputs uploaded to ESIP
Another update: we have a small correction to make on one of the URLs.

Is:

```bash
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --request-payer requester
```

Should be (remove the `/outputs`):

```bash
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --request-payer requester
```
One other update: Annie at ESIP changed the permissions on our bucket so absolutely no credentials are required anymore. However, the syntax changes a little (notice the ending argument):

```bash
aws s3 ls s3://noaa-nws-owp-fim/hand_fim/ --no-sign-request
```

And make sure we have that new flag on the end of the copy commands as well.
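Putting the two updates together, the copy command would presumably become the following, with `--no-sign-request` replacing `--request-payer requester` (unsigned requests can't be requester-pays); this combination hasn't been verified against the bucket here:

```bash
# Corrected path plus the new unsigned-access flag
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --no-sign-request
```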
Can I now recommend that we tell our users to get all inputs from us? I think we can still provide links to the original sources, but we want to make sure they know that those are raw, not pre-processed, and may not be directly compatible with running against FIM code.
I think this is done. I finished it a month or more ago. I will double check it.
FYI, the `sample_inputs` directory does not work because it does not contain the `included_huc8_withAlaska.lst` file, which has been required since https://github.com/NOAA-OWP/inundation-mapping/blame/82b5e422da3e59c988899994ac97fcfbcb4f4b63/src/check_huc_inputs.py#L18, and there is no error if the glob matches no files.
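For context, the silent-failure mode looks roughly like this. This is a minimal sketch, assuming `check_huc_inputs.py` globs a lists directory for `included_huc*.lst` files; the function and pattern names are approximations, not the actual code:

```python
from glob import glob

def read_included_hucs(parent_path: str) -> set[str]:
    """Collect HUC codes from included_huc*.lst files under parent_path."""
    matches = glob(f"{parent_path}/included_huc*.lst")
    # glob() returns an empty list when nothing matches, so without a
    # guard like this a missing included_huc8_withAlaska.lst silently
    # yields an empty HUC set instead of a useful error.
    if not matches:
        raise FileNotFoundError(f"no included_huc*.lst files in {parent_path}")
    hucs: set[str] = set()
    for filename in matches:
        with open(filename) as f:
            hucs.update(line.strip() for line in f if line.strip())
    return hucs
```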
Yes, the sample inputs directory does not work, as we have not been able to update it for a while. I will bring it up with my tech lead and see what we can do. It might take a while, but I will keep you posted.
There is some code to generate sample data in the `dev-sample-data` branch, but it hasn't been modified to work for Alaska. See step 1 in https://github.com/NOAA-OWP/inundation-mapping/pull/1178.
Can I recommend we make a new issue card and get it assigned to someone? @mluck: do you think you have time to look into it? If not, I can teach someone else how to do it. Thoughts?
In ESIP, we have a full `inputs` directory copy. However, we also need a smaller "sample" version of the inputs folder, which will give a user just the files they need to process a couple of pre-defined HUCs in a demo mode. The folder structure of the new sample input directory will need to be identical so that no code changes are required.

We need to review the code and isolate which files are needed for those two HUCs. The current readme.md does not appear to define which two HUCs, but they might be in other wiki pages. The readme.md will also need to be updated to reflect the two options of a full download or the smaller sample download.
During some testing of this concept, we found we can keep just a couple of HUC6 DEM files in `inputs/3dep_dems/10m_5070`, versus keeping all of the DEMs. By keeping just a couple of HUC6 DEM files, we got the total input folder down from 400 GB to 50 GB. The sample input folder needs to keep most of the other folders. Our code looks for a specific VRT, `fim_seamless_3dep_dem_10m_5070.vrt`, for the DEMs. The default VRT will fail if not edited, as it looks for the DEM defined in each "ComplexSource" node, and there is one "ComplexSource" node per DEM. Just delete all of the "ComplexSource" nodes in that file except the ones that match the selected HUC6 (for the selected HUC8) and it works fine.
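A small script along these lines could do that pruning automatically. A sketch only: the `ComplexSource`/`SourceFilename` elements are standard GDAL VRT layout, but the assumption that each source DEM's file name contains its HUC6 code is mine, and the HUC6 values below are illustrative:

```python
import xml.etree.ElementTree as ET

def prune_vrt(vrt_path: str, out_path: str, keep_huc6s: set[str]) -> None:
    """Drop every ComplexSource whose source DEM doesn't match a kept HUC6."""
    tree = ET.parse(vrt_path)
    for band in tree.getroot().iter("VRTRasterBand"):
        for source in band.findall("ComplexSource"):
            filename = source.findtext("SourceFilename", default="")
            # Keep the source only if its file name mentions one of the
            # HUC6 codes retained in inputs/3dep_dems/10m_5070.
            if not any(huc in filename for huc in keep_huc6s):
                band.remove(source)
    tree.write(out_path)

# Edit in place (keep a backup), since the code looks for this exact name.
# HUC6 codes here assume sample HUCs 12090301 and 05030104.
prune_vrt(
    "inputs/3dep_dems/10m_5070/fim_seamless_3dep_dem_10m_5070.vrt",
    "inputs/3dep_dems/10m_5070/fim_seamless_3dep_dem_10m_5070.vrt",
    keep_huc6s={"120903", "050301"},
)
```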
Suggested overall output: the readme gives two options:

1) Download the full ESIP `inputs` folder to work with the full CONUS domain (and Hawaii, PR).
2) Download the ESIP `sample_inputs` folder to work with HUC8 x and y (we will pre-define those HUCs; maybe 05030104 (the smallest) and maybe our favorite, 12090301). When they download the sample inputs, make sure it lands in their "data" directory and is renamed to "inputs". Yes, yuck, but our code mandates that pathing (see the sketch below).

Considering volume stuff, we might want to consider pulling some of the README.md into INSTALL.md (or maybe a new .md), and then have README.md reference the other .md files as needed. The README.md is getting pretty big and will grow again with these notes.
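On the rename point in option 2, the sample download could look something like the following. The `sample_inputs` S3 prefix is a guess on my part (the exact path in the bucket isn't stated in this thread), and downloading straight into `data/inputs` sidesteps a separate rename step:

```bash
# The sample_inputs prefix in the bucket is assumed, not confirmed.
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/sample_inputs \
    /home/user/data/inputs --no-sign-request
```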
Also, we might want to add notes in the .md to talk about `inundation.py` and `mosaic_inundation.py`, as an external user did ask about them recently.
Update: also talk about how to mosaic (inundation_nation?) as well.
Also, we need clearer information about ESIP credentials, and we should not send users to the "Configure AWS CLI tools" link; we can tell them specifically what is required (if anything). Update: we might need instructions on how they can create their own AWS account, IAM user, and creds, then run their own `aws configure` (maybe with a profile name for safety, to avoid conflicts if they are using other AWS accounts somewhere). We have to be careful to talk about AWS profiles so we don't accidentally blow away their ability to use other AWS sites / S3 buckets now or later.
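For instance, a named profile keeps any existing `[default]` credentials untouched (the profile name `esip` is just an example):

```bash
# Writes to a separate profile instead of overwriting [default]
aws configure --profile esip

# Then reference the profile explicitly on each call
aws s3 ls s3://noaa-nws-owp-fim/hand_fim/ --profile esip
```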
Also, tell them about the `aws s3 cp --dryrun` flag to help validate the target location before doing the actual download: start with the `--dryrun` flag, hit CTRL-C to stop it, look to ensure the target pathing is good, then take off the `--dryrun` flag and run again to actually download.
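Concretely, reusing the copy command from above:

```bash
# Prints what would be copied without transferring anything
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_4_3_11_0/12090301 \
    /your_local_folder_name/12090301 --no-sign-request --dryrun
```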
Maybe we need three patterns:

1) Just look at some recent outputs for a HUC (or HUCs).
2) Run a small sample processing for a single predefined HUC (12090301) using the small "sample inputs" folder, then show them how to mount "sample_inputs" instead of "inputs" (careful, as we don't mount against "inputs" but against "data"; see the sketch after this list).
3) Load the full inputs dir to process any HUCs you want.
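For pattern 2, once the container is started with the `sample_inputs` mount shown earlier in this thread, the run itself could be as simple as the following. The `fim_pipeline.sh` flags are an assumption based on the `-u 12090301` note above and a hypothetical run name, not verified here:

```bash
# Inside the container: process the one supported sample HUC
fim_pipeline.sh -u 12090301 -n sample_test
```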
We need notes about Brad's new tool for inundation, named `tools/inundate_mosaic_wrapper.py` (temporarily at https://github.com/NOAA-OWP/inundation-mapping/blob/dev-add-inundate-mosaic-wrapper/tools/inundate_mosaic_wrapper.py).