kdeloach closed this issue 7 years ago
We may be able to reduce the overall footprint of the data by recursively walking the directories and enabling compression on all of the tifs. I believe the top level tif is already compressed, but the rest didn't seem to be for region 2 data.
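A rough sketch of what that recursive pass could look like. It assumes GDAL's `gdal_translate` CLI is installed and that LZW is an acceptable codec; neither the tool nor the codec was decided in this thread, and the helper names (`find_tifs`, `compress_command`, `compress_tree`) are made up for illustration. By default it only builds the commands (dry run) so they can be reviewed first.

```python
import os
import subprocess


def find_tifs(root):
    """Yield paths of .tif/.tiff files under root, walking recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith((".tif", ".tiff")):
                yield os.path.join(dirpath, name)


def compress_command(src, dst, method="LZW"):
    """Build a gdal_translate call that rewrites src as a compressed GeoTIFF.

    The LZW default is an assumption, not something chosen in this issue.
    """
    return ["gdal_translate", "-of", "GTiff", "-co", f"COMPRESS={method}", src, dst]


def compress_tree(root, dry_run=True):
    """Compress every tif under root in place; return the commands used.

    With dry_run=True nothing is executed and no file is modified.
    """
    commands = []
    # Materialize the list first so temp files never affect the walk.
    for src in list(find_tifs(root)):
        tmp = src + ".compressed.tmp"
        cmd = compress_command(src, tmp)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
            os.replace(tmp, src)  # swap in the compressed copy
    return commands
```

Calling `compress_tree(data_dir)` returns the list of commands without touching anything; passing `dry_run=False` runs them. Files that are already compressed would simply be rewritten, so this does not try to detect them.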
@dtarb has made a change in the data pre-processing to account for "holes" in the watersheds. This has resulted in new data, which is in a file called "Simple.zip" in the Google Drive folder referenced above.
Instructions from David:
To use this, you will need to unzip this file while retaining its folder structure, putting each file in the corresponding folder of the data you already received. Then there is one line of code to change in RWD, which changes the name of the files being used. I have already pushed this to the branch I created on GitHub.
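For reference, the unzip step could be scripted roughly like this. It assumes the paths inside the archive are relative to the root of the data already received, which has not been confirmed here; `merge_zip_into` is a hypothetical helper name.

```python
import zipfile


def merge_zip_into(zip_path, data_root):
    """Extract zip_path over data_root, keeping the archive's folder structure.

    Existing files that are not in the archive are left alone; files with
    the same relative path are overwritten. Returns the extracted file names.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries so only files are reported.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        zf.extractall(data_root)
    return names
```

The returned list of names can be compared against the existing folders to confirm each file landed where it was expected.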
The combined file size for this new set of data is about 575 GB. This is based on the sum of the file size column of the NHDPlus/RWDContUSA folder in GDrive.
I have completed the download of NHD data and have begun verifying it against the latest RWD code changes.
I'm also downloading the NHD data to an ec2 instance. It will be quicker to process the data directly from AWS instead of uploading the processed files from our local network.
I downloaded the NHD files to my workstation and to an ec2 instance. I'm still working on merging this data with the DRB files and verifying that everything works with the latest code changes.
Splitting the remaining work up into separate issues.
- Verify data works with new NHD code
- Compress files
- Create AMI snapshot

Ref: https://github.com/WikiWatershed/rapid-watershed-delineation/pull/44#issuecomment-264650874