OpenCDSS / cdss-app-statecu-fortran-test

Automated tests for StateCU software.

StateCU model dataset inconsistencies complicate automated testing #2

Open smalers opened 3 years ago

smalers commented 3 years ago

Automated testing of full datasets uses a general framework that relies on the dataset names and packaging. However, the dataset packaging has inconsistencies that the test processing must handle. The following table summarizes the dataset information. Ideally these issues can be addressed over time so that all datasets and packaging are consistent. The page that I am using for downloads is:

https://cdss.colorado.gov/modeling-data/consumptive-use-statecu

| Dataset | Zip File | Zip Top Folders | StateCU Base Name | Comments |
| -- | -- | -- | -- | -- |
| Colorado | cm2015_StateCU.zip | StateCU | cm2015 | Lowercase base name. |
| Gunnison | gm2015_StateCU.zip | StateCU | gm2015 | Lowercase base name. |
| North Platte | NP2018_StateCU_modified.zip | NP2018_StateCU_modified/StateCU | NP2018 | Uppercase base name, zip file includes output. |
| South Platte | SP2016_StateCU_modified.zip | SP2016_StateCU_modified/StateCU | SP2016 | Uppercase base name, zip file includes output. |
| Rio Grande | RG2012_StateCU.zip | RG2012_StateCU/StateCU | rg2012 | Lowercase base name. Zip file is uppercase but base name is lowercase, zip file includes output. |
| San Juan | sj2015_StateCU.zip | StateCU | sj2015 | Lowercase base name. |
| White | wm2015_StateCU.zip | StateCU | wm2015 | Lowercase base name. |
| Yampa | ym2015_StateCU.zip | StateCU | ym2015 | Lowercase base name. |
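
One way a test script could cope with these differences is to record the table as data and derive everything else from it. The following is only a rough sketch of that idea, not the code used in this repository. The `Dataset` class, the helper function names, and the `work` folder in the usage note are all made up for illustration; the zip file names, folders, and base names are copied from the table.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Dataset:
    """One row of the table above (hypothetical structure, for illustration)."""
    name: str
    zip_file: str
    top_folders: str  # folder path inside the zip that holds the StateCU files
    base_name: str
    includes_output: bool


DATASETS = [
    Dataset("Colorado", "cm2015_StateCU.zip", "StateCU", "cm2015", False),
    Dataset("Gunnison", "gm2015_StateCU.zip", "StateCU", "gm2015", False),
    Dataset("North Platte", "NP2018_StateCU_modified.zip",
            "NP2018_StateCU_modified/StateCU", "NP2018", True),
    Dataset("South Platte", "SP2016_StateCU_modified.zip",
            "SP2016_StateCU_modified/StateCU", "SP2016", True),
    Dataset("Rio Grande", "RG2012_StateCU.zip",
            "RG2012_StateCU/StateCU", "rg2012", True),
    Dataset("San Juan", "sj2015_StateCU.zip", "StateCU", "sj2015", False),
    Dataset("White", "wm2015_StateCU.zip", "StateCU", "wm2015", False),
    Dataset("Yampa", "ym2015_StateCU.zip", "StateCU", "ym2015", False),
]


def zip_prefix(ds: Dataset) -> str:
    """Return the part of the zip file name before the first underscore."""
    return ds.zip_file.split("_", 1)[0]


def case_mismatch(ds: Dataset) -> bool:
    """True when the zip file prefix and the base name differ (here, only by case)."""
    return zip_prefix(ds) != ds.base_name


def has_wrapper_folder(ds: Dataset) -> bool:
    """True when the zip has a top-level folder above the StateCU folder."""
    return len(PurePosixPath(ds.top_folders).parts) > 1


def statecu_dir(ds: Dataset, extract_root: str) -> PurePosixPath:
    """Return where the StateCU folder ends up after unzipping into extract_root."""
    return PurePosixPath(extract_root) / ds.top_folders
```

For example, `statecu_dir(DATASETS[2], "work")` gives `work/NP2018_StateCU_modified/StateCU`, and filtering `DATASETS` with `case_mismatch` picks out only the Rio Grande dataset. If the packaging is standardized later, the special cases disappear from the data rather than from the test logic.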

General comments on automating downloads are:

  1. Download links from Laserfiche are much slower than the DWR FTP site, and the downloads sometimes time out. Laserfiche also seems to be down sometimes at night, perhaps due to regular maintenance such as backups. Maybe consider putting all the datasets in a fast download location.
  2. It would be good to standardize on upper/lower case for dataset filenames.
  3. It would be good to standardize on whether the zip file has a top-level folder or not. Using a top-level folder provides some protection against accidentally unzipping over and clobbering an existing StateCU folder.
  4. A filename with "modified" in the name is generic. What if there are multiple modifications? Would it be "modified2", etc.? Maybe put the release date of the dataset in the filename using YYYYMMDD or YYYY-MM-DD. I recognize that having both the year and a release date might be confusing, but the filename convention could be documented on the download page.
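
To make items 2 and 4 concrete, here is a small sketch of what a dated filename convention might look like. This is purely a suggestion for discussion: the pattern, the choice of lowercase base names, the YYYYMMDD form, and the function names are all assumptions, and the release date in the usage note is a placeholder rather than a real release.

```python
import re
from datetime import date

# Hypothetical convention: <lowercase base name>_StateCU_<YYYYMMDD>.zip
NAME_PATTERN = re.compile(
    r"^(?P<base>[a-z]{2}\d{4})_StateCU_(?P<released>\d{8})\.zip$"
)


def dataset_zip_name(base_name: str, released: date) -> str:
    """Build a zip file name with a lowercase base name and a release date."""
    return f"{base_name.lower()}_StateCU_{released:%Y%m%d}.zip"


def parse_dataset_zip_name(file_name: str) -> tuple[str, date]:
    """Split a conforming zip file name into its base name and release date."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {file_name}")
    text = match["released"]
    released = date(int(text[:4]), int(text[4:6]), int(text[6:]))
    return match["base"], released
```

With a placeholder date, `dataset_zip_name("NP2018", date(2021, 3, 15))` produces `np2018_StateCU_20210315.zip`, and a name such as `NP2018_StateCU_modified.zip` is rejected by `parse_dataset_zip_name`. A second modification would simply get a later date instead of a "modified2" suffix.
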
smalers commented 3 years ago

I have been able to get the testing framework in place to work with the installers even though they are slightly different. I updated the above table to indicate installers that include the model output. This is why NP, SP, and RG dataset installers are so much larger. Being consistent in that regard would also be good. Maybe with current cloud storage and network speeds it makes sense to distribute a dataset with all the output, or maybe one zip file without and one with output.

smalers commented 3 years ago

The 2015 datasets used Laserfiche URLs that were slow to download. The State has since made the datasets available on their FTP site, so I updated the download URLs for those datasets and they are much faster. The TSTool command files to do the downloads are here:

https://github.com/OpenCDSS/cdss-app-statecu-fortran-test/tree/master/downloads
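
The downloads in this repository are done with the TSTool command files linked above. For anyone scripting the downloads another way, the occasional timeouts mentioned earlier could be handled with a simple retry loop. The following is a minimal sketch in Python using only the standard library; it is not how TSTool does it, and the function names, attempt count, and wait times are arbitrary assumptions.

```python
import time
import urllib.request
from pathlib import Path


def fetch_bytes(url: str, timeout: float) -> bytes:
    """Read the full response body, giving up after `timeout` seconds of inactivity."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_retry(url, dest, attempts=3, timeout=60.0, wait=5.0,
                        fetch=fetch_bytes, sleep=time.sleep):
    """Download `url` to `dest`, retrying when the server is slow or unavailable.

    `fetch` and `sleep` are parameters only so the loop can be exercised
    without a network connection.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url, timeout)
        except OSError as error:
            # urllib.error.URLError and TimeoutError are both OSError subclasses.
            last_error = error
            if attempt < attempts:
                sleep(wait * attempt)  # wait a little longer after each failure
            continue
        path = Path(dest)
        path.write_bytes(data)
        return path
    raise RuntimeError(
        f"download failed after {attempts} attempts: {url}"
    ) from last_error
```

The whole file is read into memory before it is written, which is simple but may not suit the larger installers that include model output; streaming to disk in chunks would be the obvious refinement.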