Freshwater-Initiative / SkagitLandslideHazards

Seattle City Light is interested in improving understanding of landslide hazard and sediment transport to ensure reliable and cost-effective hydropower generation.

Workflows from DHSVM to Visualization #29

Open ChristinaB opened 4 years ago

ChristinaB commented 4 years ago

Updates:

  1. I made this HydroShare resource so that the files from Nicoleta's code that were on her hard drive are all in one place online: https://www.hydroshare.org/resource/767e35f896a94023b25f788701bad641/ @NCristea I made you the first author on this resource and will leave the metadata and testing to you so we are ready to 'deliver' to SCL. I also rearranged things a bit and put your files in the saturation code folder.
  2. @RondaStrauch @NCristea If you 'Open With' the CUAHSI JupyterHub, it will transfer the files to compute so you can access them in your user space. I would download and re-upload these two files: the SCL_pickling Hydro Notebook and the script that runs a function based on Nicoleta's steps to convert the DHSVM depth-to-water (DTW) output to the Landlab grid resolution and size: SCL Pickly Hydro.py
ChristinaB commented 4 years ago

The Notebook view shows the imshow plot from the DHSVM ascii model output for the first two years. [screenshot]

This is the view with the array mapped to UTM coordinates. [screenshot]

It's not flipped! This is the Landlab input for the SCL domain extent 30m grid. [screenshot]

ChristinaB commented 4 years ago

Known issues:

ChristinaB commented 4 years ago

This code does work from the xarray dataset before netcdf export. The indexing does not work on the netcdf imported with the code we were trying to use earlier. However, this takes too long to run and we may need to recode the loop.

```python
import itertools
import numpy as np

counter = 0
for j in range(len(x)):
    for k in range(len(y)):
        # Extract one pixel's time series from the dataset.
        one_location = dsi.isel(x=[j], y=[k]).to_array()
        loc1list = np.array(one_location.variable)
        # Flatten the singleton dimensions down to a plain list.
        b = list(itertools.chain(*loc1list))
        c = list(itertools.chain(*b))
        d = list(itertools.chain(*c))
        HSD_dict_annualmaxDWT_hist[counter] = {keys[counter]: d}
        counter = counter + 1
```
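The per-pixel `isel` calls above are what make this slow. Below is a hedged sketch of the in-memory alternative: pull the full array out once, then index it with numpy. A plain numpy array stands in for the DataArray values, and the names (`arr`, `HSD_dict`, the `node_` keys) are illustrative; note the key ordering here follows row-major flattening, which differs from the x-outer/y-inner loop above.

```python
import numpy as np

# Hypothetical stand-in for the full DataArray values, shape (time, y, x).
# In the real workflow this would come from something like dsi.to_array().values,
# fetched once instead of one dsi.isel(x=[j], y=[k]) call per pixel.
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 2 time steps, 3 rows, 4 cols

# Flatten the spatial dimensions so each column is one location's time series.
per_location = arr.reshape(arr.shape[0], -1)  # shape (time, y*x)

keys = [f"node_{i}" for i in range(per_location.shape[1])]  # hypothetical keys
HSD_dict = {i: {keys[i]: per_location[:, i].tolist()}
            for i in range(per_location.shape[1])}
```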
ChristinaB commented 4 years ago

@NCristea Could you please put the future netcdf files in a folder here? https://www.hydroshare.org/resource/767e35f896a94023b25f788701bad641/ Thank you for processing those. I will test them and hopefully it is smooth sailing from here.

On visualization: Do you have xrviz working on your desktop? Does it allow recording? then we don't need to code an animation, but record a video of a screen.

RondaStrauch commented 4 years ago

The numbers for the corners (lon, lat) I got from looking the GIS corners (very closely) for the input rasters that I converted to ASCII (starting from the Phi raster). The node id corners were done by math knowing the 30m grid rows and columns.
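The corner-id arithmetic described above can be sketched in a few lines, assuming landlab's row-major node numbering starting at the lower-left corner; the grid dimensions here are made up, not the actual SCL 30 m grid size.

```python
# Hypothetical grid shape; the real values come from the 30 m raster rows/columns.
nrows, ncols = 4, 5

# Landlab numbers nodes row by row from the lower-left corner, so the
# corner node ids follow directly from the row and column counts.
corner_ids = [0,                      # lower-left
              ncols - 1,              # lower-right
              (nrows - 1) * ncols,    # upper-left
              nrows * ncols - 1]      # upper-right
```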

NCristea commented 4 years ago

The files are > 2 GB in size; HydroShare takes files < 1 GB.

Nicoleta Cristea Research Scientist University of Washington eScience Institute & Department of Civil and Environmental Engineering

RondaStrauch commented 4 years ago

Well, that would be a good reason! Ugh, then we'll have to come up with another approach. We're trying a different compression.

ChristinaB commented 4 years ago

My experiment: the future files are 3.5 GB uncompressed.
Download from Google Drive manually (unzip the Windows .zip file manually).

From the terminal:

```shell
gzip dtw_G_CNRM_CM5__rcp45.nc
# Result is 1.6 GB

tar -cvf dtw_G_CNRM_CM5rcp45.nc.gz.tar dtw_G_CNRM_CM5__rcp45.nc.gz
# Result is still 1.6 GB
```

New plan is to look at Ronda's idea to address significant digits. We only need 3, which should make a big difference.

ChristinaB commented 4 years ago

@NCristea Can we coordinate on running your code to manage dtype?

http://xarray.pydata.org/en/stable/io.html

These parameters can be fruitfully combined to compress discretized data on disk. For example, to save the variable foo with a precision of 0.1 in 16-bit integers while converting NaN to -9999, we would use:

```python
encoding = {'foo': {'dtype': 'int16', 'scale_factor': 0.1, '_FillValue': -9999}}
```

Compression and decompression with such discretization is extremely fast.
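The int16 + scale_factor encoding quoted above amounts to storing round(value / scale_factor) as an integer and substituting the fill value for NaN. A minimal numpy sketch of that discretization, outside the netCDF machinery (the `dtw` values are made up):

```python
import numpy as np

scale_factor = 0.1
fill_value = -9999

dtw = np.array([1.23, 4.56, np.nan, 7.89])  # hypothetical depth-to-water values

# Encode: divide by scale_factor, replace NaN with the fill value, cast to int16.
packed = np.where(np.isnan(dtw), fill_value,
                  np.round(dtw / scale_factor)).astype(np.int16)

# Decode: mask the fill value, multiply back by scale_factor.
decoded = np.where(packed == fill_value, np.nan, packed * scale_factor)
```

Each value now costs 2 bytes instead of 8, at the price of 0.1 precision.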

ChristinaB commented 4 years ago

encoding (dict, optional) – http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g.:

```python
encoding = {'my_variable': {'dtype': 'int16', 'scale_factor': 0.1, 'zlib': True}, ...}
```

To solve the problem sooner: values.dtype is float64 when read in from the Map file. I'll see about limiting the numpy arrays to 3 significant digits only.

RondaStrauch commented 4 years ago

This looks like a way to limit values to 3 decimal places of precision:

```python
value = str(round(value, 3))
```

ChristinaB commented 4 years ago

@NCristea - could you update your code so the first array read in from the DHSVM Map file is limited to 3 significant digits? This approach is slow but it works.

```python
values_sig3 = values.copy()  # copy first so the original array is not modified

for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        values_sig3[i, j] = round(values[i, j], 3)
```
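For what it's worth, the element-wise loop can be collapsed into a single vectorized call, which should be much faster on the large DHSVM arrays. A sketch with a made-up 2x2 array standing in for `values`; note that, like round(), this limits decimal places rather than significant digits:

```python
import numpy as np

values = np.array([[1.23456, 2.34567],
                   [3.45678, 4.56789]])  # stand-in for the DHSVM Map array

# One vectorized call instead of the nested i/j loop.
values_sig3 = np.round(values, 3)
```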

I don't see the future processing code on https://github.com/NCristea/SkagitLandslideHazards/ so I'm not sure what updates you have made since process_wt_grids_with_vis.py 3 days ago.

I am uploading one future file to cuahsi.jupyter.org, where I am running my version of the code, so I can test your future files. Do you expect them to be flipped? I don't know if there are other changes needed until we can use the files. 25% uploaded... completing... eventually

ChristinaB commented 4 years ago

@RondaStrauch I am still trying to execute SCL_pickly_hydro.py to save the wt outputs. Note that at 30 minutes each, this will be a chore not to repeat. I have restarted the server a few times but my script gets 'killed'. I may try on another server... otherwise, if the smaller digits make a difference, that would impact this step as well.

ChristinaB commented 4 years ago

@RondaStrauch @NCristea The future data is not flipped. The remaining issue that needs to be addressed at the netcdf-building step is converting the date string to a Python-readable date. Then I think we can do more with xarray to solve the dictionary size issue.
[screenshot]

ChristinaB commented 4 years ago

I changed my script to start with rounded arrays, and the last time I tried it still broke the server. I will ask Tony if we can get more compute access. It doesn't seem like rounding made the files much smaller.

RondaStrauch commented 4 years ago

If we can't access the future data, then let's increase the historical by a fraction.
At Newhalem, annual precipitation is projected to increase by 6% by the 2050s. We could use that as a first cut.
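The first-cut adjustment described above is just a multiplicative scaling; a sketch with hypothetical annual precipitation values:

```python
import numpy as np

precip_hist = np.array([120.0, 95.5, 210.3])  # hypothetical annual precipitation values

# First-cut 2050s scenario: scale historical values by the projected +6% at Newhalem.
precip_2050s = precip_hist * 1.06
```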

NCristea commented 4 years ago

Hi Christina,

You can convert the strings to date using this as an example:

```python
from datetime import datetime

date_string = '01/29/2099-21'
date_object = datetime.strptime(date_string, '%m/%d/%Y-%H')
print(date_object)
```

You can loop through the record with:

```python
dates_str = dsi.time.values
record = len(dates_str)
dates_date = []
for i in range(record):
    date_object = datetime.strptime(dates_str[i], '%m/%d/%Y-%H')
    dates_date.append(date_object)
```
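The same loop can also be written as a list comprehension; a pure-stdlib sketch with hypothetical DHSVM time stamps:

```python
from datetime import datetime

dates_str = ['01/29/2099-21', '01/30/2099-00']  # hypothetical DHSVM time stamps
dates_date = [datetime.strptime(s, '%m/%d/%Y-%H') for s in dates_str]
```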

I have uploaded a gif with an example animation of 10 images on google drive. The size is small, I think it could be used in the ePoster.

Nicoleta


ChristinaB commented 4 years ago

I tried this but it needs to be included in the entire workflow or it breaks the rest of the workflow.

ChristinaB commented 4 years ago

@RondaStrauch

Uniform

Synthetic Grid

For easy downloading - files on HydroShare: Here is the netcdf input with synthetic grid - uniform distribution - landslide component (1202)

For viewing completed run Notebook - the same file is on Github Uniform Synthetic

SCL model grid

For easy downloading - files on HydroShare: Here is the SCL model grid - uniform distribution - landslide component (1202)

For viewing completed run Notebook - the same file is on Github Uniform SCL grid

Lognormal

Synthetic grid

This is the lognormal distribution, synthetic grid working example with 1202 component

For viewing completed run Notebook - the same file is on Github Lognormal Synthetic

SCL model grid

For downloading - This is the lognormal distribution, SCL - may be working -need more time- 1202 component

For viewing completed run Notebook - the same file is on Github lognormal distribution, SCL - may be working -need more time- 1202 component

Data Driven

Synthetic model grid

To upload.

SCL model grid

For viewing completed run Notebook - this file is on Github data driven and array saving distribution, SCL 1202 component

For downloading - from HydroShare data driven, SCL 1202 component

ChristinaB commented 4 years ago

@RondaStrauch Could you take a look at the error in the SCL uniform notebook? This is all updated to run with fire and the netcdf input -- but I think this error is related to the T input. Do you agree? [screenshot]

ChristinaB commented 4 years ago

@RondaStrauch For the lognormal spatial notebook - I think it runs fine, but it takes a long time and I had to interrupt the testing. Could you restart it on your end?

RondaStrauch commented 4 years ago

Note on running 3 comments up: couldn't run the synthetic uniform because the synthetic.nc file is missing.

Added this to landslide_probability_20191202.py at line 729:

```python
if self.groundwater__recharge_distribution is not None:
```

Then ran the SCL uniform and lognormal. The connection fails before completing.

ChristinaB commented 4 years ago

@RondaStrauch Did you update the 1202 file or do you want me to do that? You could put the most recent date on Github.

RondaStrauch commented 4 years ago

@ChristinaB the current .py is _20191206, so that is up on the HydroShare resource and needs to be the current one in the first code block of any notebook. I'll copy it over to GitHub.

RondaStrauch commented 4 years ago

@ChristinaB - Oops, looks like the latest is _20191208! Not 06.

RondaStrauch commented 4 years ago

@ChristinaB - can you load the mean and standard deviation text files with names like below into the ASCII folder in the Slippery Future Data resource?

"dtw_mean_hist.txt"
"dtw_stndev_hist.txt"
"dtw_mean_fut.txt"
"dtw_stndev_fut.txt"

These can be read in as arrays like this:

```python
dtw_mean_h = np.loadtxt("dtw_mean_hist.txt")
dtw_stndev_h = np.loadtxt("dtw_stndev_hist.txt")
```

and then passed in the lognormal-spatial call to LandslideProbability().

RondaStrauch commented 4 years ago

@ChristinaB - I've also added a mask for the nodes around the Goodell Fire such that we can limit these as core nodes for calculation and maybe for plotting.

```python
(grid1, fire_mask) = read_esri_ascii(data_folder + '/scl_firebox.txt')
grid.add_field('node', 'fire_area', fire_mask)
grid.set_nodata_nodes_to_closed(grid.at_node['fire_area'], -9999)
```
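For intuition, set_nodata_nodes_to_closed boils down to flagging every node whose field value equals the nodata value as closed, so those nodes are skipped in calculations. A plain numpy sketch of that idea (not the landlab API itself; the status code 4 mirrors landlab's closed-node constant, and the fire_mask values are made up):

```python
import numpy as np

fire_mask = np.array([1.0, -9999.0, 1.0, -9999.0])  # hypothetical node field
CLOSED = 4  # stand-in for landlab's closed-node status code

# Start with all nodes open (core), then close nodes outside the fire box.
status = np.zeros(fire_mask.size, dtype=int)
status[fire_mask == -9999.0] = CLOSED
```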

RondaStrauch commented 4 years ago

@ChristinaB - Added the fire area mask to the lognormal spatial notebook and added a placeholder for loading the dtw mean and standard deviation arrays that are in the ASCII folder: SCL_lognormal_spatial_landslide_20191209.ipynb. By using the fire area mask to close nodes outside the Goodell Creek fire, we should be able to cut down the processing time. Then we can show figures closer up to the fire to see changes in landslide probability.
To dos:

1. generate dtw mean and stndev for historic and at least one future
2. add these as text files to the ASCII folder
3. check the notebook for correct naming
4. run the notebook 2 times (historical and future)
5. save figures - we want to make pretty pictures for the poster!

ChristinaB commented 4 years ago

@RondaStrauch Try to run from 20200331_map2netcdf2array_lognormal_spatial_Depth_SCL_LandlabLandslide.ipynb from https://hydroshare.org/resource/4cac25933f6448409cab97b293129b4f or click on link below and view https://hydroshare.org/resource/4cac25933f6448409cab97b293129b4f/data/contents/20200331_map2netcdf2array_lognormal_spatial_Depth_SCL_LandlabLandslide.ipynb