Ouranosinc / raven

WPS services related to hydrological modeling
https://pavics-raven.readthedocs.io
MIT License
37 stars 12 forks

Update notebooks #196

Closed: huard closed this 4 years ago

huard commented 4 years ago

Description

richardarsenault commented 4 years ago

@huard I have started a branch "ERA5_notebook", but I have a problem with the rain_snow_fraction and nc_spec properties for the linear transform. Is the RAVEN server running on twitcher (is that correct?) updated to reflect the latest changes? I'm kind of stuck here. Thanks

huard commented 4 years ago

pavics.ouranos.ca runs the latest release, not master.

If you do

make start
make notebook

it will launch a Jupyter notebook with environment variables set up to connect to the local server you just started (instead of connecting by default to pavics.ouranos.ca).
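For context, the notebooks typically pick up the server endpoint from an environment variable; a minimal sketch (the variable name WPS_URL is an assumption about how `make notebook` wires things up, not a verified detail):

```python
import os

# Fall back to the public server when no local override is set.
# WPS_URL as a variable name is an assumption, not a verified detail.
url = os.environ.get("WPS_URL", "https://pavics.ouranos.ca")
```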

richardarsenault commented 4 years ago

For NRCAN: added test_NRCAN_daily.py to tests, code is working but assertion is not finding DIAG_NASH_SUTCLIFFE in the diagnostics output file. Looking at ERA5 it seems this is normal, but could you confirm please?

huard commented 4 years ago

Does your input file contain an observed discharge series? If not, then the diagnostic cannot be computed.

richardarsenault commented 4 years ago

... Well this is awkward. Indeed, I don't. This raises the question though: Does the user have to ensure his/her Qobs data and meteo data have the exact same length? This will have to be addressed if it is the case as right now I'm basically just pulling full years of data from the OPeNDAP server on THREDDS.

julemai commented 4 years ago

Qobs and meteo definitely don't have to be the same length. The only restriction is that the meteo needs to cover the simulation period.

richardarsenault commented 4 years ago

OK, but can the variables each have their own time vector? Or do they have to share the same time vector, with gaps filled by NaN/FillValue/missing data?

julemai commented 4 years ago

Raven reads each time series (as long as it is not in a :multidata block) individually and cuts out the simulation period. For Q_obs, missing data are filled with nodata.
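A small pandas sketch of what this clipping and padding amounts to (dates, values, and the nodata marker are all illustrative, not Raven's actual internals):

```python
import pandas as pd

# Simulation period: one full year of daily steps.
sim_period = pd.date_range("2000-01-01", "2000-12-31", freq="D")

# Qobs covers only part of the period; meteo (not shown) covers it all.
qobs = pd.Series(1.0, index=pd.date_range("2000-03-01", "2000-06-30", freq="D"))

# Clip/extend to the simulation period; gaps become the nodata marker.
NODATA = -1.2345
qobs_padded = qobs.reindex(sim_period).fillna(NODATA)
```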

richardarsenault commented 4 years ago

OK thanks!

julemai commented 4 years ago

If it helps, send me the 5 Raven files or point me to them. :)

richardarsenault commented 4 years ago

OK, I'll finish doing a few tests and I'll send you where I'm at. It's pretty obvious actually: we have the Qobs in the Salmon netCDF files already, I just need to extract it and merge it with the meteo data (I think)...

julemai commented 4 years ago

Nope. You don't have to. Things don't need to be in the same NetCDF file. They can be all in different files.

richardarsenault commented 4 years ago

Yeah, I understand that in RAVEN you don't need to have them all in the same place, but I think the way it is set up (and David can and will probably correct me here) is that there is a single file that contains all the data. Not sure we allow having multiple different sources of data to send to Raven?

julemai commented 4 years ago

All my original examples definitely had them in two different files: one for meteo and one for Qobs. And this was working.

richardarsenault commented 4 years ago

OK, well then, that is why we (I) need more docs. I'll try and see how to do that then!

richardarsenault commented 4 years ago

So I see that the Qobs are only in rvt format? Don't they need to be in .nc? I guess this aspect would be better left for @huard, he would know better than I.

julemai commented 4 years ago

wps_gr4j_cemaneige.py looks like there is one filename for each input.

huard commented 4 years ago

Both in the python wrapper and the WPS interface, you can pass a list of netCDF input files, and the code will automatically assign the correct file for each variable, as long as the variable name is understood. Now here if you passed both a NLWIS input file and the Salmon input file, there would be confusion because both files would include data for the same variables. I'm not sure what would happen, but an error should definitely be raised.
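The matching (and the ambiguity huard warns about) can be sketched as follows; this is a hypothetical stand-in for the wrapper's logic, with invented names:

```python
def assign_files(files_to_vars, wanted):
    """Map each wanted variable name to the single file providing it.

    files_to_vars: dict of filename -> list of variable names in that file.
    Raises ValueError if a wanted variable appears in more than one file.
    """
    mapping = {}
    for fname, varnames in files_to_vars.items():
        for v in varnames:
            if v not in wanted:
                continue
            if v in mapping:
                raise ValueError(f"{v!r} found in both {mapping[v]} and {fname}")
            mapping[v] = fname
    return mapping
```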

richardarsenault commented 4 years ago

OK thanks, I'll take note of that and try to generate an independent Qobs and run the model that way.

huard commented 4 years ago

Another option is to compute the NSE in python.

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency."""
    return 1 - ((sim - obs) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()
julemai commented 4 years ago

@huard: Does this take care of missing values?

huard commented 4 years ago

Hum, if sim and obs are pandas time series, I think it does. I'd have to check with xarray.DataArray though.

huard commented 4 years ago

If not, you'd only have to apply dropna() before the final sums.
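Putting the thread together, a NaN-safe variant might look like this (a numpy-based sketch, not the project's official implementation):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency, skipping steps where either series is missing."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mask = ~np.isnan(sim) & ~np.isnan(obs)
    sim, obs = sim[mask], obs[mask]
    return 1 - ((sim - obs) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()
```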

julemai commented 4 years ago

I think we have several test cases where we calculate the objective with the Python function and compare it to the Raven output, don't we? I remember that I once had to fix something in Raven with the NSE calculations.

huard commented 4 years ago

I don't think so. I think we're comparing the NSE Raven returns with the expected value you computed on your end.

julemai commented 4 years ago

ok.

richardarsenault commented 4 years ago

I've just added 2 notebooks (ERA5 and NRCAN) and a pytest test (WPS and python wrapper, both NRCAN). The tests pass, that's good. The notebooks fail due to the rain/snow partitioning for ERA5 and NRCAN. The rain_snow_fraction flag causes an error, and when I remove it I get the default RAVEN error about the snow autogeneration flag being off. @huard, could you take a look please?

Notebooks in "ERA5_notebook" branch. Tests in "connectNRCAN" branch, should be ready to merge I think.

richardarsenault commented 4 years ago

Also, here is a todo list to make everything integrate well:

julemai commented 4 years ago

@richardarsenault Which data points do you mean? Meteorology? There are several options in Raven to derive basin aggregates when several meteorological forcings are given. See :Interpolation in Raven. I think currently the options are:

All are specified with :Interpolation in the *.rvi file

huard commented 4 years ago

@richardarsenault There is a typo; it should read

rain_snow_fraction='RAINSNOW_DINGMAN'
richardarsenault commented 4 years ago

@huard thanks for catching that! @julemai Yeah, that is one option, although I guess we need to give the catchment boundaries to RAVEN so it can compute those things from a 2D bounding box of data? Or do we have to replace points that are outside of the polygon with NaN before the calculations are performed?

julemai commented 4 years ago

@richardarsenault Raven does not use any shape files. You would need to preprocess this information and provide it to Raven in a :GaugeWeightTable. The following is copied from the RAVEN manual (around page 110):

INTERP_FROM_FILE [filename] - weights for each gauge at each HRU are specified in a file named filename with the following contents:

:GaugeWeightTable
  [NG] [# of HRUs]
  {w_n1 w_n2 ... w_nNG} x {# of HRUs}
:EndGaugeWeightTable

where NG is the number of gauges. The sum of the weights in each row (i.e., for each HRU) should be 1. It is assumed that the number of HRUs is the same as in the current model .rvh file; the orders are also assumed to be consistent.

Julie: The above is beneficial if you have lots of files and setups that have the same weights. Processing of the forcings and creating a lumped meteorological forcing might take lots of time. With the GaugeWeightTable approach you don't need to change the forcing files and basically only derive the weights once (assuming that the number and location of gauges does not change over time).
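Such a file could also be generated programmatically; the helper below is a hypothetical sketch (name and formatting are mine) that enforces the sum-to-one constraint from the manual excerpt:

```python
def format_gauge_weight_table(weights):
    """Return the text of a :GaugeWeightTable block.

    weights: list of rows, one per HRU, each with one weight per gauge.
    Hypothetical helper; layout follows the manual excerpt above.
    """
    n_hrus = len(weights)
    n_gauges = len(weights[0])
    for row in weights:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("each HRU's gauge weights must sum to 1")
    lines = [":GaugeWeightTable", f"  {n_gauges} {n_hrus}"]
    for row in weights:
        lines.append("  " + " ".join(f"{w:.6f}" for w in row))
    lines.append(":EndGaugeWeightTable")
    return "\n".join(lines) + "\n"
```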

richardarsenault commented 4 years ago

I have progressed but need help now. @huard, I have made it so a user pushes a shapefile and the data is auto-extracted, subsetted and masked; and the basin properties (Elevation, area, centroid_lat and centroid_lon) are also auto-calculated. But I have 2 snags:

1- The masking function requires a .shp file, whereas our WPS for area/elevation/centroid calcs uses a zipped archive with the shp, shx, etc. Not sure how to best manage this. Ask user to push a zipped archive, unzip it for the masks? How to do this efficiently?

2- problem with "ERROR: CTimeSeries::Parse: ReadFromNetCDF: If there are multiple stations in NetCDF file, the StationIdx argument must be given (numbering starts with 1) to identify stations that should be read in" So basically my 2d netcdf file is missing something, not sure what that is. I know you've worked with the 2d already, any thoughts on how to fix this?

See branch ERA5_notebook for notebook "CompleteChainNRCAN.ipynb" for my current status, I need a hand here but after that I think we'll be close!

julemai commented 4 years ago

I am not entirely sure but maybe you need a 3D gridded forcing block to load these data. A block would look similar to this:

:GriddedForcing MIN_TEMP
  :ForcingType TEMP_DAILY_MIN
  :FileNameNC  data_obs/York_daily_3d.nc
  :VarNameNC   temp_min
  :DimNamesNC  nlon nlat ntime
  :RedirectToFile GriddedForcings_3D.txt
:EndGriddedForcing

:GriddedForcing MAX_TEMP
  :ForcingType TEMP_DAILY_MAX
  :FileNameNC  data_obs/York_daily_3d.nc
  :VarNameNC   temp_max
  :DimNamesNC  nlon nlat ntime
  :RedirectToFile GriddedForcings_3D.txt
:EndGriddedForcing

But this would need a file that defines the weights for each grid cell and its contribution to each HRU/subbasin. A grid weights file looks similar to this:

:GridWeights                                                                   
   #                                                                           
   # [# HRUs]                                                                  
   :NumberHRUs       7                                                  
   :NumberGridCells  24                                                 
   #                                                                           
   # [HRU ID] [Cell #] [w_kl]                                                  
   #     where w_kl is fraction of forcing for HRU k is from grid cell l=(i,j) 
   #     and grid cell index l is derived by l = (i-1) * NC + j                
   #     where i and j are the row and column of cell l respectively and       
   #     NC is the total number of columns.                                    
   #     Following constraint must be satisfied:
   #         sum(w_kl, {l=1,NC*NR}) = 1.0 for all HRUs k                       
   1     1    0.416533067501                           
   1    15    0.107722095454                           
   1    16    0.36656977053                           
   1    17    0.0383775214918                           
   1    19    0.00602985999575                           
   1    21    0.0647676850277                           
   2     0    0.102856848918                           
   2     1    0.0334505627711                           
   2     9    0.00180896483093                           
   2    13    0.0577007064953                           
   2    14    0.395953671258                           
   2    16    0.118137602528                           
   2    17    0.281295878077                           
   2    18    0.00879576512124                           
   3     1    0.274322207664                           
   3     2    0.00130276858577                           
   3    15    0.023101045798                           
   3    17    0.0015817605414                           
   3    18    0.215221591145                           
   3    19    0.484470626266                           
   4    19    1.0                           
   5     6    0.160082270118                           
   5     7    0.132468422768                           
   5    10    0.220406663532                           
   5    17    0.0586570788424                           
   5    18    0.346061128261                           
   5    19    0.0817086830172                           
   5    20    0.000615753461159                           
   6     4    0.244090672727                           
   6     7    0.12726482638                           
   6    10    0.485176949948                           
   6    11    0.143467550945                           
   7     3    0.131084722047                           
   7     4    0.106582721427                           
   7     5    0.0991456428469                           
   7     6    0.00857427579251                           
   7     7    0.193688495328                           
   7     8    0.157489753798                           
   7    11    0.268620187843                           
   7    12    0.0348142009172                           
:EndGridWeights   
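As a quick sanity check, the constraint spelled out in the comments above (weights for each HRU summing to 1) can be verified with a short sketch (function name and input shape are illustrative):

```python
from collections import defaultdict

def check_grid_weights(triples, tol=1e-6):
    """Return {hru_id: total} for every HRU whose weights do not sum to 1.

    triples: iterable of (hru_id, cell_index, weight) rows, as in the
    :GridWeights block above. An empty result means all HRUs are valid.
    """
    totals = defaultdict(float)
    for hru, _cell, w in triples:
        totals[hru] += w
    return {h: t for h, t in totals.items() if abs(t - 1.0) > tol}
```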

Or you aggregate the gridded inputs in a pre-processing to a single, lumped forcing. Then you don't need a Gridded forcing block but you can directly read everything using a :Gauge block similar to this:

   :Gauge meteorological forcings
      :Latitude    54.4848
      :Longitude -123.3659
      :Elevation  843.0
      :Data RAINFALL mm/d
         :ReadFromNetCDF
            :FileNameNC     data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
            :VarNameNC      rain
            :DimNamesNC     time
         :EndReadFromNetCDF
      :EndData
      :Data SNOWFALL mm/d
         :ReadFromNetCDF
            :FileNameNC     data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
            :VarNameNC      snow
            :DimNamesNC     time
         :EndReadFromNetCDF
      :EndData
      :Data TEMP_MIN deg_C
         :ReadFromNetCDF
            :FileNameNC     data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
            :VarNameNC      tmin
            :DimNamesNC     time
         :EndReadFromNetCDF
      :EndData
      :Data TEMP_MAX deg_C
         :ReadFromNetCDF
            :FileNameNC     data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
            :VarNameNC      tmax
            :DimNamesNC     time
         :EndReadFromNetCDF
      :EndData
      :Data PET mm/d
         :ReadFromNetCDF
            :FileNameNC     data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
            :VarNameNC      pet
            :DimNamesNC     time
         :EndReadFromNetCDF
      :EndData
   :EndGauge

Not sure if this is helpful or even more confusing.

huard commented 4 years ago

So for now I suggest we average over the grid offline, and open an issue to support weights.
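A minimal numpy sketch of that offline averaging (the boolean mask stands in for the shapefile-derived basin outline; names are illustrative):

```python
import numpy as np

def lump_forcing(cube, mask):
    """Average a gridded forcing over the basin, one value per time step.

    cube: array of shape (time, ny, nx); mask: boolean (ny, nx), True for
    cells inside the basin. A simple unweighted mean, ignoring partial
    cell overlap (which the grid-weights approach would handle properly).
    """
    cells = cube[:, mask]      # (time, n_cells_in_basin)
    return cells.mean(axis=1)  # lumped series
```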

richardarsenault commented 4 years ago

OK, I can try and do that, unless you're working on it already? Also any ideas on the shapefile / zipped archive issue? Thanks!

huard commented 4 years ago
import os
import tempfile
import zipfile

import salem

from example_data import TESTDATA

# My shapefile (zipped archive: .shp, .shx, ...)
vec = TESTDATA["watershed_vector"]

# Extract the zip archive to a temporary directory
z = zipfile.ZipFile(vec)
path = tempfile.mkdtemp()
z.extractall(path)

# Read the extracted shapefile
shdf = salem.read_shapefile(os.path.join(path, "LSJ_LL.shp"))
richardarsenault commented 4 years ago

OK got that, thanks. Also got the averaging done. now fighting some other obscure bug, hopefully get it working tonight.

huard commented 4 years ago

Note that we're working on server side aggregation of netCDF files: one link for all years and all variables. This should simplify these notebooks. This won't be ready for production in the next two weeks, but I wanted to let you know that this is going to get easier.

richardarsenault commented 4 years ago

OK great, thanks, because they are taking a while! But it works at least for now; we just have to keep the simulations short for the time being.

I only have one problem left with these notebooks, maybe you can help. I'm trying to send the zipped TESTDATA["watershed_vector"] (.zip containing .shp, .shx, etc.) to a WPS to extract area, elevation, centroid, etc. to feed RAVEN. However, I get a weird bug that I don't get when I call the service directly through Spyder. Can you test and see if the notebook works for you? I think it might be a permission issue?

I've pushed the latest version of CompleteChainNRCAN notebook in ERA5_notebook branch. I'm at a loss for this part.

richardarsenault commented 4 years ago

Also I think the solution I have for extracting data and subsetting, if implemented on the server directly, should work pretty much out of the box? Can that be useful to you (plus expert optimization of course)?

huard commented 4 years ago

One issue is that you're passing a PosixPath instance instead of a string. This is something we could support in birdy in the future, I'll open a ticket.
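The immediate workaround on the notebook side is just to convert the path before passing it (sketch; the actual WPS call is omitted):

```python
from pathlib import Path

# birdy/WPS inputs currently expect plain strings, so convert
# PosixPath objects explicitly before passing them along.
vec = Path("testdata") / "LSJ_LL.zip"   # illustrative path
shape_arg = str(vec)                    # pass this to the WPS, not `vec`
```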

The other issue is I think a bug in Raven. We've never tested with zipped shapefiles. Looking into it.

julemai commented 4 years ago

What's the Raven error? I kinda lost track. Let me know if I need to fix something....

huard commented 4 years ago

https://github.com/Ouranosinc/raven/issues/198

huard commented 4 years ago

So no, this is something Trevor will have to look at. Should be simple though.