@huard I have started a branch "ERA5_notebook", but I have a problem with the rain_snow_fraction and nc_spec properties for the linear transform. Is the RAVEN server running on twitcher (is that correct?) updated to reflect the latest changes? I'm kind of stuck here. Thanks
pavics.ouranos.ca runs the latest release, not master.
If you do
make start
make notebook
it will launch a Jupyter notebook with environment variables set up to connect to the local server you just started (instead of connecting by default to pavics.ouranos.ca).
For NRCAN: I added test_NRCAN_daily.py to the tests. The code is working, but the assertion is not finding DIAG_NASH_SUTCLIFFE in the diagnostics output file. Looking at ERA5 it seems this is normal, but could you confirm please?
Does your input file contain an observed discharge series? If not, then the diagnostic cannot be computed.
... Well, this is awkward. Indeed, I don't. This raises the question though: does the user have to ensure their Qobs data and meteo data have the exact same length? If so, this will have to be addressed, as right now I'm basically just pulling full years of data from the OPeNDAP server on THREDDS.
Qobs and meteo definitely don't have to be the same length. The only restriction is that the meteo needs to cover the simulation period.
OK, but can the variables each have their own time vector? Or do they have to have the same time vector and then fill with NaN/FillValues/MissingData?
Raven reads each time series (as long as it is not in a :multidata block) individually and cuts out the simulation period. For Qobs, missing data are filled with nodata.
OK thanks!
If it helps, send me the 5 Raven files or point me to them. :)
OK I'll finish doing a few tests and I'll send you where I'm at. It's pretty obvious actually, we have the Qobs in the Salmon netcdf files already, I just need to extract it and merge it with the meteo data (I think)...
Nope. You don't have to. Things don't need to be in the same NetCDF file. They can be all in different files.
Yeah, I understand that in RAVEN you don't need to have them all in the same place, but I think the way it is set up (and David can and will probably correct me here) is that there is a single file that contains all the data. I'm not sure we allow having multiple different sources of data to send to Raven?
All my original examples definitely had them in two different files: one for meteo and one for Qobs. And this was working.
OK, well then that is why we (I) need more docs. I'll try and see how to do that then!
So I see that the Qobs are only in rvt format? Don't they need to be in .nc? I guess this aspect would be better left for @huard, he would know better than I.
In wps_gr4j_cemaneige.py it looks like there is one filename for each input.
Both in the python wrapper and the WPS interface, you can pass a list of netCDF input files, and the code will automatically assign the correct file for each variable, as long as the variable name is understood. Now here if you passed both a NLWIS input file and the Salmon input file, there would be confusion because both files would include data for the same variables. I'm not sure what would happen, but an error should definitely be raised.
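For illustration, a minimal sketch of that call pattern with birdy; the process and argument names (raven_gr4j_cemaneige, ts) are assumed to match the deployed service, and the file names are placeholders:

from birdy import WPSClient

# Endpoint is illustrative; point it at the Raven WPS you are using.
wps = WPSClient("https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps")

# One netCDF file per variable; Raven assigns each variable to the right file by name.
resp = wps.raven_gr4j_cemaneige(ts=["nrcan_meteo.nc", "salmon_qobs.nc"])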
OK thanks, I'll take note of that and try to generate an independent Qobs and run the model that way.
Another option is to compute the NSE in python.
def nse(sim, obs):
    "Nash-Sutcliffe Efficiency"
    return 1 - ((sim - obs)**2).sum() / ((obs - obs.mean())**2).sum()
@huard: Does this take care of missing values?
Hum, if sim and obs are pandas time series, I think it does. I'd have to check with xarray.DataArray though.
If it doesn't, you'd only have to apply a dropna() before the final sum.
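For instance, a NaN-safe sketch assuming sim and obs are pandas Series sharing a time index (untested against Raven's own diagnostics):

import pandas as pd

def nse_nan_safe(sim, obs):
    """Nash-Sutcliffe Efficiency, dropping time steps where either series is missing."""
    df = pd.concat([sim.rename("sim"), obs.rename("obs")], axis=1).dropna()
    return 1 - ((df.sim - df.obs) ** 2).sum() / ((df.obs - df.obs.mean()) ** 2).sum()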
I think we have several test cases where we calculate the objective with the Python function and compare it to the Raven output, don't we? I remember that I once had to fix something in Raven with the NSE calculations.
I don't think so. I think we're comparing the NSE Raven returns with the expected value you computed on your end.
ok.
I've just added 2 notebooks (ERA5 and NRCAN) and a pytest test (WPS and python wrapper, both NRCAN). The tests pass, that's good. The notebooks fail due to the rain/snow precipitation for ERA5 and NRCAN. The rain_snow_fraction flag causes an error, and when I remove it I get the default RAVEN error about the snow autogeneration flag being off. @huard, could you take a look please?
Notebooks in "ERA5_notebook" branch. Tests in "connectNRCAN" branch, should be ready to merge I think.
Also, here is a todo list to make everything integrate well:
[ ] Add capability and test to use custom catchment, including calling WPS to extract elevation, lat, long and area in the Notebooks
[ ] Add Qobs from independent source
[ ] Allow averaging over the data points within the basin contour instead of using the nearest point, as is done now
@richardarsenault Which data points do you mean? Meteorology? There are several options in Raven to derive basin aggregates when several meteorological forcings are given. See :Interpolation in Raven. I think currently the options are:
INTERP_NEAREST_NEIGHBOR
INTERP_AVERAGE_ALL
INTERP_INVERSE_DISTANCE
INTERP_INVERSE_DISTANCE_ELEVATION
INTERP_FROM_FILE (give weights for each time series in a file; like grid weights for gridded forcings)
All are specified with :Interpolation in the *.rvi file.
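For example, averaging over all gauges would be a single line in the *.rvi (illustrative; check the Raven manual for the exact syntax):

:Interpolation INTERP_AVERAGE_ALL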
@richardarsenault There is a typo; it should read
rain_snow_fraction='RAINSNOW_DINGMAN'
@huard thanks for catching that! @julemai Yeah, that is one option, although I guess we need to give the catchment boundaries to RAVEN so it can compute those things from a 2D bounding box of data? Or do we have to replace points that are outside of the polygon with NaN before the calculations are performed?
@richardarsenault Raven does not use any shapefiles. You would need to preprocess this information and provide it to Raven in a :GaugeWeightTable. The following is copied from the Raven manual (around page 110):
INTERP_FROM_FILE [filename]
- weights for each gauge at each HRU are specified in a file named filename with the following contents:
:GaugeWeightTable
[NG] [# of HRUs]
{w_n1 w_n2 ... w_nNG} x {# of HRUs}
:EndGaugeWeightTable
where NG is the number of gauges. The sum of the weights in each row (i.e., for each HRU) should be 1. It is assumed that the number of HRUs is the same as in the current model .rvh file; the orders are also assumed to be consistent.
Julie: The above is beneficial if you have lots of files and setups that share the same weights. Processing the forcings and creating a lumped meteorological forcing might take lots of time. With the GaugeWeightTable approach you don't need to change the forcing files and basically only derive the weights once (assuming that the number and location of gauges does not change over time).
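As a made-up illustration of the format above, a table for 2 gauges and 3 HRUs, where each row (one per HRU) sums to 1:

:GaugeWeightTable
2 3
0.7 0.3
0.5 0.5
1.0 0.0
:EndGaugeWeightTable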
I have progressed but need help now. @huard, I have made it so a user pushes a shapefile and the data is auto-extracted, subsetted and masked; and the basin properties (Elevation, area, centroid_lat and centroid_lon) are also auto-calculated. But I have 2 snags:
1- The masking function requires a .shp file, whereas our WPS for area/elevation/centroid calcs uses a zipped archive with the .shp, .shx, etc. I'm not sure how best to manage this. Ask the user to push a zipped archive and unzip it for the masks? How to do this efficiently?
2- A problem with "ERROR: CTimeSeries::Parse: ReadFromNetCDF: If there are multiple stations in NetCDF file, the StationIdx argument must be given (numbering starts with 1) to identify stations that should be read in". So basically my 2D netcdf file is missing something; I'm not sure what that is. I know you've worked with 2D data already, any thoughts on how to fix this?
See branch ERA5_notebook for notebook "CompleteChainNRCAN.ipynb" for my current status, I need a hand here but after that I think we'll be close!
I am not entirely sure but maybe you need a 3D gridded forcing block to load these data. A block would look similar to this:
:GriddedForcing MIN_TEMP
  :ForcingType TEMP_DAILY_MIN
  :FileNameNC data_obs/York_daily_3d.nc
  :VarNameNC temp_min
  :DimNamesNC nlon nlat ntime
  :RedirectToFile GriddedForcings_3D.txt
:EndGriddedForcing
:GriddedForcing MAX_TEMP
  :ForcingType TEMP_DAILY_MAX
  :FileNameNC data_obs/York_daily_3d.nc
  :VarNameNC temp_max
  :DimNamesNC nlon nlat ntime
  :RedirectToFile GriddedForcings_3D.txt
:EndGriddedForcing
But this would need a file that defines the weights for each grid cell and its contribution to each HRU/subbasin. A grid weights file looks similar to this:
:GridWeights
#
# [# HRUs]
:NumberHRUs 7
:NumberGridCells 24
#
# [HRU ID] [Cell #] [w_kl]
# where w_kl is the fraction of forcing for HRU k that comes from grid cell l=(i,j)
# and grid cell index l is derived by l = (i-1) * NC + j
# where i and j are the row and column of cell l respectively and
# NC is the total number of columns.
# The following constraint must be satisfied:
# sum(w_kl, {l=1,NC*NR}) = 1.0 for all HRUs k
1 1 0.416533067501
1 15 0.107722095454
1 16 0.36656977053
1 17 0.0383775214918
1 19 0.00602985999575
1 21 0.0647676850277
2 0 0.102856848918
2 1 0.0334505627711
2 9 0.00180896483093
2 13 0.0577007064953
2 14 0.395953671258
2 16 0.118137602528
2 17 0.281295878077
2 18 0.00879576512124
3 1 0.274322207664
3 2 0.00130276858577
3 15 0.023101045798
3 17 0.0015817605414
3 18 0.215221591145
3 19 0.484470626266
4 19 1.0
5 6 0.160082270118
5 7 0.132468422768
5 10 0.220406663532
5 17 0.0586570788424
5 18 0.346061128261
5 19 0.0817086830172
5 20 0.000615753461159
6 4 0.244090672727
6 7 0.12726482638
6 10 0.485176949948
6 11 0.143467550945
7 3 0.131084722047
7 4 0.106582721427
7 5 0.0991456428469
7 6 0.00857427579251
7 7 0.193688495328
7 8 0.157489753798
7 11 0.268620187843
7 12 0.0348142009172
:EndGridWeights
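If it helps, a small Python sketch to sanity-check such a table against the constraints stated in the comments above (it assumes the (HRU, cell, weight) triples have already been parsed from the file):

from collections import defaultdict

def cell_index(i, j, nc):
    """Linear cell index l = (i - 1) * NC + j from row i and column j."""
    return (i - 1) * nc + j

def check_hru_weights(rows, tol=1e-6):
    """rows: iterable of (hru_id, cell_id, weight); each HRU's weights must sum to ~1."""
    totals = defaultdict(float)
    for hru, _cell, w in rows:
        totals[hru] += w
    return {hru: abs(total - 1.0) < tol for hru, total in totals.items()}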
Or you aggregate the gridded inputs in a pre-processing step into a single, lumped forcing. Then you don't need a GriddedForcing block but can directly read everything using a :Gauge block similar to this:
:Gauge meteorological forcings
  :Latitude 54.4848
  :Longitude -123.3659
  :Elevation 843.0
  :Data RAINFALL mm/d
    :ReadFromNetCDF
      :FileNameNC data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
      :VarNameNC rain
      :DimNamesNC time
    :EndReadFromNetCDF
  :EndData
  :Data SNOWFALL mm/d
    :ReadFromNetCDF
      :FileNameNC data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
      :VarNameNC snow
      :DimNamesNC time
    :EndReadFromNetCDF
  :EndData
  :Data TEMP_MIN deg_C
    :ReadFromNetCDF
      :FileNameNC data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
      :VarNameNC tmin
      :DimNamesNC time
    :EndReadFromNetCDF
  :EndData
  :Data TEMP_MAX deg_C
    :ReadFromNetCDF
      :FileNameNC data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
      :VarNameNC tmax
      :DimNamesNC time
    :EndReadFromNetCDF
  :EndData
  :Data PET mm/d
    :ReadFromNetCDF
      :FileNameNC data_obs/Salmon-River-Near-Prince-George_meteo_daily.nc
      :VarNameNC pet
      :DimNamesNC time
    :EndReadFromNetCDF
  :EndData
:EndGauge
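Alternatively, if you keep the multi-station 2D file, the error message you quoted suggests that a station index inside the :ReadFromNetCDF block selects which station to read. A guess at the syntax (verify the keyword against the Raven manual; file and variable names here are placeholders):

:ReadFromNetCDF
  :FileNameNC my_2d_file.nc
  :VarNameNC tmin
  :DimNamesNC station time
  :StationIdx 1
:EndReadFromNetCDF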
Not sure if this is helpful or even more confusing.
So for now I suggest we average over the grid offline, and open an issue to support weights.
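A minimal sketch of that offline averaging, assuming the forcings sit in an xarray Dataset with (time, lat, lon) dimensions and that a boolean basin mask has been derived from the shapefile (synthetic data here as a stand-in):

import numpy as np
import xarray as xr

# Synthetic stand-in for a subsetted forcing file: daily tmax on a small grid.
ds = xr.Dataset(
    {"tasmax": (("time", "lat", "lon"), np.random.rand(10, 4, 5))},
    coords={"lat": np.linspace(45, 48, 4), "lon": np.linspace(-75, -71, 5)},
)

# Boolean mask over (lat, lon), True inside the basin; in practice built from the shapefile.
mask = xr.zeros_like(ds["tasmax"].isel(time=0), dtype=bool)
mask[1:3, 1:4] = True

# NaN-out cells outside the basin, then average to one lumped series per variable.
lumped = ds.where(mask).mean(dim=["lat", "lon"])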
OK, I can try and do that, unless you're working on it already? Also any ideas on the shapefile / zipped archive issue? Thanks!
import os
import tempfile
import zipfile

import salem
from example_data import TESTDATA

# My shapefile (zip archive containing .shp, .shx, etc.)
vec = TESTDATA["watershed_vector"]

# Extract the zip archive to a temporary directory
z = zipfile.ZipFile(vec)
path = tempfile.mkdtemp()
z.extractall(path)

# Read the extracted shapefile with salem
shdf = salem.read_shapefile(os.path.join(path, "LSJ_LL.shp"))
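From there, salem's xarray accessor can do the subsetting/masking step directly with that shape, something like this (assuming ds is an xarray Dataset with recognizable lat/lon coordinates; the file name is a placeholder):

import xarray as xr

ds = xr.open_dataset("nrcan_subset.nc")      # hypothetical forcing file
ds = ds.salem.subset(shape=shdf, margin=2)   # crop to the shape's bounding box
ds = ds.salem.roi(shape=shdf)                # set cells outside the polygon to NaN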
OK got that, thanks. Also got the averaging done. Now fighting some other obscure bug; hopefully I'll get it working tonight.
Note that we're working on server side aggregation of netCDF files: one link for all years and all variables. This should simplify these notebooks. This won't be ready for production in the next two weeks, but I wanted to let you know that this is going to get easier.
OK great, thanks, because they are taking time! But it works at least for now, we just have to keep the simulations short for the time being.
I only have 1 problem left with these notebooks, maybe you can help. I'm trying to send the zipped TESTDATA["watershed_vector"] (.zip containing .shp, .shx, etc.) to a WPS to extract the area, elevation, centroid, etc. to feed RAVEN. However, I get a weird bug that I don't get when I call the service directly through Spyder. Can you test and see if the notebook works for you? I think it might be a permission issue?
I've pushed the latest version of CompleteChainNRCAN notebook in ERA5_notebook branch. I'm at a loss for this part.
Also, I think the solution I have for extracting and subsetting data, if implemented directly on the server, should work pretty much out of the box? Could that be useful to you (plus expert optimization, of course)?
One issue is that you're passing a PosixPath instance instead of a string. This is something we could support in birdy in the future; I'll open a ticket.
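In the meantime, wrapping the path in str() before the call should do it; for example (process and argument names hypothetical):

resp = wps.shape_properties(shape=str(vec))  # cast the PosixPath to str for birdy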
The other issue is I think a bug in Raven. We've never tested with zipped shapefiles. Looking into it.
What's the Raven error? I kinda lost track. Let me know if I need to fix something....
So no, this is something Trevor will have to look at. Should be simple though.