CanadianClimateDataPortal / Canadian-Climate-Data-Portal


Indices calculation PAVICS - Analyze data #8

Closed: MarcAndreDubuc closed this issue 5 years ago

MarcAndreDubuc commented 5 years ago
dbyrns commented 5 years ago

@MarcAndreDubuc I would like to know where the relation to Ouranosinc/pavics-sdi#42 came from. That issue is about the catalog component and will not be used for CCDP.

dbyrns commented 5 years ago

@davidcaron You started some tests with Finch, but now we should test its reliability when a lot of requests are made at the same time. PyWPS is designed to run multiple jobs concurrently up to a predefined number and then start pushing new jobs into a db. Once the running jobs end, the ones in the db are supposed to start. Could you set up a test that floods Finch, making sure that the db is hit, and validate that all the processes complete successfully?
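A minimal sketch of such a flood test, assuming a hypothetical `submit_job(i)` callable that sends one execute request to Finch, blocks until the job finishes and returns its final status string. The `FakeWPS` class is only a local stand-in for a server with a fixed number of parallel slots, where jobs beyond that limit wait (its `queued` counter plays the role of "the db is hit"); it is not how PyWPS is implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def flood(submit_job: Callable[[int], str], n_requests: int, n_clients: int) -> List[str]:
    """Fire n_requests jobs from n_clients concurrent threads; return statuses in order."""
    def run(i: int) -> str:
        try:
            return submit_job(i)
        except Exception as exc:  # record failures instead of aborting the flood
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return list(pool.map(run, range(n_requests)))


class FakeWPS:
    """Hypothetical stand-in: at most `parallel` jobs run, the others wait (are queued)."""

    def __init__(self, parallel: int) -> None:
        self.slots = threading.Semaphore(parallel)
        self.lock = threading.Lock()
        self.queued = 0

    def submit(self, i: int) -> str:
        if not self.slots.acquire(blocking=False):
            with self.lock:
                self.queued += 1  # no free slot: the job has to wait
            self.slots.acquire()
        try:
            time.sleep(0.01)  # pretend to compute an index
            return "succeeded"
        finally:
            self.slots.release()
```

Against the real server, the pass criteria would be the same: every status is a success, and the server logs show that the queue was actually used.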

dbyrns commented 5 years ago

@davidcaron has successfully tested the queue system of Twitcher (based on Celery) and will check the one built into PyWPS soon. If both work, fine, but having Twitcher's Celery queue alone is already enough to guarantee that we are ready for launch on that topic.

dbyrns commented 5 years ago

@MarcAndreDubuc, @huard I'm still wondering whether https://github.com/Ouranosinc/pavics-sdi/issues/42 is really a blocker for this issue.

@dbyrns It was created to show the connectivity between boards. Sorry for the confusion.

dbyrns commented 5 years ago

@tlogan2000, @huard So that we are able to complete this task, we would like to get a formal list, like the one in https://ccdpwiki.atlassian.net/wiki/spaces/CCDP/pages/57638919/Indices+prioritization+and+tracking, containing the indices that we want to be able to compute, with a range of acceptable inputs (datasets, parameters). Maybe this list is already available somewhere?

huard commented 5 years ago

@dbyrns Is what you're asking similar to a py.test fixture looping over indices, input files and parameters? This has been on our to-do list for some time now.

dbyrns commented 5 years ago

We are no longer in test mode but rather in operationalization. What are the indices that we want to see on the portal? What are the datasets the user will be able to choose from? Which parameters can we give the user control over, and which values will not crash the process? If some parameters must be locked to a particular value, we want to know it. In short, we want to know what the user should see, and from there give some specs on the UI design to launch processes.

huard commented 5 years ago

Got it. I'll let Travis answer for the datasets, but I can provide the rest. Do you want it in JSON format?

Indicator name: str
Indicator description: str
Parameters: {name: str,
             type: [int, float, str],
             allowed_values: sequence or None,
             range: (min, max)
             }
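To make the proposed format concrete, here is a small sketch of how the portal could use one such parameter entry to check a value typed in the UI before launching a process. The `ParameterSpec` class, the `validate` helper and the example specs in the test (allowed `freq` values, `window` range) are hypothetical illustrations, not values agreed on in this thread.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass
class ParameterSpec:
    """One entry of the proposed 'Parameters' description (hypothetical)."""
    name: str
    type: type                                   # int, float or str
    allowed_values: Optional[Sequence[Any]] = None
    range: Optional[Tuple[float, float]] = None  # inclusive (min, max)


def validate(spec: ParameterSpec, raw: str) -> Any:
    """Convert a raw UI string to spec.type and check it; raise ValueError if invalid."""
    value = spec.type(raw)  # int("abc") or float("abc") raises ValueError
    if spec.allowed_values is not None and value not in spec.allowed_values:
        raise ValueError(f"{spec.name}: {value!r} not in {list(spec.allowed_values)}")
    if spec.range is not None:
        low, high = spec.range
        if not low <= value <= high:
            raise ValueError(f"{spec.name}: {value!r} outside [{low}, {high}]")
    return value
```

A parameter that must be locked to a single value can be expressed with a one-element `allowed_values`.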
huard commented 5 years ago

Isn't it simpler to just use the GetCapabilities and DescribeProcess responses of the Finch server?
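A WPS 1.0.0 server describes each process and its inputs through the standard DescribeProcess request. The sketch below builds that request in key-value-pair form and pulls the input identifiers out of a response. It matches tags by local name so it does not depend on namespace prefixes; the sample XML in the test is a heavily trimmed, hand-written example, not actual Finch output.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode


def describe_process_url(wps_url: str, identifier: str) -> str:
    """Build a WPS 1.0.0 DescribeProcess request using key-value-pair encoding."""
    query = urlencode({"service": "WPS", "version": "1.0.0",
                       "request": "DescribeProcess", "identifier": identifier})
    return f"{wps_url}?{query}"


def _local(tag: str) -> str:
    """Strip an XML namespace: '{ns}Identifier' -> 'Identifier'."""
    return tag.rsplit("}", 1)[-1]


def input_identifiers(xml_text: str) -> List[str]:
    """Return the Identifier of every <Input> element in a DescribeProcess response."""
    root = ET.fromstring(xml_text)
    ids = []
    for elem in root.iter():
        if _local(elem.tag) == "Input":
            for child in elem:
                if _local(child.tag) == "Identifier":
                    ids.append(child.text)
    return ids
```

For example, `describe_process_url("https://finch.crim.ca/wps", "heat_wave_frequency")` gives the URL to fetch the description of the heat wave process.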

davidcaron commented 5 years ago

@huard, currently I'm using the data from the xclim unit tests to send to Finch. But the processing time is very short, so I'm basically looking for larger datasets (roughly 500 MB, 2-3 GB, 10 GB and 50 GB), along with the indices I can compute on them.

I'm having problems with the time format in the netCDF files. In the large netCDF files I found on Boreas, the time dimension is of type cftime.Datetime360Day or cftime.DatetimeNoLeap, and I'm not sure which indices I can compute on them. I think the .resample() function doesn't work for these datasets.

The format of the input parameters doesn't matter. I just need the links to the netCDF files, the indices I can compute on these files, and the other parameters (frequency, threshold...).

davidcaron commented 5 years ago

And if you know exactly which indices must be computed, and you can provide a dataset (large and small) with the parameters for each of these, that would be even better.

@tlogan2000 provided a short list of indices to be computed on the portal. We should choose 1 or 2 indices for the portal, but we do not yet have a definitive answer on that.

Stable + fast:
• Simple thresholds: txgt30, tnlt-25, tr_22, degree days, etc.
• Simple max / min single-day temperature or precipitation: tx_max, tn_min, rx1day, etc.
• Annual / monthly sums or averages: tx_mean, pr
Stable + slower + increased memory requirements (sometimes bivariate):
• Heat wave frequency / max length (bivariate and complex calculation)
• Consecutive-day calculations: CDD, CWD
• Could be OK for a small domain / single grid cell?
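To illustrate the difference between the two groups on a single grid cell, here is a plain-Python sketch of a simple threshold count (txgt30-style) and a consecutive-day run length (CDD/CWD-style). It is not xclim's implementation, which works on xarray objects with units and resampling periods; the 1 mm/day wet/dry threshold is an assumption.

```python
from typing import Sequence


def days_above(values: Sequence[float], thresh: float) -> int:
    """Simple threshold index, e.g. txgt30: number of days with value > thresh."""
    return sum(1 for v in values if v > thresh)


def max_consecutive(values: Sequence[float], thresh: float, below: bool = True) -> int:
    """Longest run of days below thresh (below=True) or at/above it (below=False).

    With daily precipitation and thresh=1.0 mm/day (assumed), this gives a
    CDD-style index (below=True) or a CWD-style index (below=False).
    """
    longest = current = 0
    for v in values:
        hit = v < thresh if below else v >= thresh
        current = current + 1 if hit else 0  # extend or reset the current run
        longest = max(longest, current)
    return longest
```

The threshold count only looks at each day independently, while the run length has to carry state from day to day, which is one reason consecutive-day indices are harder to chunk and parallelize.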

huard commented 5 years ago

@davidcaron Are you using xarray master? Support for resampling non-standard calendars is not released yet, but it's included in xarray master.
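To show why resampling needs to know the calendar, here is a plain-Python sketch that groups a daily series starting on January 1 into monthly means under the `360_day` and `noleap` calendars. It only illustrates the grouping problem; it is not how xarray's cftime-aware resampling is implemented, and the `monthly_means` helper is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Month lengths for two non-standard CF calendars (neither has leap days).
MONTH_LENGTHS = {
    "360_day": [30] * 12,
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}


def monthly_means(daily: Sequence[float], calendar: str,
                  start_year: int = 2000) -> Dict[Tuple[int, int], float]:
    """Group a daily series starting on Jan 1 by (year, month) and average each group."""
    lengths = MONTH_LENGTHS[calendar]
    groups: Dict[Tuple[int, int], List[float]] = {}
    year, month, day = start_year, 1, 1
    for v in daily:
        groups.setdefault((year, month), []).append(v)
        day += 1
        if day > lengths[month - 1]:  # move to the next month of this calendar
            day, month = 1, month + 1
            if month > 12:
                month, year = 1, year + 1
    return {key: sum(vs) / len(vs) for key, vs in groups.items()}
```

In a 360-day calendar, 360 daily values fill exactly twelve 30-day months; a grouping based on the standard calendar would instead put those days into months of 28 to 31 days.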

As far as I understand, the indices that have been used up to now are those listed here: https://pavics.ouranos.ca/thredds/catalog/birdhouse/cccs_portal/indices/BCCAQv2/catalog.html

For freq and thresholds, since this is not for testing, just use the default values.

If you look in xclim at temperature.py and precip.py, you'll see indices inheriting from different classes (Pr, Tas, Tasmin, Tasmax). Just match the indices with the corresponding netCDF variable (pr, tas, tasmin, tasmax) and you should be good.
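Matching indices to variables can be kept as a simple lookup on the portal side. The sketch below uses a hand-written, hypothetical mapping for a few indices named in this thread (it is not generated from xclim), and picks the matching file for each required variable. The test reuses the xclim test files from the heat wave request example in this thread.

```python
from typing import Dict, List

# Hypothetical lookup, written by hand: index name -> netCDF variables it needs.
REQUIRED_VARIABLES: Dict[str, List[str]] = {
    "tx_max": ["tasmax"],
    "tn_min": ["tasmin"],
    "rx1day": ["pr"],
    "heat_wave_frequency": ["tasmin", "tasmax"],
    "heat_wave_max_length": ["tasmin", "tasmax"],
}


def inputs_for(index: str, files_by_variable: Dict[str, str]) -> Dict[str, str]:
    """Map each variable the index needs to a file URL; fail loudly if one is missing."""
    needed = REQUIRED_VARIABLES[index]
    missing = [v for v in needed if v not in files_by_variable]
    if missing:
        raise KeyError(f"{index} needs {missing}")
    return {v: files_by_variable[v] for v in needed}
```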

For the files you can test on, these are the main files Travis worked on so far: https://pavics.ouranos.ca/thredds/catalog/birdhouse/pcic/BCCAQv2/catalog.html

You could also test precipitation indices on those: https://pavics.ouranos.ca/thredds/catalog/birdhouse/cmip5/MRI/historical/day/atmos/r1i1p1/pr/catalog.html

In general, you can use the PAVICS dataset search interface to find what you're looking for. Just enter the variable you need (tas, pr, tasmin, tasmax), the frequency (day) and the project (e.g. CMIP5) and you'll find files you can test with. Let me know if something's unclear.

davidcaron commented 5 years ago

> @davidcaron Are you using xarray master ? Support for resampling non-standard calendars is still not yet released, but it's included in the xarray master.

Thanks, that's good to know.

I think I have everything I need.

dbyrns commented 5 years ago

Thank you @huard and @MarcAndreDubuc. So we will use the indices provided by Marc-André and the default values as parameters like David suggested.

tlogan2000 commented 5 years ago

Thanks @huard, I was in Quebec City yesterday in meetings with the MFFP. As for xarray on PyPI: the updated resampling work we did will hopefully be in the next release (0.12, I believe). For now we still have to install from master, as @huard said.

tlogan2000 commented 5 years ago

For which index I honestly don't know the answer... We are waiting on a decision from CCCS.
I have a gut feeling that it will likely be 'heat_wave_frequency' and/or 'heat_wave_max_length', linking to the health module, but I cannot promise. These are likely the most complicated (bivariate and a relatively complex algorithm), so I would say that if the tests work on those, we would likely be in decent shape regardless of their choice?
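To show what makes these two indices bivariate, here is a simplified single-grid-cell sketch: a day counts as hot when both tasmin and tasmax exceed their thresholds, and a heat wave is a run of at least `window` hot days. The defaults (22 °C, 30 °C, 3 days) mirror the values used in the Finch request example in this thread. This is not xclim's implementation, which also handles units, missing values and resampling periods.

```python
from typing import List, Sequence


def hot_runs(tasmin: Sequence[float], tasmax: Sequence[float],
             thresh_tasmin: float = 22.0, thresh_tasmax: float = 30.0,
             window: int = 3) -> List[int]:
    """Lengths of runs where both tasmin and tasmax exceed their thresholds
    for at least `window` consecutive days (temperatures in degC)."""
    runs, current = [], 0
    for tn, tx in zip(tasmin, tasmax):
        if tn > thresh_tasmin and tx > thresh_tasmax:
            current += 1
        else:
            if current >= window:
                runs.append(current)
            current = 0
    if current >= window:  # close a run that reaches the end of the series
        runs.append(current)
    return runs


def heat_wave_frequency(tasmin, tasmax, **kwargs) -> int:
    """Number of heat wave events."""
    return len(hot_runs(tasmin, tasmax, **kwargs))


def heat_wave_max_length(tasmin, tasmax, **kwargs) -> int:
    """Length of the longest heat wave, 0 if there is none."""
    return max(hot_runs(tasmin, tasmax, **kwargs), default=0)
```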

tomLandry commented 5 years ago

I can confirm your hypothesis @tlogan2000. Indeed it was proposed by CCCS to use "heat" related indices that are useful to the sector module too. I will get more info on this, looking at my notes. As for a decision point, CRIM has authority to commit this knowing the recommendation of CCCS and partners.

davidcaron commented 5 years ago

https://finch.crim.ca is starting to get pretty stable under load now. I had to submit fixes to PyWPS for a couple of things.

If you want to test it (please let me know if you encounter any bugs), you can use:

• birdy: point birdy at https://finch.crim.ca/wps
• the REST interface: use the POST request described at https://finch.crim.ca/api (Execute POST /processes/{process_id}/jobs)

For example:

POST https://finch.crim.ca/providers/finch/processes/heat_wave_frequency/jobs with the following body:

{
  "inputs": [
    {
      "id": "tasmax",
      "href": "https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/NRCANdaily/nrcan_canada_daily_tasmax_1990.nc"
    },
    {
      "id": "tasmin",
      "href": "https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/NRCANdaily/nrcan_canada_daily_tasmin_1990.nc"
    },
    {
      "id": "freq",
      "data": "MS"
    },
    {
      "id": "thresh_tasmin",
      "data": "22.0"
    },
    {
      "id": "thresh_tasmax",
      "data": "30.0"
    },
    {
      "id": "window",
      "data": "3"
    }
  ],
  "response": "document",
  "mode": "auto",
  "outputs": [
    {
      "transmissionMode": "reference",
      "id": "output_netcdf"
    }
  ]
}
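The same request body can also be assembled programmatically, which is what the portal will eventually need to do from UI parameters. This is a sketch using only the standard library: `build_job_body` reproduces the shape of the JSON example (files as `href`, literal parameters as `data` strings, NetCDF output by reference), and `submit` simply POSTs it and decodes the JSON reply. The helper names are hypothetical, and the exact response format should be checked against https://finch.crim.ca/api.

```python
import json
import urllib.request
from typing import Dict


def build_job_body(hrefs: Dict[str, str], params: Dict[str, object]) -> dict:
    """Build an execute body shaped like the heat_wave_frequency example."""
    inputs = [{"id": k, "href": v} for k, v in hrefs.items()]
    inputs += [{"id": k, "data": str(v)} for k, v in params.items()]  # literals as strings
    return {
        "inputs": inputs,
        "response": "document",
        "mode": "auto",
        "outputs": [{"transmissionMode": "reference", "id": "output_netcdf"}],
    }


def submit(url: str, body: dict, timeout: float = 60.0) -> dict:
    """POST the body as JSON and return the decoded JSON reply (format assumed, see /api)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `submit("https://finch.crim.ca/providers/finch/processes/heat_wave_frequency/jobs", build_job_body(files, {"freq": "MS", "window": 3}))`.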
dbyrns commented 5 years ago

@tomLandry, @MarcAndreDubuc, the next step for this issue is to integrate that into the portal. I don't think we need to add more pressure on H7 right now. Depending on your schedule, maybe we can close this issue and open a second, lower-priority one to "integrate Finch into the portal".

tomLandry commented 5 years ago

As discussed offline, next step is a demo of Finch in the PAVICS context, plus introduction of portal designs.

tomLandry commented 5 years ago

@habitatseven, @MarcAndreDubuc and I had a short demo of Finch today. Following our discussions, I agree to close this issue as solved (except for receiving parameters from the UI). The next step is to continue preparing test scenarios using BCCAQv2 data from Boreas for 'heat_wave_frequency' and 'heat_wave_max_length', including subsets. We can then send the requests to Jordan so he can take a look at the API, and to Jamie, who can prepare designs according to the parameters.