Extreme Values for Precipitation

juliettelavoie commented 2 years ago

Precipitation

The health checks I ran flagged extremely high daily precipitation values (>300 mm d-1) in all models. They always seem to be in the same spot in Yukon, though not always at the same time, and can go as high as 700 kg m-2 d-1. I suspect an outlier in the reference data.

[Figures: pr1, pr2, pr3]
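The thread mixes mm d-1 and kg m-2 d-1. For liquid water (density assumed to be 1000 kg m-3), 1 kg spread over 1 m2 forms a 1 mm layer, so both units give the same number. Raw CMIP6 output stores pr as a flux in kg m-2 s-1, so a threshold such as 300 mm d-1 must be divided by 86400 s before it is compared with data in those units (check the units attribute of the downscaled files, which may differ). A minimal sketch; the function and constant names are hypothetical:

```python
SECONDS_PER_DAY = 86_400
WATER_DENSITY_KG_M3 = 1000.0  # assumed density of liquid water


def kg_m2_to_mm(amount_kg_m2: float) -> float:
    """Depth of water (mm) equivalent to a mass per unit area (kg m-2)."""
    # depth [m] = (mass / area) / density, then metres -> millimetres
    return amount_kg_m2 * 1000.0 / WATER_DENSITY_KG_M3


def mm_per_day_to_flux(pr_mm_d: float) -> float:
    """Convert a daily total (mm d-1) to a mean flux (kg m-2 s-1)."""
    # mm -> kg m-2 (inverse of the function above), then spread over one day
    amount_kg_m2 = pr_mm_d * WATER_DENSITY_KG_M3 / 1000.0
    return amount_kg_m2 / SECONDS_PER_DAY


print(kg_m2_to_mm(700.0))         # 700.0 (mm)
print(mm_per_day_to_flux(300.0))  # about 0.00347 kg m-2 s-1
```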

Temperatures

Moved to issue #32 as the origin of the problem is probably different, but the question on how to handle extreme values is the same for precipitation and temperature.

How should we handle these extreme values?

Should Ouranos replace the extremely high values with NaNs before calculating their indicators? Were they removed by PCIC before calculating their indicators? If not, removing them only in the Ouranos calculations would mean the indicators are not all computed from the same data...

The code to reproduce the figure is in section Potential Issue 2 and of the QAQC-CMIP6-BCCAQv2.ipynb.

tlogan2000 commented 2 years ago

@juliettelavoie Thank you for the issue. I have tagged a few people to get their input on how we should deal with this.

My personal opinion would be to mask these values as NaNs before calculating our list of climate indicators, but I would like some feedback from colleagues as well.

JeremyFyke commented 2 years ago

Before we NaN them out, would it be possible to try to understand why these values occur (e.g. through QA of the input data or of the downscaling process)? Just to rule out the possibility that they are indicative of a bigger issue that is skewing things elsewhere (albeit at levels that aren't visually apparent). Also see the imminent post on this issue by @laura-vanvliet, who has just seen some similar things in some preliminary distribution-based evaluations of ClimDex data.

JeremyFyke commented 2 years ago

BTW thanks @juliettelavoie for this analysis :)

laura-vanvliet commented 2 years ago

I agree with Jeremy on digging a little deeper to find the root cause of the extremes. I've just started looking at distributions of ClimDex indices and noticed something similar. The extreme values are apparent in measures of skew and kurtosis of some annual indices, though not apparent in measures of mean/sd.

The skew and kurtosis of annual total precip (prcptotETCCDI) are plotted below. You can see in these maps that the "tailedness" of the distributions is very high in some of the same regions mentioned (i.e., southern Yukon) but also in other areas. I plan to calculate the same metrics on the inputs (tasmin, tasmax, pr) to help find where else we are seeing extreme values.

[Figure: ACCESS-CM2_prcptotETCCDI]

juliettelavoie commented 2 years ago

Thanks for the plots @laura-vanvliet! Let us know what you find in your digging and whether you find the root of the problem. Do you have access to the raw input data and the bias-correction code? (I don't...)

JeremyFyke commented 2 years ago

@ssobie, have you or other folks at PCIC seen these extreme values before? I'm wondering if they manifest in the input ANUSPLIN target dataset?

cpomer10 commented 2 years ago

Hi, @ssobie mentioned some of this in his original issue #11. It looks likely that most of these issues stem from the target dataset and were already noticed in the CMIP5 iteration. Still probably worth poking around to confirm.

juliettelavoie commented 2 years ago

Sorry, I missed that! Thank you for pointing us to that issue. If this was a known issue, am I right in understanding that PCIC chose on purpose not to put NaNs on extreme values, and that Ouranos should do the same?

juliettelavoie commented 2 years ago

After internal discussions at Ouranos, our plan is to put NaNs on pr>300 mm d-1 and tasmax>60 degC before doing the indicator calculations. Any objections?
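A minimal sketch of the proposed masking with NumPy, assuming pr is already in mm d-1 and tasmax has been converted from K to degC (the helper name and sample values are made up for illustration). With xarray, `da.where(da <= 300)` gives the same result, since `where` keeps values where the condition holds and fills the rest with NaN.

```python
import numpy as np

PR_MAX_MM_D = 300.0     # proposed threshold for daily precipitation (mm d-1)
TASMAX_MAX_DEGC = 60.0  # proposed threshold for daily maximum temperature (degC)


def mask_extremes(values: np.ndarray, upper: float) -> np.ndarray:
    """Return a float copy of `values` with entries above `upper` set to NaN."""
    values = np.asarray(values, dtype=float)
    # Strictly greater than, matching "pr>300" and "tasmax>60" in the proposal;
    # existing NaNs compare False and therefore stay NaN.
    return np.where(values > upper, np.nan, values)


pr = np.array([0.0, 12.5, 301.0, 700.0, 45.0])  # mm d-1, made-up values
tasmax_k = np.array([290.0, 340.0, 300.0])       # K, made-up values
tasmax_c = tasmax_k - 273.15                     # convert to degC before comparing

pr_masked = mask_extremes(pr, PR_MAX_MM_D)       # [0.0, 12.5, nan, nan, 45.0]
tasmax_masked = mask_extremes(tasmax_c, TASMAX_MAX_DEGC)
```

Because the comparison is strict, a value of exactly 300 mm d-1 is kept.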

cpomer10 commented 2 years ago

Hi @juliettelavoie! Do you know if this was done for the CMIP5 indicator calculations? @ssobie was anything like this done for the Climdex calcs?

juliettelavoie commented 2 years ago

It was not done by Ouranos for CMIP5. But we think it is better to get more accurate results now by not using problematic data, even if this is inconsistent with the previous methods. To be clear, there would not be any NaNs on the website itself: xclim's missing-value functionality handles the NaNs. This functionality didn't exist at the time of the CMIP5 calculations.

laura-vanvliet commented 2 years ago

@juliettelavoie, how would the missing functionality handle NaNs, through interpolation? For each model and SSP, I calculated the skew and kurtosis of a 10-year slice (2070-2079) of BCCAQ precip (as I did for temperature). The average absolute maximum over all models/SSPs is shown in the figures below. There are a number of areas with outlier precip events (kurtosis > 3000 on the map). The regions noted by @ssobie in issue #11 are visible, but so are many others. I'm not sure whether this is concerning, or just the nature of precip modelling and the target dataset. Either way, it indicates that there are large precip events that may be below the 300 mm threshold but are outliers nonetheless. This should probably be taken into account if we decide to NaN out outliers. I will calculate the same metrics on the target dataset to get a better idea of where this comes from.

[Figure: all_mod_abs_max_sk_kurt_pr]
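To put "kurtosis > 3000" in perspective: assuming the statistic is the moment-based (non-excess) kurtosis m4/m2^2 computed on the raw daily values, a series of n values that is constant except for a single spike reaches n - 2 + 1/(n - 1), which is also the largest value a sample of size n can produce. A 10-year daily slice has about 3650 values, so a kurtosis above 3000 suggests that one or a very few days dominate the series. The sketch below checks this numerically; the function name and the synthetic series are hypothetical. Note that `scipy.stats.kurtosis` returns the excess (Fisher) form by default, which is 3 lower.

```python
import numpy as np


def pearson_kurtosis(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Moment-based (non-excess) kurtosis m4 / m2**2 along `axis`.

    A normal distribution gives about 3; subtract 3 for the excess (Fisher) form.
    Applied to a (time, lat, lon) array with axis=0 it returns one value per cell.
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean(axis=axis, keepdims=True)
    m2 = np.mean(d**2, axis=axis)
    m4 = np.mean(d**4, axis=axis)
    return m4 / m2**2


n = 3650                     # roughly 10 years of daily values
spike = np.zeros(n)
spike[100] = 700.0           # a single made-up 700 mm d-1 day, all other days dry
k_spike = pearson_kurtosis(spike)
k_max = n - 2 + 1 / (n - 1)  # single-spike value, about 3648

print(float(k_spike), k_max)
```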

juliettelavoie commented 2 years ago

The missing function doesn't do interpolation. At a given point, if the percentage of NaNs is below a given threshold, the NaNs are ignored in the calculation; above that percentage, the output is a NaN. With the thresholds I proposed, the percentage of NaNs is always very small, so they would simply be skipped.
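The behaviour described above can be sketched for a single grid point and a single year. This is a hypothetical helper written for illustration, not xclim's implementation; the 5 % tolerance and the choice of an annual total as the indicator are assumptions.

```python
import numpy as np


def annual_total_with_tolerance(daily: np.ndarray, tolerance: float = 0.05) -> float:
    """Annual precipitation total that tolerates a small fraction of NaNs.

    If the fraction of NaN days is at most `tolerance`, the NaNs are skipped;
    above that fraction the whole result is NaN.
    """
    daily = np.asarray(daily, dtype=float)
    frac_missing = np.isnan(daily).mean()
    if frac_missing > tolerance:
        return float("nan")
    return float(np.nansum(daily))


year = np.full(365, 2.0)  # made-up year: 2 mm every day
year[10] = np.nan         # one day masked by the >300 mm d-1 rule
print(annual_total_with_tolerance(year))    # 728.0

sparse = np.full(365, np.nan)
sparse[:100] = 1.0        # 265 of 365 days missing
print(annual_total_with_tolerance(sparse))  # nan
```

Since only a handful of days exceed the proposed thresholds, the missing fraction stays far below any reasonable tolerance, which is why those NaNs would simply be skipped.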

I'm not sure about using a kurtosis threshold to decide whether to put a NaN or not... What threshold should we use? Is a high value really impossible? I don't know enough to answer your question about whether it is concerning or just the nature of precip modelling and the target dataset... @tlogan2000, do you know? Also, do you have an idea of how many NaNs we would have to put with this method?

laura-vanvliet commented 2 years ago

Hmm. I'm not trying to suggest that we go ahead with a method like that yet. But I think the outlier problem warrants some more investigation before we go ahead with a static upper threshold. Also, ideally, I think outliers should still be considered a threshold exceedance for some indices?

juliettelavoie commented 2 years ago

It is true that just putting NaNs could bring its own issues, e.g. a 700 mm d-1 value would become a NaN and not be counted in r1mm (number of wet days), but maybe it should...
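A small illustration of that trade-off on made-up values. Masking removes the 700 mm d-1 day from the wet-day count; capping the value at the threshold instead (shown only for comparison, along the lines of the earlier point that outliers could still count as threshold exceedances) keeps it counted, although it still changes totals. The 1 mm wet-day definition and the helper name are assumptions.

```python
import numpy as np

THRESHOLD_MM_D = 300.0


def count_wet_days(pr_mm_d: np.ndarray, wet_thresh: float = 1.0) -> int:
    """Count days with pr >= wet_thresh; NaN days compare False and are not counted."""
    pr = np.asarray(pr_mm_d, dtype=float)
    return int(np.sum(pr >= wet_thresh))


pr = np.array([0.2, 5.0, 700.0, 0.0, 12.0])         # mm d-1, made-up values

masked = np.where(pr > THRESHOLD_MM_D, np.nan, pr)  # proposed NaN masking
capped = np.minimum(pr, THRESHOLD_MM_D)             # alternative: cap at threshold

print(count_wet_days(pr), count_wet_days(masked), count_wet_days(capped))  # 3 2 3
```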

ssobie commented 2 years ago

Thanks for these plots @laura-vanvliet, they are a nice check. The very high precipitation values in the downscaled data generally result from similar values in the ANUSPLINv1 target data. The Yukon and Quebec regions appear to be questionable. The Nunavut region that shows up with very high skewness and kurtosis comes from an anomalously high precipitation event at the Kugluktuk station that does appear to be real: it is in both the AHCCDv1 and the AHCCD adjusted station data. It's a case where I don't think it can be removed without some verification that the original station data value is actually incorrect.

[Figure: KUGLUKTUK_NU_2300902_site_pr_timeseries]

ssobie commented 2 years ago

Quoting @cpomer10: "Hi @juliettelavoie! Do you know if this was done for the CMIP5 indicator calculations? @ssobie was anything like this done for the Climdex calcs?"

@cpomer10 @JeremyFyke We didn't exclude these values before calculating the derived variables, but did note the regions as needing more caution for any applications.

ssobie commented 2 years ago

After further checking the Kugluktuk precipitation event, I found a mention of it in "Canada's Top Ten Weather Stories for 2007":

A Northern Gully Washer

Around July 21, an intense rainstorm occurred in the community of Kugluktuk (formerly Coppermine), Nunavut. The incredible two-day rainfall totaled 178.2 mm: 55.2 mm on July 20 and 118.3 mm on July 21. An analysis of extreme rainfall intensities concluded that this was an impressive 500-year event. Engineers designing water management systems and infrastructure were in a quandary as to deciding what the new storm for planning drainage systems in the future should be.
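For reference, a T-year event has, by definition, an annual exceedance probability of 1/T, here 1/500 = 0.2 %. Assuming independent years and an unchanging climate (both simplifying assumptions), the chance of seeing at least one such event in N years is 1 - (1 - 1/T)^N. A short sketch; the function name is hypothetical:

```python
def prob_at_least_one(return_period_years: float, n_years: int) -> float:
    """Probability of at least one T-year event in n_years independent years."""
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** n_years


print(round(prob_at_least_one(500, 1), 4))   # 0.002
print(round(prob_at_least_one(500, 30), 3))  # 0.058
```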

tlogan2000 commented 2 years ago

@juliettelavoie I'm kind of leaning towards simply leaving the precip as is (i.e. no masking) for now. It will not really affect our wet-day calculations either way, and the discussion warrants a bit more digging than we have time for right now.

Opinions from the group?

For everyone's info:

I did some digging into PMP (probable maximum precipitation) values. This paper in particular gives some insight into simulated 6-h PMP values, as well as observed 24-h values for a few stations (Table 2): https://journals.ametsoc.org/view/journals/hydr/20/10/jhm-d-18-0233_1.xml#tbl2
The Canadian locations (NB: quick visual filtering by latitude only) seem to indicate that at least one location has an estimated 24-h PMP value of >430 mm (auburne drainage).

Ouranos has a paper on this focusing on spring PMP (note: Prairies and eastern Canada only); median changes show increases of up to +20% for certain regions. Being a little generous with the median climate change signal, we would see precip increases of something around 20-30%, and even with that we rapidly get to a 550 mm threshold, so maybe individual simulations with ~700 mm aren't completely out of the question?
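The back-of-the-envelope scaling above can be written out. Taking the >430 mm 24-h PMP estimate as a 430 mm baseline and applying a uniform 20-30 % increase (both taken from the comment, not a proper PMP analysis), the scaled values land at about 516 and 559 mm, and reaching 700 mm would require an increase of about 63 %. The names are hypothetical:

```python
BASELINE_PMP_MM = 430.0  # estimated 24-h PMP at one Canadian location (from the comment)


def scaled_pmp(baseline_mm: float, increase: float) -> float:
    """Scale a PMP estimate by a fractional increase (0.2 means +20 %)."""
    return baseline_mm * (1.0 + increase)


def required_increase(baseline_mm: float, target_mm: float) -> float:
    """Fractional increase needed for baseline_mm to reach target_mm."""
    return target_mm / baseline_mm - 1.0


for inc in (0.20, 0.30):
    print(f"+{inc:.0%}: {scaled_pmp(BASELINE_PMP_MM, inc):.0f} mm")  # +20%: 516 mm, +30%: 559 mm

print(f"700 mm needs +{required_increase(BASELINE_PMP_MM, 700.0):.0%}")  # +63%
```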

ECCC-CCCS / CMIP6-CanDCS-Quality-Control