bcgov / wqbc

An R package for water quality thresholds and index calculation for British Columbia
http://bcgov.github.io/wqbc/
Apache License 2.0
20 stars 9 forks source link

confirm bootstrap confidence interval #29

Closed joethorley closed 9 years ago

joethorley commented 9 years ago

I've implemented bootstrap confidence intervals as described in Method 1 of El-Shaarawi (2010) which is now available in the data-raw folder. I have used the code to calculate the CIs for the come example data set which El-Shaarawi also uses. Although the upper 95% CI is the same the lower 95% CI is 80 whereas I get 87. I have implemented the method in various ways and always get the same result. I believe the difference occurs because El-Shaarawi implements the method in wide format which necessitates the inclusion of NAs for certain variables which then introduces the possibility losing variables (if only NAs are selected). I consider this possibility to be undesirable because 1) it can result in replicates for which WQI cannot be calculated i.e. NAs for all variables and 2) it breaks the assumed independence between the variables because the presence of a second variable with more frequent samples will change the sampling distribution of the original variable simply by virtue of inclusion of NAs. The long-format method also has the advantage of being quicker to implement since variables with no values outside the limits don't need to be resampled (this is not the case with the wide-format method). And it is also generalizable to multiple values for one variable on the same date.

In the case of no missing values then the implemented methods should give the same result as Method 1 of El-Shaarawi. Can you

ateucher commented 9 years ago

On it. Thanks Joe. By the way, I notice that calc_wqi rounds the CIs - makes it more difficult to assess equality

ateucher commented 9 years ago

Alrighty, I've pushed a script to the wqbc-test package. It's not beautiful, but it extracts the data from the CESI data that has all complete cases and runs the calc_wqi on them, calculating bootstrap CIs. There are four datasets in the CESI data that are complete, and on 3 out of four the results look good (taking into account rounding).

In one of the datasets, all paramaters passed all tests, so the WQI is 100... not great for testing (but both do come out the same).

In the one where the CIs don't match, it looks like it is still dropping some rows - I didn't investigate further, but I'll let you take it from here.

As for your reasoning for using the long format I agree. Especially point 2.

ateucher commented 9 years ago

@joethorley does this give you what you need?

joethorley commented 9 years ago

I've decided to wait until we have a completely unambiguous definition of how to calculate limits before spending any time on anything else in case there any major changes to our understanding

ateucher commented 9 years ago

Fair enough

joethorley commented 9 years ago

I'm keeping this on ice until we resolve the following issue: https://github.com/poissonconsulting/wqdata/issues/7

ateucher commented 9 years ago

@joethorley how would you like to proceed on this? Let me know what I can do.

joethorley commented 9 years ago

I'm thinking of providing an option for the current method or Method 1 of El-Shaarawi (2010). When Method 1 is used it should match with cesi...

joethorley commented 9 years ago

@ateucher - the R code you provided at the start of the project - is this how the bootstrap CIs were calculated for the cesi examples?

ateucher commented 9 years ago

Yes.

joethorley commented 9 years ago

Here is the current situation:

I've reimplemented the CIs for Method 1 (calc_wqi(...,ci = "column")) and Method 2 (calc_wqi(...,ci = "column")) of the El-Shaarawi report (I've assumed that if a variable has all missing values then it is dropped - this is not explicitly stated in the report but seems to be what everyone does).

To cover all bases I've also temporarily pulled the CI calculating code for Method 1 and Method 2 that you provided exactly as it is - it can be switched on with calc_wqi(...,cesi_code = TRUE).

When El-Shaarawi applied Method 1 to the ccme dataset the results were:

Lower  Upper
79.6     94.2

I then ran all variants on the ccme example dataset - the results are as follows:

>   calc_wqis(ccme, ci = "column", cesi_code = FALSE)
     WQI Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1  87.1  94.2     Good        10   103 20 3.9 2.8
>   calc_wqis(ccme, ci = "row", cesi_code = FALSE)
     WQI Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1  87.2  94.2     Good        10   103 20 3.9 2.8
>   calc_wqis(ccme, ci = "column", cesi_code = TRUE)
     WQI  Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1 88.093   100     Good        10   103 20 3.9 2.8
>   calc_wqis(ccme, ci = "row", cesi_code = TRUE)
     WQI  Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1 88.078   100     Good        10   103 20 3.9 2.8

The results indicate that whether ci = "row" or ci = "column" makes little to no difference which is perhaps not surprising as the failures are not clustered by date.

However much more concerning is that all three methods give different results... I have implemented the wqbc bootstrap three different ways now and always arrive at the same result...

Finally I then ran all four implementations on the cesi test data set and none of the methods matched.

At this point I am unable to spend any more time trying to get to the bottom of what is happening and will have to switch off the calculation of CIs unless someone can provide an absolutely unambiguous definition of how to calculate CIs for ccme in conjunction with the correct lower and upper 95% credible intervals...

ateucher commented 9 years ago

Yeah, thanks @joethorley . We'll look at the cesi stuff a little closer, but I think you're right to stop now.

ateucher commented 9 years ago

I could have sworn I tried this before, but I ran the ccme dataset through the CCME calculator and got the following:

capture

Which is identical to calc_wqis(ccme, cesi_code = FALSE), but different from that reported in El-Shaarawi 2010) and different from the calc_wqis(ccme, cesi_code = TRUE)! It varies a bit depending on what replacement value you choose for values below the detection limit (zero, 0.5 * DL or DL).

ateucher commented 9 years ago

So I think there's something different about how you implemented the cesi code (Understandable, as I don't really understand the documentation - it mentions using a matrix of data:objective ratio, but I don't quite get what that really means). Regardless, I don't think we need to figure that out as what you've implemented is clearly right.

I guess the only remaining question is how does it compare to the results when bootstrapping on tidy data rather than wide? Doing it that way makes more intuitive sense to me. That said, I think we've landed at a good enough place for now, so please put this lower on the priority list, and we can come back to it if there's time. How does that sound to you?

ateucher commented 9 years ago

So, if it makes sense to you, please:

@StephHazlitt can you confirm this is what we decided?

stephhazlitt commented 9 years ago

Yep, awesome. If there is time, I would be curious to know if wqbc bootstrap on tidy data (i.e. Joe's code) gives the same result as the consistent ones above (88.1; 87.2-94.2)?

ateucher commented 9 years ago

Ok, since I couldn't let sleeping dogs lie, I figured out what's going on with the cesi code. wqbc:::get_excursions subtracts 1 from the ratio of value/limit, while the cesi ci code wants pure ratios. I put in a condition to add 1 to the excursion if cesi_code == TRUE, and it works :)

> calc_wqi(ccme, ci = "row", cesi_code = TRUE, messages = FALSE)
     WQI    Lower    Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.31495 94.19495     Good        10   103 20 3.9 2.8
> calc_wqi(ccme, ci = "column", cesi_code = TRUE, messages = FALSE)
     WQI    Lower    Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.26296 94.19849     Good        10   103 20 3.9 2.8
> calc_wqi(ccme, ci = "column", cesi_code = FALSE, messages = FALSE)
     WQI Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1    87  94.2     Good        10   103 20 3.9 2.8
> calc_wqi(ccme, ci = "row", cesi_code = FALSE, messages = FALSE)
     WQI Lower Upper Category Variables Tests F1  F2  F3
WQI 88.1  87.3  94.2     Good        10   103 20 3.9 2.8

PR is here:https://github.com/poissonconsulting/wqbc/pull/42. You don't necessarily have to merge it since we're going to kill the ability to use the cesi code, but may want to anyway so that at least it's in the git history.

joethorley commented 9 years ago

That's great @ateucher I've

Also for completeness and the record I ran all four CI variants on the ccme with 10^5 replicates and no rounding (normally the package uses just 10^3 replicates for efficiency and rounds to 1 dp). The results are as follows:

>   calc_wqi(ccme, ci = "column", cesi_code = FALSE)
     WQI Lower  Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.04 94.198     Good        10   103 20 3.9 2.8
>   calc_wqi(ccme, ci = "row", cesi_code = FALSE)
     WQI  Lower  Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.269 94.197     Good        10   103 20 3.9 2.8
>   calc_wqi(ccme, ci = "column", cesi_code = TRUE)
     WQI  Lower  Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.201 94.198     Good        10   103 20 3.9 2.8
>   calc_wqi(ccme, ci = "row", cesi_code = TRUE)
     WQI  Lower  Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.263 94.197     Good        10   103 20 3.9 2.8
### tidy form
     WQI  Lower  Upper Category Variables Tests F1  F2  F3
WQI 88.1 87.217 94.198     Good        10   103 20 3.9 2.8

Finally I think an inevitable conclusion of this process is that the confidence intervals reported by El-Sharaawi are incorrect.

I think that's everything covered I'll leave it to you to close if you are happy.

stephhazlitt commented 9 years ago

Great work wqbc team!

ateucher commented 9 years ago

That's awesome @joethorley thanks! So just to be clear, does the bootstrapping on tidy data return the same result as the row-wise bootstrapping with wide data?

joethorley commented 9 years ago

Yes - I think discrepancies will only arise for data with relatively few visits most of which have missing values - in this situation the standard way will drop variables while the tidy data way will not. This could of course be confirmed via simulations.

ateucher commented 9 years ago

That's great. No need to do further testing now, I don't think. Case closed!