bcgov / wqg_data

Refining the WQG list
GNU General Public License v3.0

Guideline type needs to be added back in please #35

Closed atillmanns closed 4 years ago

atillmanns commented 4 years ago

I see the column that described the guideline type (short, long, max, etc.) has been removed. This will be necessary for the user interface as it helps people interpret the guidelines and is used in all of our online documents.

joethorley commented 4 years ago

I think the information is fully contained in the Days and Period ie

1 Day, >1 sample - short
1 Day, 1 sample - max
30 Days - long

Can you confirm that I am interpreting this correctly?

We can add it as a filter to the shiny app but I don't think it makes sense to have it in the data sheet.
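The proposed reading of Days and sample count can be written as a small lookup. This is a minimal sketch in Python, assuming the three cases listed above are the only ones; the function name `guideline_type` and the returned labels are hypothetical, and the rule itself is the interpretation being checked in this comment, not a confirmed one.

```python
def guideline_type(days, samples):
    """Map Days and Samples to a guideline type under the proposed reading."""
    if days == 30:
        return "long"
    if days == 1 and samples > 1:
        return "short"
    if days == 1 and samples == 1:
        return "max"
    # Any other combination is not covered by the proposed rule.
    return None


for days, samples in [(1, 5), (1, 1), (30, 5)]:
    print(days, samples, guideline_type(days, samples))
```

Returning `None` for anything outside the three cases makes unexpected combinations easy to spot rather than silently assigning them a type.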

atillmanns commented 4 years ago

For aquatic life guidelines we have short-term acute and long-term chronic guidelines. For drinking water, the term maximum is used. These are included in our policy documents and derivation documents. This data sheet will eventually be posted on DataBC and people will be able to use the sheet as they need. The inclusion of guideline type will be helpful for people when cross-referencing with the policy and other online documents. I understand that it is extra information in terms of machine use, but is there any reason why it cannot be included?

joethorley commented 4 years ago

If folks want to directly refer to the data sheet then I agree it makes sense to add.

joethorley commented 4 years ago

@HeatherGranger Question - where the term maximum is used, is there a minimum sample size?

HeatherGranger commented 4 years ago

All drinking water guidelines that are a maximum do not have a minimum sample size, except for:

Coliforms Fecal
Enterococci
Escherichia coli

These 3 parameters have a minimum sample size of 5 samples in 30 days.
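The exception described here could be checked along the following lines. This is only a sketch: the names `MIN_SAMPLE_PARAMETERS`, `required_samples` and `has_enough_samples` are hypothetical, and the parameter spellings are copied from the comment above and may not match the Variable column in the data sheet exactly.

```python
# Drinking water parameters named above as needing 5 samples in 30 days.
# Spellings are taken from the comment, not from the data sheet (assumption).
MIN_SAMPLE_PARAMETERS = {"Coliforms Fecal", "Enterococci", "Escherichia coli"}


def required_samples(variable):
    """Return the minimum sample count for a drinking water maximum, or None."""
    if variable in MIN_SAMPLE_PARAMETERS:
        return 5
    return None  # no minimum sample size stated


def has_enough_samples(variable, n_samples):
    """True when the sample count meets the stated minimum, if there is one."""
    minimum = required_samples(variable)
    return minimum is None or n_samples >= minimum


print(has_enough_samples("Escherichia coli", 3))
print(has_enough_samples("Escherichia coli", 5))
```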

joethorley commented 4 years ago

Thanks!

joethorley commented 4 years ago

This is what we currently have

  Variable   EMS_Code Use      Media  Days Samples Statistic Notes          
  <chr>      <chr>    <chr>    <chr> <dbl>   <dbl> <chr>     <chr>          
1 Escherich… EMS_0147 Recreat… Water     1       5 geomean   Minimum 5 samp…
2 Escherich… EMS_0147 Recreat… Water     1       1 max       NA             
3 Escherich… EMS_0147 Drinkin… Water    30      10 quantile… 90th percentil…
4 Enterococ… EMS_0148 Recreat… Water     1       5 geomean   Minimum 5 samp…
5 Enterococ… EMS_0148 Recreat… Water     1       1 max       NA             
6 Enterococ… EMS_0148 Drinkin… Water    30      10 quantile… 90th percentil…
7 Coliforms… EMS_0450 Drinkin… Water    30      10 quantile… 90th percentil…

where the notes are in full below

[1] "Minimum 5 samples [I'M ASSUMING SHORT TERM]"                    
[2] NA                                                               
[3] "90th percentile calculated from 10 samples taken within 30 days"
[4] "Minimum 5 samples [I'M ASSUMING SHORT TERM]"                    
[5] NA                                                               
[6] "90th percentile calculated from 10 samples taken within 30 days"
[7] "90th percentile calculated from 10 samples taken within 30 days"

@HeatherGranger Can you confirm the Days and Samples are correct?

HeatherGranger commented 4 years ago

row 1: days should be 30. 5 samples is correct.
row 2: correct
row 3: samples should be 5. 30 days is correct.
row 4: days should be 30. 5 samples is correct.
row 5: correct
row 6: samples should be 5. 30 days is correct.
row 7: samples should be 5. 30 days is correct.

joethorley commented 4 years ago

Thanks @HeatherGranger

Can you confirm that the note

"90th percentile calculated from 10 samples taken within 30 days"

is incorrect for rows 3, 6 and 7, and can you confirm whether we should replace the quantile90 statistic with the geomean?

HeatherGranger commented 4 years ago

The guideline for rows 3, 6 and 7 is "90th percentile calculated from 5 samples within 30 days".
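For reference, the two statistics discussed here (90th percentile and geometric mean of at least 5 samples) can be computed as below. This is a sketch with made-up sample values. The thread does not say which percentile method applies, so linear interpolation between sorted values is assumed; the helper names `percentile` and `summarise` are hypothetical.

```python
import math
from statistics import geometric_mean


def percentile(values, p):
    """Percentile by linear interpolation between sorted values (assumed method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarise(values, min_samples=5):
    """Return (quantile90, geomean), or None when there are too few samples."""
    if len(values) < min_samples:
        return None
    return percentile(values, 90), geometric_mean(values)


samples = [2.0, 4.0, 4.0, 8.0, 16.0]  # illustrative values only
print(summarise(samples))
```

A different percentile definition would give a different value from the same five samples, so the method should be confirmed against the guideline documents before use.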

joethorley commented 4 years ago

Perfect - thanks!

joethorley commented 4 years ago

@atillmanns - can you think of a more informative name than GuidelineType - we are already using Type for Working, Approved etc.

joethorley commented 4 years ago

Need to fix #38 and #39 first

atillmanns commented 4 years ago

Re:

> @atillmanns - can you think of a more informative name than GuidelineType - we are already using Type for Working, Approved etc.

How about changing Guideline Type to Guideline Status (Approved, Interim, Working) and using Guideline Type for short, long, max, etc.?

joethorley commented 4 years ago

Yes this works!

joethorley commented 4 years ago

@atillmanns - what should we call those samples in GuidelineType where it's the max of 1 sample in 1 day? Are these also max or are they a fourth category (see #39)?

atillmanns commented 4 years ago

For aquatic life, these would be short term. For drinking water and recreation, I think these are maximum, but @HeatherGranger - can you please verify?

HeatherGranger commented 4 years ago

@joethorley The drinking water and recreation that are 1 sample in 1 day are a maximum.
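The answers so far for guidelines based on 1 sample in 1 day can be summarised as a rule keyed on Use. This is a sketch only: the function name `single_sample_type` is hypothetical, matching on the start of the Use string is an assumption, and uses that have not been discussed in the thread return `None` rather than a guess.

```python
def single_sample_type(use):
    """Guideline type for a 1 sample in 1 day guideline, based on its Use."""
    if use.startswith("Aquatic Life"):
        return "short term"
    if use.startswith("Drinking Water") or use.startswith("Recreation"):
        return "maximum"
    return None  # not covered by the answers in this thread


for use in ["Aquatic Life - Freshwater", "Drinking Water", "Recreation - Swimming", "Wildlife"]:
    print(use, "->", single_sample_type(use))
```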

joethorley commented 4 years ago

@atillmanns and @HeatherGranger

These are what we currently have

# A tibble: 6 x 4
   Days Samples Direction   Statistic 
  <dbl>   <dbl> <chr>       <chr>     
1     1       1 Lower Limit min       
2     1       1 Upper Limit max       
3    30       5 Lower Limit mean      
4    30       5 Upper Limit geomean   
5    30       5 Upper Limit mean      
6    30       5 Upper Limit quantile90

and these are possible uses

 [1] "Agriculture - Irrigation"   "Agriculture - Livestock"   
 [3] "Aquatic Life - Estuarine"   "Aquatic Life - Freshwater" 
 [5] "Aquatic Life - Marine"      "Dietary"                   
 [7] "Drinking Water"             "Drinking Water - Aesthetic"
 [9] "Recreation - Aesthetic"     "Recreation - Swimming"     
[11] "Tissue (Dietary)"           "Wildlife"

Can you define the exact rule for mapping them to guideline type?

HeatherGranger commented 4 years ago

@joethorley I'm not clear on what you need. For Guideline Type we're saying that's short term, long term or maximum? You need to know the rule for mapping the uses to guideline type?

joethorley commented 4 years ago

@HeatherGranger - I get that if Days = 30 then it's always long term, but I don't understand which of the limits based on Days = 1 and Samples = 1 are short term versus maximum (I understand it varies by Use), and I don't know what to do about those which are actually minimums.

HeatherGranger commented 4 years ago

See @atillmanns's and my commits. Hopefully we did it right!

I added 'Narrative' to Type for Drinking Water Turbidity. Forgot to include this in commit message.

joethorley commented 4 years ago

@HeatherGranger - that worked thanks!

joethorley commented 4 years ago

Can you check that you agree with these combinations

# A tibble: 12 x 5
   Type                              Days Samples Direction   Statistic 
   <chr>                            <dbl>   <dbl> <chr>       <chr>     
 1 Aesthetic Objective                  1       1 Upper Limit max       
 2 Aesthetic Objective                 30       5 Upper Limit mean      
 3 Long-term chronic                    1       1 Upper Limit max       
 4 Long-term chronic                   30       5 Lower Limit mean      
 5 Long-term chronic                   30       5 Upper Limit mean      
 6 Maximum Acceptable Concentration     1       1 Upper Limit max       
 7 Maximum Acceptable Concentration    30       5 Upper Limit quantile90
 8 Primary Contact                      1       1 Lower Limit min       
 9 Primary Contact                      1       1 Upper Limit max       
10 Primary Contact                     30       5 Upper Limit geomean   
11 Short-term acute                     1       1 Lower Limit min       
12 Short-term acute                     1       1 Upper Limit max    

In particular

seem questionable - shall I pull up the specific entries?
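A summary like the one above can be produced by counting the distinct combinations of the five columns. The sketch below uses a few made-up rows rather than the real data sheet; `KEYS` mirrors the column names in the tibble, and `count_combinations` is a hypothetical helper.

```python
from collections import Counter

KEYS = ("Type", "Days", "Samples", "Direction", "Statistic")

# Made-up rows for illustration only; the real sheet has many more columns.
rows = [
    {"Variable": "A", "Type": "Short-term acute", "Days": 1, "Samples": 1,
     "Direction": "Upper Limit", "Statistic": "max"},
    {"Variable": "B", "Type": "Short-term acute", "Days": 1, "Samples": 1,
     "Direction": "Upper Limit", "Statistic": "max"},
    {"Variable": "C", "Type": "Long-term chronic", "Days": 30, "Samples": 5,
     "Direction": "Upper Limit", "Statistic": "mean"},
]


def count_combinations(records):
    """Count how many records share each combination of the KEYS columns."""
    return Counter(tuple(record[key] for key in KEYS) for record in records)


combos = count_combinations(rows)
for combo, n in sorted(combos.items()):
    print(n, combo)
```

Keeping the count next to each combination shows whether a questionable pairing is a one-off entry or a pattern across many variables.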

HeatherGranger commented 4 years ago

The Variable would be helpful as everything is specific to the particular variable guideline. Pretty sure those are correct though. They don't always make the most sense as we've put them into spreadsheet form!

joethorley commented 4 years ago

These are the geomean ones.

It seems misleading to call them Maximum when they are based on the 90th quantile.

# A tibble: 3 x 20
  UniqueID Variable EMS_Code Use   Media Type   Days Samples Statistic Notes Condition PredictedEffect…
     <dbl> <chr>    <chr>    <chr> <chr> <chr> <dbl>   <dbl> <chr>     <chr> <chr>     <chr>           
1      263 Colifor… EMS_0450 Drin… Water Maxi…    30       5 quantile… NA    NA        No Effect       
2      384 Enteroc… EMS_0148 Drin… Water Maxi…    30       5 quantile… NA    NA        No Effect       
3      387 Escheri… EMS_0147 Drin… Water Maxi…    30       5 quantile… NA    NA        No Effect       
# … with 8 more variables: Direction <chr>, Limit <chr>, Units <chr>, Status <chr>, Reference <chr>,
#   `Reference Link` <chr>, `Overview Report Link` <chr>, `Technical Document Link` <chr>

joethorley commented 4 years ago

This is a subset of the 221 variables based on samples from 1 day but referred to as "long-term chronic"

# A tibble: 221 x 20
   UniqueID Variable EMS_Code Use   Media Type   Days Samples Statistic Notes Condition
      <dbl> <chr>    <chr>    <chr> <chr> <chr> <dbl>   <dbl> <chr>     <chr> <chr>    
 1        5 1,2-Dic… EMS_B022 Aqua… Sedi… Long…     1       1 max       NA    NA       
 2       13 1,2,4-T… EMS_T066 Aqua… Sedi… Long…     1       1 max       NA    NA       
 3       14 1,2,4-T… EMS_T066 Aqua… Sedi… Long…     1       1 max       NA    NA       
 4       18 1,4-Dic… EMS_B025 Aqua… Sedi… Long…     1       1 max       NA    NA       
 5       19 1,4-Dic… EMS_B025 Aqua… Sedi… Long…     1       1 max       NA    NA       
 6       20 2-Methy… EMS_PA28 Aqua… Sedi… Long…     1       1 max       NA    NA       
 7       21 2-Methy… EMS_PA28 Aqua… Sedi… Long…     1       1 max       NA    NA       
 8       22 2-Methy… EMS_PA28 Aqua… Sedi… Long…     1       1 max       NA    NA       
 9       23 2-Methy… EMS_PA28 Aqua… Sedi… Long…     1       1 max       NA    NA       
10       27 Acenaph… EMS_PA01 Aqua… Sedi… Long…     1       1 max       sedi… NA       
# … with 211 more rows, and 9 more variables: PredictedEffectLevel <chr>, Direction <chr>,
#   Limit <chr>, Units <chr>, Status <chr>, Reference <chr>, `Reference Link` <chr>, `Overview Report
#   Link` <chr>, `Technical Document Link` <chr>

HeatherGranger commented 4 years ago

For the drinking water guidelines - agreed, it isn't the best naming, but that's how the guidelines are organized for drinking water. I think of it like this: the guideline is determined by calculating the 90th percentile, and that value is the guideline, the "maximum acceptable concentration" protective of human health.

atillmanns commented 4 years ago

> A tibble: 12 x 5
> Type Days Samples Direction Statistic
> 1 Aesthetic Objective 1 1 Upper Limit max
> 2 Aesthetic Objective 30 5 Upper Limit mean
> 3 Long-term chronic 1 1 Upper Limit max

@joethorley - all the entries with long-term chronic and 1 sample should be for sediment. There is no requirement to calculate a monthly mean for sediment as the concentrations are not temporally variable (they are spatially variable but that is another issue).

joethorley commented 4 years ago

I can confirm this is the case.
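A check of this kind can be expressed as a filter that returns any long-term chronic guideline based on 1 sample in 1 day whose Media is not sediment. This is a sketch with made-up rows: the function name `non_sediment_single_sample_chronic` is hypothetical, and the exact Media value `"Sediment"` is an assumption, since the column is truncated in the output above.

```python
def non_sediment_single_sample_chronic(records):
    """Return long-term chronic, 1 sample in 1 day records that are not sediment."""
    return [
        record for record in records
        if record["Type"] == "Long-term chronic"
        and record["Days"] == 1
        and record["Samples"] == 1
        and record["Media"] != "Sediment"  # assumed spelling of the Media value
    ]


# Made-up rows for illustration only.
rows = [
    {"UniqueID": 1, "Media": "Sediment", "Type": "Long-term chronic", "Days": 1, "Samples": 1},
    {"UniqueID": 2, "Media": "Water", "Type": "Long-term chronic", "Days": 30, "Samples": 5},
    {"UniqueID": 3, "Media": "Water", "Type": "Long-term chronic", "Days": 1, "Samples": 1},
]

print([record["UniqueID"] for record in non_sediment_single_sample_chronic(rows)])
```

An empty result on the real sheet would correspond to the confirmation given here.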