Open aylapear opened 2 years ago
@joethorley
thanks @aylapear - I'll look into
@joethorley do you remember if this is still ongoing? if so, perhaps something to examine what the dependencies are and what's worth updating in it's current format.
The cause of this is error when all the data points are censored, specifically when the Censored = TRUE for every row.
# real data output
EMS_ID Station Variable Code Value Units DetectionLimit ResultLetter Date
1 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T 0.5 mg/L 0.5 NA 2020-01-05
2 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T 0.5 mg/L 0.5 NA 2020-01-27
3 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T 0.5 mg/L 0.5 NA 2020-02-02
4 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T 0.5 mg/L 0.5 NA 2020-02-17
5 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T 0.5 mg/L 0.5 NA 2020-03-01
Outlier Site_Renamed UPPER_DEPTH Detected Timeframe Censored
1 FALSE ELK RIVER ABOVE HIGHWAY 93 <NA> FALSE 2020 TRUE
2 FALSE ELK RIVER ABOVE HIGHWAY 93 <NA> FALSE 2020 TRUE
3 FALSE ELK RIVER ABOVE HIGHWAY 93 <NA> FALSE 2020 TRUE
4 FALSE ELK RIVER ABOVE HIGHWAY 93 <NA> FALSE 2020 TRUE
5 FALSE ELK RIVER ABOVE HIGHWAY 93 <NA> FALSE 2020 TRUE
Example to show which code is throwing error
# this throws error
df <- tibble(
Value = c(0.844, 0.949, 0.754),
Censored = c(TRUE, TRUE, TRUE)
)
ml <- with(
df,
cenmle(
Value,
Censored,
dist = "lognormal",
conf.int = 0.95
)
)
ml
est <- mean(ml)
# As long as one censored value is false it will run
df <- tibble(
Value = c(0.844, 0.949, 0.754),
Censored = c(FALSE, TRUE, TRUE)
)
ml <- with(
df,
cenmle(
Value,
Censored,
dist = "lognormal",
conf.int = 0.95
)
)
ml
est <- mean(ml)
This error is caused by the internal function summarise_wqdata_by()
in the summaries-wqdata.R file
It also fails on a single value
> df <- tibble(
+ Value = c(0.011),
+ Censored = c(TRUE)
+ )
> ml <- with(
+ df,
+ cenmle(
+ Value,
+ Censored,
+ dist = "lognormal",
+ conf.int = 0.95
+ )
+ )
> ml
Error in exp(x@survreg$coef[1]) :
non-numeric argument to mathematical function
# fails
df <- tibble(
Value = c(0.011),
Censored = c(FALSE)
)
# passes when two values even if one is censored
df <- tibble(
Value = c(0.011, 0.0001),
Censored = c(FALSE, TRUE)
)
In these edge cases when either all values are censored or only a single value is given the function should return NA' s instead of an error.
This would then generate this table when the site that has no data display NA's instead of throwing an error.
# A tibble: 2 × 14
Variable EMS_ID n ncen min max mean median lowerQ upperQ sd se lowerCL upperCL
<chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Nitrogen Total 0200016 3 0 0.754 0.949 0.849 0.845 0.793 0.901 0.0799 0.0461 0.763 0.944
1 Nitrogen Total 0478416 NA NA NA NA NA NA NA NA NA NA NA
I agree - and when no data it should return a table with the same columns and classes and no rows.
Two examples using the same EMS_ID/Station but different parameters/variables and in one case the function
summarize_wqdata()
works and provides a summary table while in the other case the table throws an errorExample where it works properly
``` data_works <- tibble::tibble( EMS_ID = c("0200016", "0200016", "0200016"), Station = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93"), Variable = c("Nitrogen Total", "Nitrogen Total", "Nitrogen Total"), Code = c("0114", "0114", "0114"), Value = c(0.844, 0.949, 0.754), Units = c("mg/L", "mg/L", "mg/L"), DetectionLimit = c(0.03, 0.03, 0.03), ResultLetter = c(NA, NA, NA), Date = c("2021-11-07", "2021-11-21", "2021-12-05"), Outlier = c(FALSE, FALSE, FALSE), Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93"), UPPER_DEPTH = as.factor(c(NA, NA, NA)), Detected = as.factor(c("TRUE", "TRUE", "TRUE")), Timeframe = as.factor(c("2021", "2021", "2021")) ) wqbc::summarise_wqdata( data_works, by = c("EMS_ID"), censored = TRUE, na.rm = TRUE ) ``` Output ``` # A tibble: 1 × 14 Variable EMS_ID n ncen min max mean median lowerQ upperQ sd se lowerCL upperCL
Example where it fails
``` data_fails <- tibble::tibble( EMS_ID = c("0200016", "0200016", "0200016", "0200016", "0200016"), Station = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"), Variable = c("Aluminum Total", "Aluminum Total","Aluminum Total","Aluminum Total","Aluminum Total"), Code = c("AL-T", "AL-T", "AL-T", "AL-T", "AL-T"), Value = c(0.031, 0.0192, 0.397, 0.0183, 0.1), Units = c("mg/L", "mg/L", "mg/L", "mg/L", "mg/L"), DetectionLimit = c(0.5, 0.5, 0.5, 0.5, 0.5), ResultLetter = c(NA, NA, NA, NA, NA), Date = c("2020-01-05","2020-01-27", "2020-02-02","2020-02-17","2020-03-01"), Outlier = c(FALSE, FALSE, FALSE, FALSE, FALSE), Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"), UPPER_DEPTH = as.factor(c(NA, NA, NA, NA, NA)), Detected = as.factor(c("FALSE", "FALSE", "FALSE", "FALSE", "FALSE")), Timeframe = as.factor(c("2020", "2020", "2020", "2020", "2020")) ) wqbc::summarise_wqdata( data_fails, by = c("EMS_ID"), censored = TRUE, na.rm = TRUE ) ``` Output ``` Error in names(ret) <- c("mean", "se", LCL(x), UCL(x)) : 'names' attribute [4] must be the same length as the vector [0] In addition: Warning message: In survreg.fit(X, Y, weights, offset, init = init, controlvals = control, : Ran out of iterations and did not converge Backtrace: ▆ 1. └─wqbc::summarise_wqdata(...) 2. └─plyr::ddply(...) 3. └─plyr::ldply(...) 4. └─plyr::llply(...) 5. ├─plyr:::loop_apply(n, do.ply) 6. └─plyr `