`summarise_wqdata()` throws an error with some datasets

aylapear commented 2 years ago

Here are two examples that use the same EMS_ID/Station but different parameters (variables). In the first, `summarise_wqdata()` works and returns a summary table; in the second, it throws an error.

Example where it works properly

```
data_works <- tibble::tibble(
  EMS_ID = c("0200016", "0200016", "0200016"),
  Station = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"),
  Variable = c("Nitrogen Total", "Nitrogen Total", "Nitrogen Total"),
  Code = c("0114", "0114", "0114"),
  Value = c(0.844, 0.949, 0.754),
  Units = c("mg/L", "mg/L", "mg/L"),
  DetectionLimit = c(0.03, 0.03, 0.03),
  ResultLetter = c(NA, NA, NA),
  Date = c("2021-11-07", "2021-11-21", "2021-12-05"),
  Outlier = c(FALSE, FALSE, FALSE),
  Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"),
  UPPER_DEPTH = as.factor(c(NA, NA, NA)),
  Detected = as.factor(c("TRUE", "TRUE", "TRUE")),
  Timeframe = as.factor(c("2021", "2021", "2021"))
)

wqbc::summarise_wqdata(
  data_works,
  by = c("EMS_ID"),
  censored = TRUE,
  na.rm = TRUE
)
```

Output

```
# A tibble: 1 × 14
  Variable       EMS_ID      n  ncen   min   max  mean median lowerQ upperQ     sd     se lowerCL upperCL
1 Nitrogen Total 0200016     3     0 0.754 0.949 0.849  0.845  0.793  0.901 0.0799 0.0461   0.763   0.944
```

Example where it fails

```
data_fails <- tibble::tibble(
  EMS_ID = c("0200016", "0200016", "0200016", "0200016", "0200016"),
  Station = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93",
              "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"),
  Variable = c("Aluminum Total", "Aluminum Total", "Aluminum Total", "Aluminum Total", "Aluminum Total"),
  Code = c("AL-T", "AL-T", "AL-T", "AL-T", "AL-T"),
  Value = c(0.031, 0.0192, 0.397, 0.0183, 0.1),
  Units = c("mg/L", "mg/L", "mg/L", "mg/L", "mg/L"),
  DetectionLimit = c(0.5, 0.5, 0.5, 0.5, 0.5),
  ResultLetter = c(NA, NA, NA, NA, NA),
  Date = c("2020-01-05", "2020-01-27", "2020-02-02", "2020-02-17", "2020-03-01"),
  Outlier = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93",
                   "ELK RIVER ABOVE HIGHWAY 93", "ELK RIVER ABOVE HIGHWAY 93"),
  UPPER_DEPTH = as.factor(c(NA, NA, NA, NA, NA)),
  Detected = as.factor(c("FALSE", "FALSE", "FALSE", "FALSE", "FALSE")),
  Timeframe = as.factor(c("2020", "2020", "2020", "2020", "2020"))
)

wqbc::summarise_wqdata(
  data_fails,
  by = c("EMS_ID"),
  censored = TRUE,
  na.rm = TRUE
)
```

Output

```
Error in names(ret) <- c("mean", "se", LCL(x), UCL(x)) :
  'names' attribute [4] must be the same length as the vector [0]
In addition: Warning message:
In survreg.fit(X, Y, weights, offset, init = init, controlvals = control,  :
  Ran out of iterations and did not converge
Backtrace:
▆
1. └─wqbc::summarise_wqdata(...)
2. └─plyr::ddply(...)
3. └─plyr::ldply(...)
4. └─plyr::llply(...)
5. ├─plyr:::loop_apply(n, do.ply)
6. └─plyr ``(1L)
7. └─wqbc .fun(piece, ...)
8. ├─base::mean(ml)
9. └─NADA::mean(ml)
10. └─NADA .local(x, ...)
```
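One difference between the two datasets stands out: in `data_fails` every `Value` is below its `DetectionLimit` of 0.5 and `Detected` is `FALSE` for all five rows, whereas every value in `data_works` is detected (its output reports `ncen` = 0). A plausible but unconfirmed explanation is that with `censored = TRUE` and no uncensored observations, the `survreg` fit behind `NADA::mean()` cannot converge (matching the warning above), so an empty vector reaches the `names(ret) <- ...` step. The base-R sketch below only checks that difference; `count_below_dl()` is a hypothetical helper, not part of wqbc, and it assumes that "below the detection limit" is what makes a value censored.

```r
# Hypothetical helper (not part of wqbc): count values below their detection limit.
# Assumption: such values are the ones treated as censored.
count_below_dl <- function(value, dl) sum(value < dl)

# Value and DetectionLimit columns copied from the two examples above
works_value <- c(0.844, 0.949, 0.754)
works_dl <- c(0.03, 0.03, 0.03)
fails_value <- c(0.031, 0.0192, 0.397, 0.0183, 0.1)
fails_dl <- rep(0.5, 5)

count_below_dl(works_value, works_dl)  # 0 of 3 values below the limit
count_below_dl(fails_value, fails_dl)  # 5 of 5 values below the limit
```

If this is the cause, a guard that skips the censored fit (or returns `NA` statistics) when a group has no detected values would avoid the error, though that is a suggestion rather than the package's current behaviour.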

aylapear commented 2 years ago

@joethorley

joethorley commented 2 years ago

Thanks @aylapear, I'll look into it.

HeatherGranger commented 2 years ago

@joethorley, do you remember whether this is still ongoing? If so, it may be worth examining what the dependencies are and what is worth updating in its current form.