KWB-R / fhpredict

R Package for the Project Flusshygiene
https://kwb-r.github.io/fhpredict
MIT License

Error in build_and_validate_model() #51

Closed hsonne closed 4 years ago

hsonne commented 4 years ago

The code below currently leads to the following error:

Error: Constant variable(s) found: log_e.coli
In addition: Warning messages:
1: attempting model selection on an essentially perfect fit is nonsense 
2: In summary.lm(model) :

Code to reproduce the error:

name <- "spot-data_user-9_spot-41_2020-02-04.RData"
file <- system.file("extdata/testdata", name, package = "fhpredict")
spot_data <- kwb.utils::loadObject(file, "spot_data")
set.seed(1)
result <- fhpredict:::build_and_validate_model(spot_data)

hsonne commented 4 years ago

  log_e.coli     r_mean r_mean_abs_1 r_mean_abs_2 r_mean_abs_3 r_mean_abs_4
1   1.176091 0.08004271     0.000000    0.0000000            0    0.0000000
2   1.176091 0.00000000     0.000000    0.0000000            0    0.0000000
3   1.176091 0.32449605     0.000000    0.0000000            0    0.0000000
4   1.176091 0.00000000     0.000000    0.0000000            0    0.1823216
5   1.176091 0.00000000     1.561647    0.3364722            0    0.0000000
6   1.176091 0.32449605     0.000000    1.2479897            0    1.1205912

@wseis Is the problem that all log(e.coli) values are the same? Should we check for this in advance and return with an appropriate error message?
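
For illustration, such an up-front check could look roughly like the sketch below. The helper name `check_response_varies` and the toy data frame are hypothetical and not part of fhpredict; the sketch only assumes that the response column is called `log_e.coli`, as in the output above.

```r
# Hypothetical helper, not part of fhpredict: stop early with a clear
# message if the response column contains only one distinct value.
check_response_varies <- function(data, response = "log_e.coli") {
  values <- data[[response]]
  values <- values[!is.na(values)]
  if (length(unique(values)) < 2L) {
    stop(
      "All values of '", response, "' are identical; ",
      "cannot build a model on a constant response.",
      call. = FALSE
    )
  }
  invisible(data)
}

# Toy data mimicking the first columns of the rows shown above
model_data <- data.frame(
  log_e.coli = rep(1.176091, 6),
  r_mean = c(0.08004271, 0, 0.32449605, 0, 0, 0.32449605)
)

# Capture the error message instead of aborting the script
result <- tryCatch(
  check_response_varies(model_data),
  error = function(e) conditionMessage(e)
)
print(result)
```

Calling something like this before the model selection step would replace the "Constant variable(s) found" error with a message that names the actual cause.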

wseis commented 4 years ago

Yes, exactly. I noticed this issue today as well. I thought of adding some small random noise to each measurement before taking the logarithm, to avoid this error. Something like:

log_e.coli <- log10(conc_ec + rnorm(nrow(conc_ec), 0, 5))

hsonne commented 4 years ago

As discussed, I will apply this "random noise" to all E. coli values before any log-calculation. I will use round(rnorm(n, 0, 5)) with n being the number of E. coli values.

wseis commented 4 years ago

Correction: please reduce the standard deviation to 2. Taking 5 might actually have larger effects than expected because of the log10 scale.
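
A minimal sketch of the noise approach discussed above, for comparing the two standard deviations on the log10 scale. The helper `add_noise_log10` and the example concentration of 15 (roughly 10^1.176091) are assumptions for illustration, not the code that went into fhpredict. The `pmax()` guard is also an assumption: it keeps noisy values at 1 or above so that `log10()` stays finite.

```r
# Hypothetical helper: add rounded Gaussian noise to raw E. coli
# concentrations, then take log10.
add_noise_log10 <- function(conc, sd = 2) {
  n <- length(conc)
  noisy <- conc + round(stats::rnorm(n, mean = 0, sd = sd))
  # Assumption (not from the thread): clamp to >= 1 so log10() is finite
  log10(pmax(noisy, 1))
}

# Six identical measurements, as in the failing data set
conc_ec <- rep(15, 6)

# Same seed for both runs so that only the standard deviation differs
set.seed(1)
log_sd2 <- add_noise_log10(conc_ec, sd = 2)
set.seed(1)
log_sd5 <- add_noise_log10(conc_ec, sd = 5)

# Spread of the response on the log10 scale for each choice
print(diff(range(log_sd2)))
print(diff(range(log_sd5)))
```

At low concentrations, a fixed absolute noise level translates into a comparatively large shift after `log10()`, which is why the smaller standard deviation is the more cautious choice here.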