Closed thaocad closed 3 years ago
It seems that one of the validation checks returns `NA`, but I cannot tell much without the actual score and label data. Is it possible for you to attach `predsToPlot` and `labelsToPlot` to your comment? I think the maximum file size you can attach is around 25 MB. GitHub accepts .zip and .gz file types, so the size shouldn't be a problem.
I had a very similar issue. It is not because of the `NA`s. It is because your labels (in one fold) are all of one level (i.e., all `TRUE` or all `FALSE`). It happened to me because one of the folds had all instances labeled as `FALSE`, which is why this line appears in the error message:
```
11. assertthat::assert_that(is.atomic(pb[["sensitivity"]]), is.vector(pb[["sensitivity"]]),
      is.numeric(pb[["sensitivity"]]), pb[["sensitivity"]][1] == 0,
      pb[["sensitivity"]][n] == 1)
```
It happens mostly when you have a small sample size and do n-fold cross-validation.
I understand that `specificity` or `1 - sensitivity` could be a denominator in the metric calculations, but I think the message is misleading.
Maybe we could add a small delta to the denominator and emit a warning message for this situation, or separate that assert statement and give a different message for each check.
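As a workaround on the caller's side, you can check that every fold contains both label levels before calling `evalmod()`. A minimal sketch in base R; `labels` and `fold_ids` are hypothetical names for your label vector and fold assignments:

```r
# Example data: fold 1 contains only TRUE labels, which would trigger the error.
labels   <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
fold_ids <- c(1, 1, 1, 2, 2, 2)

# Keep the fold ids whose labels collapse to a single level.
bad_folds <- Filter(
  function(f) length(unique(labels[fold_ids == f])) < 2,
  unique(fold_ids)
)

if (length(bad_folds) > 0) {
  warning("Folds with a single class: ", paste(bad_folds, collapse = ", "))
}
```

Here `bad_folds` contains fold 1, so the warning fires; with both classes in every fold it stays empty and `evalmod()` can be called safely.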
OK. I have to admit it is a terrible error message. The main procedure returns `NA` values for specificity and sensitivity when N = 0 and P = 0, respectively. `NA` values are treated as 'undefined' or missing values in R.
Specificity and sensitivity must be 'undefined' in these cases, since we cannot evaluate the classifier when one of the classes is absent. In other words, when N = 0 or P = 0 they must be `NA` rather than an arbitrary real value, including 0 and 1.
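To illustrate why the value is undefined, recall that sensitivity = TP / (TP + FN); with no positive instances both counts are zero, and the ratio has no meaningful value. A small sketch in plain R (the `TP`/`FN` counts here are illustrative, not precrec internals):

```r
# No positive instances in the dataset (P = 0):
TP <- 0   # true positives
FN <- 0   # false negatives
TP / (TP + FN)   # 0/0 evaluates to NaN in R, which is.na() also treats as missing
```

precrec represents this undefined case as `NA`, since assigning any real value (including 0 or 1) would misrepresent the classifier's performance.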
I have modified the validation part to output improved error messages when a single-class dataset is provided. The next version of precrec will be released on CRAN at the beginning of January next year, after the CRAN submission team gets back from their winter holiday.
```r
> library(precrec)
> # N = 0
> samp_n0 <- create_sim_samples(1, 10, 0, "random")
> evalmod(mmdata(samp_n0$scores, samp_n0$labels))
Error: Curves cannot be calculated. Only a single class (positive) found in dataset (modname: m1, dsid: 1).
> # P = 0
> samp_p0 <- create_sim_samples(1, 0, 10, "random")
> evalmod(mmdata(samp_p0$scores, samp_p0$labels))
Error: Curves cannot be calculated. Only a single class (negative) found in dataset (modname: m1, dsid: 1).
```
Makes sense. Thank you!
Precrec v0.12 is on CRAN now.
Hi, I was running into this error message when calling the `evalmod()` function as follows:
```
Error: assert_that: missing values present in assertion
12. stop("assert_that: missing values present in assertion", call. = FALSE) at assert-that.r#88
11. check_result(res) at assert-that.r#72
10. see_if(..., env = env, msg = msg) at assert-that.r#50
 9. assertthat::assert_that(is.atomic(pb[["specificity"]]), is.vector(pb[["specificity"]]),
      is.numeric(pb[["specificity"]]), pb[["specificity"]][1] == 1,
      pb[["specificity"]][n] == 0) at pl4_calc_measures.R#106
 8. .validate.pevals(s3obj) at etc_utils_validate_obj.R#4
 7. .validate(s3obj) at pl4_calc_measures.R#47
 6. calc_measures(cdat) at pl2_pipeline_main_rocprc.R#17
 5. FUN(X[[i]], ...)
 4. lapply(seq_along(mdat), plfunc) at pl2_pipeline_main_rocprc.R#20
 3. .pl_main_rocprc(mdat, model_type, dataset_type, class_name_pf, calc_avg = calc_avg,
      cb_alpha = cb_alpha, raw_curves = raw_curves, x_bins = x_bins) at pl1_pipeline_main.R#21
 2. pl_main(mdat, mode = new_mode, calc_avg = calc_avg, cb_alpha = cb_alpha,
      raw_curves = raw_curves, x_bins = x_bins, na_worst = new_na_worst,
      ties_method = new_ties_method, validate = FALSE) at main_evalmod.R#332
 1. evalmod(scores = predsToPlot, labels = labelsToPlot)
```
Can you please help? Did I miss anything? Thank you very much. Thao