madprogramer closed 1 year ago
As a final note, I tried passing the same formula to str2lang, but it worked perfectly fine.
str2lang('crp_m0broad+ega_m0broad+pain_m0broad+haq_m0broad+fatigue_m0broad+boolean_remission_m0broad+sdai_m0broad+sdai_remission_m0broad+booleanremission_3items_m0broad+das28_remission26_m0broad+das28_r')
> crp_m0broad + ega_m0broad + pain_m0broad + haq_m0broad + fatigue_m0broad +
boolean_remission_m0broad + sdai_m0broad + sdai_remission_m0broad +
booleanremission_3items_m0broad + das28_remission26_m0broad +
das28_r
Can you provide a dataset or the link to it?
Unfortunately my dataset is confidential, but I suspect it's the abundance of NA values which messes things up.
I can try to submit a reprex using the african-names dataset; that's one of the more popular ones with missing values.
Ok, I think I have solved it.
Somehow my types were mixed up, so I had to manually convert the variables that were misrepresented as strings, using as.factor and as.double.
So the short answer is: "This might happen if your factors are being miscast as strings".
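For anyone hitting the same problem, here is a minimal sketch of the kind of conversion described above. The data frame and its column names (crp, remission) are made up for illustration and are not from the confidential dataset; only base R functions are used.

```r
# Hypothetical toy data: the column names are illustrative only.
df <- data.frame(
  crp = c("1.2", NA, "3.4"),        # numeric values stored as strings
  remission = c("yes", "no", NA),   # categorical values stored as strings
  stringsAsFactors = FALSE
)

# Inspect the column classes to spot variables stored as character.
print(sapply(df, class))

# Convert each column to the type it is supposed to have.
df$crp <- as.double(df$crp)
df$remission <- as.factor(df$remission)

# Both columns now have the intended classes; NA values are preserved.
print(sapply(df, class))
```

Checking sapply(df, class) before calling train or check_data is a cheap way to catch miscast columns early.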
That solved the issue mostly, but now it gets stuck after ranking models.
-------------------- CHECK DATA REPORT END --------------------
✔ Data preprocessed.
✔ Data split and balanced.
✔ Correct formats prepared.
✔ Models successfully trained.
✔ Predicted successfully.
✔ Ranked and models list created.
Error in test_observed_labels[i] <- preprocessed_data$bin_labels[1]: replacement has length zero
Traceback:
1. forester::train(data = databank_sub, y = outcome, bayes_iter = 0,
. random_evals = 0, advanced_preprocessing = FALSE, type = "binary_clf",
. verbose = TRUE)
I suppose this error has a different reason @HubertR21
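As a side note on the message itself: in base R, "replacement has length zero" is raised when the right-hand side of an indexed assignment is NULL or zero-length. The sketch below only reproduces the message with a made-up list that lacks a bin_labels element; it is an assumption that something similar happens inside forester, not a confirmed diagnosis.

```r
# Hypothetical stand-in for the object in the traceback: a list WITHOUT bin_labels.
preprocessed_data <- list(data = data.frame(x = 1:3))

# Accessing a missing list element yields NULL, and NULL[1] is still NULL.
print(is.null(preprocessed_data$bin_labels[1]))

test_observed_labels <- character(3)

# Assigning NULL into a vector element triggers the error; capture its message.
msg <- tryCatch(
  {
    test_observed_labels[1] <- preprocessed_data$bin_labels[1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that is the cause, checking whether preprocessed_data$bin_labels is NULL right after preprocessing would confirm it.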
While performing check_data on a dataset, forester encountered an error before it could finish generating its report.
Error and Traceback:
This error occurs before the Dimensionality Check step, when calling manage_missing.
Any idea what might be going on?