alexzwanenburg / familiar

Repository for the familiar R-package. Familiar implements an end-to-end pipeline for interpretable machine learning of tabular data.
European Union Public License 1.2

Error thrown while running the familiar package vignettes - introduction.Rmd and prospective_use.Rmd #79

Closed Kuroshiwo closed 5 months ago

Kuroshiwo commented 5 months ago

The error below was thrown while running vignettes introduction.Rmd and prospective_use.Rmd that came with the package:

Error in g$grobs[[grob_index]] <- `*vtmp*` : attempt to select less than one element in OneIndex

Here is the output, including the sessionInfo()

Restarting R session...

library(familiar)
Registered S3 method overwritten by 'data.table':
  method from
  print.data.table
library(data.table)
data.table 1.15.4 using 6 threads (see ?getDTthreads). Latest news: r-datatable.com
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(familiar)
library(data.table)

# Example experiment using the iris dataset.
# You may want to specify a different path for experiment_dir.
# This is where results are written to.
familiar::summon_familiar(data=iris,
  experiment_dir=file.path(tempdir(), "familiar_1"),
  outcome_type="multinomial",
  outcome_column="Species",
  experimental_design="fs+mb",
  cluster_method="none",
  fs_method="mrmr",
  learner="glm",
  parallel=FALSE)
Setup report: Validation is internal only.
Setup report: Feature selection and model building on the training data.
Creating iterations: Starting creation of iterations.
Creating iterations: Finished creation of iterations.
Creating iterations: New project id is: '20240424104716'.

Pre-processing: Starting preprocessing for run 1 of 1.
Pre-processing: 150 samples were initially available.
Pre-processing: 0 samples were removed because of missing outcome data. 150 samples remain.
Pre-processing: 4 features were initially available.
Pre-processing: 0 features were removed because of a high fraction of missing values. 4 features remain.
Pre-processing: 0 samples were removed because of missing feature data. 150 samples remain.
Pre-processing: 0 features were removed due to invariance. 4 features remain.
Pre-processing: Adding value distribution statistics to features.
Pre-processing: Performing transformations to normalise feature value distributions.
|======================================================================| 100%
Pre-processing: Feature distributions have been transformed for normalisation.
Pre-processing: Extracting normalisation parameters from feature data.
|======================================================================| 100%
Pre-processing: Feature data were normalised.
|======================================================================| 100%
Pre-processing: Adding imputation information to features.
|======================================================================| 100%
|======================================================================| 100%

Feature selection: starting feature selection using "mrmr" method.
Hyperparameter optimisation: Starting parameter optimisation for the mrmr variable importance method.

Hyperparameter optimisation: Completed parameter optimisation for the mrmr variable importance method.
|======================================================================| 100%
Feature selection: feature selection using "mrmr" method has been completed.

Model building: starting model building using "glm" learner, based on "mrmr" feature selection.
Hyperparameter optimisation: Starting parameter optimisation for the glm learner, based on variable importances from the mrmr variable importance method.

Starting hyperparameter optimisation for data subsample 1 of 1.
  Hyperparameter optimisation is conducted using the auc_roc metric by maximising out-of-bag performance.
  Candidate hyperparameter sets after the initial run are selected after inferring utility using a localised approximate Gaussian Process.
  Utility is measured as expected improvement.
  Computing variable importance for 20 bootstraps.

|======================================================================| 100%
Hyperparameter optimisation: All hyperparameters are fixed. No optimisation is required.

Hyperparameter optimisation: Completed parameter optimisation for the glm learner, based on variable importances from the mrmr variable importance method.
|======================================================================| 100%
Model building: model building using "glm" learner, based on "mrmr" feature selection, has been completed.

Evaluation: Creating ensemble models from individual models.
|======================================================================| 100%

Evaluation: Processing data to create familiarData objects.

Evaluation: Processing dataset 1 of 1.
Computing pairwise similarity between features.
Computing the point estimate of the value(s) of interest for the ensemble model as a whole.
Computing pairwise similarity between samples.
Computing the point estimate of the value(s) of interest for the ensemble model as a whole.
|======================================================================| 100%
Extracting variable importance obtained during feature selection.
Extracting variable importance obtained from the models.
Computing the point estimate of the value(s) of interest for the ensemble model from the single underlying model.
Computing permutation variable importance for models in the dataset.
Computing the bias-corrected estimate with confidence interval of the value(s) of interest for the ensemble model from the single underlying model. 400 bootstrap samples are obtained in total.
|======================================================================| 100%
Compute feature expression.
Extracting univariate analysis information.
Extracting hyperparameters from the models in the ensemble.
Computing the point estimate of the value(s) of interest for the ensemble model from the single underlying model.
Computing ensemble predictions for the dataset.
Computing the point estimate of the value(s) of interest for the ensemble model as a whole.
Computing model performance metrics on the dataset.
Computing the bias-corrected estimate with confidence interval of the value(s) of interest for the ensemble model from the single underlying model. 400 bootstrap samples are obtained in total.
|======================================================================| 100%
Computing data for decision curve analysis.
Computing the bias-corrected estimate with confidence interval of the value(s) of interest for the ensemble model from the single underlying model. 400 bootstrap samples are obtained in total.
Computing decision curves for the "setosa" class.
|======================================================================| 100%
Computing decision curves for the "versicolor" class.
|======================================================================| 100%
Computing decision curves for the "virginica" class.
|======================================================================| 100%
Assessing model calibration.
Computing the bias-corrected estimate with confidence interval of the value(s) of interest for the ensemble model from the single underlying model. 400 bootstrap samples are obtained in total.
|======================================================================| 100%
Computing receiver-operating characteristic curves.
Computing the bias-corrected estimate with confidence interval of the value(s) of interest for the ensemble model from the single underlying model. 400 bootstrap samples are obtained in total.
Computing ROC and Precision-Recall curves for the "setosa" class.
|======================================================================| 100%
Computing ROC and Precision-Recall curves for the "versicolor" class.
|======================================================================| 100%
Computing ROC and Precision-Recall curves for the "virginica" class.
|======================================================================| 100%
Computing confusion matrix.
Computing the point estimate of the value(s) of interest for the ensemble model as a whole.
Computing individual conditional expectation and partial dependence data for features in the dataset.
extract_dispatcher,familiarEnsemble,familiarDataElement: too few models to compute confidence intervals.
Computing the point estimate of the value(s) of interest for the ensemble model as a whole.
Computing ICE / PD curves for "Petal_Width".
Evaluation: familiarData object 20240424104716_glm_mrmr_1_1_ensemble_1_1_development_data was created.

Evaluation: Creating collection pooled_data

Evaluation: Exporting data from collection pooled_data
Error in g$grobs[[grob_index]] <- `*vtmp*` : attempt to select less than one element in OneIndex

Show Traceback

12. plotting.to_grob(p_outcome)
11. .plot_sample_clustering_plot(x = x_sub, data = feature_expression_split, feature_similarity = feature_similarity_split, sample_similarity = sample_similarity_split, outcome_type = object@outcome_type, x_axis_by = x_axis_by, y_axis_by = y_axis_by, facet_by = facet_by, facet_wrap_cols = facet_wrap_cols, ...
10. (new("standardGeneric", .Data = function (object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), sample_cluster_method = waiver(), sample_linkage_method = waiver(), sample_limit = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, ...
9. (new("standardGeneric", .Data = function (object, feature_cluster_method = waiver(), feature_linkage_method = waiver(), sample_cluster_method = waiver(), sample_linkage_method = waiver(), sample_limit = waiver(), draw = FALSE, dir_path = NULL, split_by = NULL, x_axis_by = NULL, ...
8. do.call(plot_sample_clustering, args = c(list(object = object, dir_path = dir_path), list(...)))
7. .local(object, ...)
6. plot_all(object = fam_collection, dir_path = file_paths$results_dir)
5. plot_all(object = fam_collection, dir_path = file_paths$results_dir)
4. FUN(X[[i]], ...)
3. lapply(collection_list, .process_collections, file_paths = file_paths, message_indent = message_indent, verbose = verbose)
2. run_evaluation(cl = cl, proj_list = project_info, settings = settings, file_paths = file_paths, verbose = verbose)
1. familiar::summon_familiar(data = iris, experiment_dir = file.path(tempdir(), "familiar_1"), outcome_type = "multinomial", outcome_column = "Species", experimental_design = "fs+mb", cluster_method = "none", fs_method = "mrmr", learner = "glm", parallel = FALSE)

sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 [2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/Denver
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.15.4 familiar_1.4.6 librarian_1.8.1

loaded via a namespace (and not attached):
[1] gtable_0.3.5 shape_1.4.6.1 xfun_0.43
[4] ggplot2_3.5.1 processx_3.8.4 lattice_0.22-6
[7] callr_3.7.6 quadprog_1.5-8 vctrs_0.6.5
[10] tools_4.3.3 ps_1.7.6 generics_0.1.3
[13] parallel_4.3.3 tibble_3.2.1 proxy_0.4-27
[16] fansi_1.0.6 pkgconfig_2.0.3 Matrix_1.6-5
[19] nnls_1.5 lifecycle_1.0.4 farver_2.1.1
[22] compiler_4.3.3 stringr_1.5.1 textshaping_0.3.7
[25] munsell_0.5.1 codetools_0.2-19 praznik_11.0.0
[28] glmnet_4.1-8 FMStable_0.1-4 Formula_1.2-5
[31] pillar_1.9.0 iterators_1.0.14 rpart_4.1.23
[34] foreach_1.5.2 tidyselect_1.2.1 mvtnorm_1.2-4
[37] inum_1.0-5 stringi_1.8.3 dplyr_1.1.4
[40] reshape2_1.4.4 labeling_0.4.3 splines_4.3.3
[43] grid_4.3.3 colorspace_2.1-0 cli_3.6.2
[46] magrittr_2.0.3 mboost_2.9-9 survival_3.6-4
[49] utf8_1.2.4 withr_3.0.0 libcoin_1.0-10
[52] scales_1.3.0 harmonicmeanp_3.0.1 nnet_7.3-19
[55] qvalue_2.34.0 ragg_1.3.0 isotree_0.6.1-1
[58] knitr_1.46 rlang_1.1.3 Rcpp_1.0.12
[61] partykit_1.2-20 glue_1.7.0 rstudioapi_0.16.0
[64] rstream_1.3.7 jsonlite_1.8.8 R6_2.5.1
[67] plyr_1.8.9 stabs_0.6-4 systemfonts_1.0.6

alexzwanenburg commented 5 months ago

Thanks for reporting this issue. I will look into it. The failing expression g$grobs[[grob_index]] <- `*vtmp*` does not appear directly in familiar's code, but the error may be caused by how familiar interacts with the gtable package.
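
For readers hitting the same message: this class of error can be reproduced in base R, without familiar or gtable. The sketch below is only an illustration under stated assumptions. It assumes, hypothetically, that grob_index comes from a name lookup that found no match; the thread does not state the actual cause. The list `g` is a stand-in that mimics the `grobs` and `layout$name` fields of a gtable object, and `safe_replace` is a hypothetical helper, not a function from either package.

```r
# Stand-in for a gtable-like object (assumption: not a real gtable).
g <- list(
  grobs = list("panel grob", "axis grob"),
  layout = list(name = c("panel", "axis-b"))
)

# Assumed lookup: position of a grob by layout name.
# which() returns integer(0) when nothing matches.
grob_index <- which(g$layout$name == "guide-box")

# Assigning through [[<- with a zero-length index raises an error
# of the same kind as the one in the report.
msg <- tryCatch(
  {
    g$grobs[[grob_index]] <- "replacement"
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Defensive variant (hypothetical helper): only assign when exactly
# one matching element was found, otherwise return the object as-is.
safe_replace <- function(g, name, value) {
  idx <- which(g$layout$name == name)
  if (length(idx) == 1L) g$grobs[[idx]] <- value
  g
}

g_unchanged <- safe_replace(g, "guide-box", "replacement")
g_changed <- safe_replace(g, "panel", "replacement")
```

Guarding on the length of the index is one way to make such a lookup robust when an expected element, for example a legend, is absent from the plot layout.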

alexzwanenburg commented 5 months ago

I found the reason for the error and was able to resolve it.

Kuroshiwo commented 5 months ago

Thank you very much, Alex. I am tremendously grateful for the time you spent resolving the issue.

Best regards,

Richard

On Tue, May 14, 2024 at 6:45 AM Alex Zwanenburg wrote:

Closed #79 (https://github.com/alexzwanenburg/familiar/issues/79) as completed via #81 (https://github.com/alexzwanenburg/familiar/pull/81).
