AlineTalhouk / splendid

Supervised Learning Ensemble for Diagnostic Identification
https://alinetalhouk.github.io/splendid/

Error when specifying a single algorithm #7

Closed AlineTalhouk closed 7 years ago

AlineTalhouk commented 7 years ago

Here is a reproducible example:

library(tidyverse)
library(splendid)
data(hgsc)
data <- hgsc

# Modelling Examples ------------------------------------------------------

# Obtain True classes from hgsc sample names
class <- stringr::str_split_fixed(rownames(data), "_", n = 2)[, 2]
sl_one <- splendid(data, class, n = 1, algorithms = "lda")  # errors when a single algorithm is specified

AlineTalhouk commented 7 years ago

I think this is not a bug but more of an implementation issue. Perhaps, similar to consensus_cluster, we need a function that first collects the results; then we process them and wrap them in splendid.
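A rough sketch of that two-step split (the names and call below are illustrative only; the collector is what became trainBoot further down):

# Hypothetical workflow, analogous to consensus_cluster():
# step 1: collect raw per-algorithm, per-bootstrap results
raw <- trainBoot(data, class, n = 1, algorithms = "lda")

# step 2: process/summarise those results inside the splendid() wrapper,
# e.g. by picking the best-performing algorithm (see the Ensemble idea below)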

AlineTalhouk commented 7 years ago

Let me attempt to fix this

dchiu911 commented 7 years ago

I'm on it now

AlineTalhouk commented 7 years ago

I think this is the first function; let's call it trainBoot:


trainBoot <- function(data, class, n, seed = 1, algorithms = NULL) {

  # Generate bootstrap resamples; test samples are those not chosen in training
  set.seed(seed)
  class <- as.factor(class)  # ensure class is a factor
  train.idx <- training_id(data = data, class = class, n = n)
  test.idx <- purrr::map(train.idx, ~ which(!seq_len(nrow(data)) %in% .x))

  # Classification algorithms to use and their model function calls
  algs <- algorithms %||% ALG.NAME %>%
    stats::setNames(., .)  # if NULL, use all algorithms

  # Avoid R CMD check notes for non-standard evaluation column names
  name <- measure <- value <- NULL

  # Apply training sets to models and predict on the test sets
  models <- purrr::map(algs,
                       ~ purrr::map(train.idx, function(id)
                         classification(data[id, ], class[id], .x)))
  preds <- purrr::map(models,
                      ~ purrr::pmap(list(.x, test.idx, train.idx),
                                    prediction, data = data, class = class))

  # Evaluate each algorithm's predictions on the held-out test samples
  evals <- purrr::map_at(preds, "pam", purrr::map, 1) %>%
    purrr::map(~ purrr::map2(test.idx, .x, ~ evaluation(class[.x], .y)) %>%
                 purrr::map_df(purrr::flatten)) %>%
    tibble::enframe() %>%
    tidyr::unnest()

  list(model = models, pred = preds, eval = evals)
}
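For illustration, a call on the reproducible example above might look like this (a sketch, assuming trainBoot is available in the session):

library(splendid)
data(hgsc)

# True classes from the hgsc sample names, as in the example above
class <- stringr::str_split_fixed(rownames(hgsc), "_", n = 2)[, 2]

# One bootstrap replicate, one algorithm
res <- trainBoot(hgsc, class, n = 1, algorithms = "lda")
str(res, max.level = 1)  # a list with elements: model, pred, eval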
AlineTalhouk commented 7 years ago

We need an independent function, Ensemble, which will take in the output of several algorithms on several bootstrap samples and then pick the best-performing algorithm. This could be the output of trainBoot if it was run in serial mode, or it could be the gathered output from a cluster.
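A minimal sketch of what that could look like, assuming trainBoot's eval element is a tibble with a name column identifying the algorithm plus one column per evaluation measure (the default metric name here is a placeholder):

ensemble <- function(boot_out, metric = "accuracy") {
  ev <- boot_out$eval
  # Average the chosen metric per algorithm across bootstrap replicates
  scores <- tapply(ev[[metric]], ev$name, mean, na.rm = TRUE)
  best <- names(which.max(scores))
  # Return the winning algorithm's name, its mean score, and its fitted models
  list(algorithm = best, score = unname(scores[best]),
       models = boot_out$model[[best]])
}

Because this only consumes the eval and model lists, the same function could work on results gathered from a cluster, as described above.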

dchiu911 commented 7 years ago

The bug should be fixed, please verify. We should open a new issue to discuss new functions.

AlineTalhouk commented 7 years ago

Yes, it is fixed in that it no longer produces an error, but the output doesn't make sense: why are there min, lower, mean, etc.? With one bootstrap sample and one algorithm, the output should be a vector.

dchiu911 commented 7 years ago

The other propagating logical errors are not fixed; I only fixed the specific runtime error you encountered. Making the rest of the output make sense will need more time and careful thought.