Error: Can't subset columns that don't exist. May be a bug in complement() function inside get_features() function

qindan2008 commented 3 years ago

I hit the same error. I suspect the bug is in the complement() function inside get_features(), and that it is triggered when both length(miss_Pat) > 0 and length(miss_drv) > 0.

For my data, M is an 8 x 14 tibble. M's original column names are:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292 ZSWIM6

N is a 7 x 12 tibble. N's original column names are:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1

After running:

if (length(miss_Pat) > 0) {
  empty = M %>% filter(patientID %in% !!miss_Pat)
  empty[, 2:ncol(empty)] = 0
  N = bind_rows(N, empty)
}

N becomes an 8 x 14 tibble, and its column names are now the same as M's:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292 ZSWIM6

N now contains the same patientID values as M, but in a different order.

However, after running:

N = bind_cols(N, M %>% select(!!miss_drv) %>% replace(TRUE, 0))

N's column names become:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292...13 ZSWIM6...14 ZNF292...15 ZSWIM6...16

So when the following code runs:

N[, colnames(M)]

this error occurs:

Error: Can't subset columns that don't exist.
✖ Columns ZNF292 and ZSWIM6 don't exist.
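The behaviour described above can be reproduced on a toy pair of tibbles. This is only a sketch: the data, the names N_rows, N_bad, N_ok and still_missing, and the guarded alternative at the end are all made up for illustration. It is not the change that was committed to revolver, which this thread does not show.

```r
library(dplyr)

# Toy stand-ins for M and N (hypothetical data, not the reporter's)
M <- tibble(patientID = c("P1", "P2", "P3"),
            A = c(1, 0, 1), B = c(0, 1, 1), C = c(1, 1, 0))
N <- tibble(patientID = c("P1", "P2"),
            A = c(1, 0), B = c(0, 1))

miss_Pat <- setdiff(M$patientID, N$patientID)   # patients missing from N
miss_drv <- setdiff(colnames(M), colnames(N))   # drivers missing from N

# Step 1: add the missing patients. bind_rows() also brings in column C.
empty <- M %>% filter(patientID %in% miss_Pat)
empty[, 2:ncol(empty)] <- 0
N_rows <- bind_rows(N, empty)

# Step 2 as reported: binding the "missing" drivers again duplicates C,
# and name repair renames both copies, so the plain name C disappears.
extra <- M %>% select(all_of(miss_drv)) %>% mutate(across(everything(), ~ 0))
N_bad <- bind_cols(N_rows, extra)
print(colnames(N_bad))

# One possible guard (an assumption, not the maintainer's fix):
# recompute which columns are still missing after step 1.
still_missing <- setdiff(colnames(M), colnames(N_rows))
N_ok <- N_rows
if (length(still_missing) > 0) {
  N_ok[still_missing] <- 0
}
N_ok <- N_ok %>%
  mutate(across(-patientID, ~ coalesce(.x, 0))) %>%  # NAs created by bind_rows()
  select(all_of(colnames(M)))
```

Note that bind_cols() matches rows by position, so the differing patientID order mentioned above is a second reason to avoid it here; assigning constant columns, or joining on patientID, does not depend on row order.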

caravagn commented 3 years ago

I am happy to look into this - can you share an example dataset to replicate the error?

I need to run this step by step to find where the columns are lost.

qindan2008 commented 3 years ago

Here is example data that reproduces the error. I'm sorry that I can't make my research data public, so I modified the data to mask some information. I think it is still enough to track down the bug.

Thank you! example_data.txt

caravagn commented 3 years ago

I appreciate you taking the time to generate the data and giving me time to fix this; there were some errors on my side, and some in the data you shared (maybe those are unintentional).

Let me start by fixing the input.

require(tidyverse)

# Load your data
input = readr::read_tsv("~/Downloads/example_data.txt") %>% 
  mutate(cluster = paste(cluster))

require(revolver)

# Make cohort
my_cohort = revolver_cohort(
  input, 
  CCF_parser = CCF_parser,
  ONLY.DRIVER = FALSE, 
  MIN.CLUSTER.SIZE = 5, # remove small clusters (some have just 1 mutation)
  annotation = "Test dataset"
)

This crashes on patient S3 (and also S4) because some regions in these patients share the same biopsy ID, which should not happen. For instance, this row from the file lists both S3P1 and S3P2 twice:

S3P1:0.88;S3P2:0.86;S3P1:0.88;S3P2:0.88
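A quick way to spot such rows before building the cohort is to parse each CCF string and look for repeated sample names. The sketch below assumes the "sample:value;sample:value" layout shown in the row above and a column named CCF; duplicated_samples() is a hypothetical helper, not part of revolver, and the toy data frame is invented.

```r
# Hypothetical helper: return sample names that appear more than once
# in a "sample:value;sample:value;..." string.
duplicated_samples <- function(ccf) {
  entries <- strsplit(ccf, ";", fixed = TRUE)[[1]]
  samples <- sub(":.*$", "", entries)        # keep the part before ":"
  unique(samples[duplicated(samples)])
}

dups <- duplicated_samples("S3P1:0.88;S3P2:0.86;S3P1:0.88;S3P2:0.88")
print(dups)

# Toy input (made up); the column name CCF is an assumption
toy <- data.frame(
  patientID = c("S1", "S3"),
  CCF = c("S1P1:0.90;S1P2:0.85",
          "S3P1:0.88;S3P2:0.86;S3P1:0.88;S3P2:0.88"),
  stringsAsFactors = FALSE
)

# Flag patients with at least one row containing a repeated sample name
has_dup <- vapply(toy$CCF,
                  function(x) length(duplicated_samples(x)) > 0,
                  logical(1))
bad_patients <- unique(toy$patientID[has_dup])
print(bad_patients)
```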

So I continue after dropping those patients.

# Make cohort
my_cohort = revolver_cohort(
  input %>% filter(!(patientID %in% c("S3", "S4"))), 
  CCF_parser = CCF_parser,
  ONLY.DRIVER = FALSE, 
  MIN.CLUSTER.SIZE = 5, # remove small clusters (some have just 1 mutation)
  annotation = "Test dataset"
)

The cohort now builds. Next I check whether everything else makes sense, and find that something still needs fixing.

# First, a print reports that Some driver variantIDs occur only once and should therefore be removed. 
print(my_cohort)

# This shows which ones trigger that message
to_remove = Stats_drivers(my_cohort) %>% 
  filter(N_tot == 1) 
print(to_remove)

# We remove those by using the variantID
my_cohort = remove_drivers(
  my_cohort,
  to_remove %>% 
    pull(variantID)
)

So far I have only removed variantIDs of mutations that are annotated as drivers but occur only once. By definition they cannot be correlated (this is also described in the vignettes).
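To make the "occurs only once" criterion concrete, here is a small dplyr sketch on a made-up table. It does not call revolver; the is.driver column name, the toy data and the reading of "once" as "in a single patient" are assumptions for illustration.

```r
library(dplyr)

# Toy annotation table (hypothetical data)
toy <- tibble(
  patientID = c("S1", "S1", "S2", "S2", "S5"),
  variantID = c("TP53", "KRAS", "TP53", "APC", "TP53"),
  is.driver = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Drivers seen in exactly one patient cannot recur across the cohort
singletons <- toy %>%
  filter(is.driver) %>%
  distinct(patientID, variantID) %>%
  count(variantID, name = "n_patients") %>%
  filter(n_patients == 1) %>%
  pull(variantID)

print(singletons)
```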

Now I can try to make trees

# Attempt to compute trees
my_cohort = compute_clone_trees(my_cohort, sspace.cutoff = 1000, n.sampling = 500)

The above command crashed at patient S7, raising this error:

# [easypar] run 5 - Error in ClonEvol_surrogate(clusters, samples, clonal.cluster, min.CCF = 0.01): 
#   This patient has no trees, raising an error. Check you CCF estimates ...

This means that S7 has CCF values that are inconsistent with every tree we can build. If this were real data, you should check your clustering and CCF analysis.
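As an illustration of how CCF values can rule out every tree, the sketch below applies the usual sum rule (pigeonhole principle) from clonal-evolution analysis: in each sample, the CCFs of a clone's children should not add up to more than the parent's CCF, up to some tolerance. This is an assumed, simplified check with invented numbers and a hypothetical helper, not the code that revolver or ClonEvol_surrogate actually runs.

```r
# Hypothetical check: TRUE if, in any sample, the children of some clone
# have a summed CCF larger than the parent's CCF (plus a tolerance).
# ccf: numeric matrix, rows = clusters, columns = samples
# parent: named character vector, names = child cluster, values = parent
violates_pigeonhole <- function(ccf, parent, tol = 0.05) {
  for (p in unique(parent)) {
    children <- names(parent)[parent == p]
    child_sum <- colSums(ccf[children, , drop = FALSE])
    if (any(child_sum > ccf[p, ] + tol)) return(TRUE)
  }
  FALSE
}

# Invented CCFs for three clusters in two regions; c1 is clonal
ccf <- matrix(c(1.0, 1.0,
                0.7, 0.2,
                0.6, 0.3),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("c1", "c2", "c3"), c("R1", "R2")))

# The three possible trees rooted at c1
trees <- list(
  branching = c(c2 = "c1", c3 = "c1"),
  linear_23 = c(c2 = "c1", c3 = "c2"),
  linear_32 = c(c3 = "c1", c2 = "c3")
)

# Every candidate tree breaks the rule, so no tree fits these CCFs
rejected <- vapply(trees, function(t) violates_pigeonhole(ccf, t), logical(1))
print(rejected)
```

With these numbers the branching tree fails in R1 (0.7 + 0.6 exceeds 1.0), one linear tree fails in R2 and the other in R1, which is the kind of situation where a patient ends up with no valid trees.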

We remove this patient and go on.

# We remove S7, this can remove further drivers
my_cohort = remove_patients(my_cohort, "S7")

# This gives no errors now, we can rebuild the trees
print(my_cohort)

my_cohort = compute_clone_trees(my_cohort, sspace.cutoff = 1000, n.sampling = 500)

# Trees are now there (see Trees per patient    : YES )
print(my_cohort)

Now that we have trees, we can fit and cluster the data.

# We can fit the cohort
my_cohort = revolver_fit(
  my_cohort, 
  parallel = FALSE, # spell out FALSE; F can be reassigned
  n = 3, 
  initial.solution = NA)

# .. compute clusters
my_cohort = revolver_cluster(
  my_cohort, 
  split.method = 'cutreeHybrid',
  min.group.size = 3)

At this point the call to plot_clusters crashed - my error - and you clearly pointed out where the error was (well done!). I changed my implementation and fixed that with the last commit.

Now all these plots work.

plot_clusters(my_cohort, cutoff_trajectories = 1, cutoff_drivers = 0)
plot_drivers_graph(my_cohort)
plot_dendrogram(my_cohort)
plot_DET_index(my_cohort)
plot_drivers_clonality(my_cohort)
plot_patient_trees(my_cohort, "S5")

Please let me know if you still have errors, or if I can close this issue.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

TnakaNY commented 3 years ago

Hi,

I reinstalled revolver today under R 4.0.3, and the plot_clusters function now works well. Thank you for the prompt reply!!

Best, AT

caravagn commented 3 years ago

Glad to hear that @TnakaNY. Did this also fix the bug for you, @qindan2008? If so, I will close the issue.

qindan2008 commented 3 years ago

Yes, I reinstalled revolver, and both "plot_clusters" and "plot_trajectories_per_cluster" now work well. Thank you very much!

caravagn commented 3 years ago

Glad I could help, and good luck with your investigations!

caravagnalab / revolver

Error: Can't subset columns that don't exist. May be a bug in complement() function inside get_features() function #34