RCollins13 / CNView

Visualization and annotation of CNVs from population-scale whole-genome sequencing data
MIT License

Sample names indexing #6

Closed by SteveSemick 6 years ago

SteveSemick commented 6 years ago

Hi @RCollins13 ,

I recently ran into an error as follows:

Sample ID file 'Sample01' not found, assuming single sample ID provided
Filtering & loading coverage matrix... Complete
Error in .subset(x, j) : invalid subscript type 'list'
Calls: CNView -> [ -> [.data.frame
Execution halted

I tracked the error to lines 130-134 of the current CNView.R code:

  ##Drop Columns to Specified Sample Size##
  cov <- cov[,unique(c(1:3,
                       as.vector(sapply(head(unique(c(sampleID,sample(names(cov[,-c(1:3)])))),n=subsample),
                                        function(val){grep(val,colnames(cov),ignore.case=T)}))))]

I noticed that the current method of subsetting uses grep, which was problematic for my sample names: my coverage BED file includes sample names like Sample11, Sample111, Sample112, etc., and grep matched Sample11 not only to Sample11 but also to Sample111, Sample112, etc.
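The multi-match behaviour can be reproduced in isolation. The sketch below uses a made-up vector of column names (an assumption for illustration, not the real coverage matrix) to show how grep() treats the sample ID as a pattern, and why a ragged sapply() result likely ends up as the list subscript reported in the error:

```r
# Hypothetical column names standing in for a coverage matrix header
cols <- c("chr", "start", "end", "Sample01", "Sample11", "Sample111", "Sample112")

# grep() treats the ID as a regular expression, so "Sample11" also
# matches every column name that merely contains it
grep_hits <- grep("Sample11", cols, ignore.case = TRUE)
print(grep_hits)

# Exact comparison returns only the intended column
exact_hits <- which(cols == "Sample11")
print(exact_hits)

# When IDs match different numbers of columns, sapply() cannot simplify
# the result to a vector and returns a list instead; as.vector() leaves
# a list unchanged, so it reaches `[` as an invalid subscript
idx <- sapply(c("Sample01", "Sample11"),
              function(val) grep(val, cols, ignore.case = TRUE))
print(is.list(idx))
```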

I changed the code in those lines to the following:

  cov <- cov[,unique(c(1:3,
                       as.vector(sapply(head(unique(c(sampleID,sample(names(cov[,-c(1:3)])))),n=subsample),function(val){which(val==colnames(cov))}))))]

This uses which() instead of grep() to index, which corrected the inappropriate multi-matching of sample names and resolved the problem. You may want to consider changing the way you index samples (by sample name) in lines 130-134 of CNView.R.
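For comparison, here is a minimal sketch of the same exact-name subsetting written with match(), run on a toy data frame. The data, the variable cov_sub, and the values of sampleID and subsample are assumptions for illustration only; this is not the code that was adopted in CNView.R.

```r
# Toy coverage matrix; in CNView.R the real one is loaded from file
cov <- data.frame(chr = "1", start = c(0, 100), end = c(100, 200),
                  Sample01 = c(10, 12), Sample11 = c(9, 11),
                  Sample111 = c(8, 13), Sample112 = c(7, 14))
sampleID <- "Sample11"   # assumed single sample ID
subsample <- 2           # assumed number of samples to keep

# Requested ID(s) first, then the remaining samples in random order
candidates <- unique(c(sampleID, sample(names(cov)[-(1:3)])))
keep <- head(candidates, n = subsample)

# match() returns exactly one index per name (NA if the name is absent),
# so the result is always a plain vector
idx <- match(keep, colnames(cov))
cov_sub <- cov[, unique(c(1:3, idx[!is.na(idx)]))]
print(colnames(cov_sub))
```

Note that exact matching is case-sensitive, unlike the original grep() call with ignore.case=T, so sample IDs would need to match the column names exactly.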

Best, Steve

RCollins13 commented 6 years ago

Hi @SteveSemick,

Thanks for the bug report and proposed solution. I've adopted your changes and pushed them to the master branch.

Best, Ryan