dariober / cnv_facets

Somatic copy variant caller (CNV) for next generation sequencing
Other
67 stars 15 forks source link

facets fails when first row of snp pileup is not "chr1" #12

Closed stevekm closed 5 years ago

stevekm commented 5 years ago

When trying to use facets with a snp pileup file that looks like this:

Chromosome,Position,Ref,Alt,File1R,File1A,File1E,File1D,File2R,File2A,File2E,File2D
chr7,55242300,.,.,44,0,0,0,42,0,0,0
chr7,55242400,.,.,215,0,0,0,208,0,0,0
chr7,55242500,.,.,263,0,0,0,259,0,0,0
chr7,55242582,G,A,93,0,0,0,113,0,0,0

The package fails at this step:

library(facets)
rcmat = readSnpMatrix("Sample1.snp_pileup.gz") #read in the matrix
preproc_sample = preProcSample(rcmat)

Error message:

Loading required package: pctGCdata
Error in 1:nchr : result would be too long a vector
Calls: preProcSample -> counts2logROR
In addition: Warning message:
In max(mat$chrom) : no non-missing arguments to max; returning -Inf

Because the object rcmat looks like this after loading:

> rcmat
   Chromosome  Position NOR.DP NOR.RD TUM.DP TUM.RD
1        chr7  55242300     44     44     42     42
2        chr7  55242400    215    215    208    208
3        chr7  55242500    263    263    259    259
4        chr7  55242582     93     93    113    113
5        chr7  55242600     74     74     91     91

This is wrong, because the function readSnpMatrix is supposed to strip out the "chr" strings from the "Chromosome" column entries. This causes the error later in this line of code at counts2logROR, called by preProcSample: https://github.com/mskcc/facets/blob/d5638d76574fa3a4c0b38cbbd55c1404bd4db663/R/facets-procreads.R#L52

counts2logROR <- function(mat, gbuild, unmatched=FALSE, ugcpct=NULL, f=0.2) {
...
...
    nchr <- max(mat$chrom) # IMPACT doesn't have X so only 22
for (i in 1:nchr) {
...
...

This error occurs because cnv_facets is using an outdated version of facets. The version of facets being used in cnv_facets is from facets release 0.5.14 commit 434b5ce.

The latest GitHub commits of facets include the corrected code: https://github.com/mskcc/facets/commit/9591c451670756d13211c1c8ed7cd1b9563e9944#diff-e36f895624961ce4065f273457a7f140

However, the facets team has not yet made a new release with this fixed code. Without this bug fix, cnv_facets will fail any time the snp pileup has "chr" values in the first column and the first row does not start with "chr1".

dariober commented 5 years ago

Hi- I'm confused... cnv_facets doesn't use the function readSnpMatrix at all. I've replaced in readSnpMatrix2 - if I remember correctly one reason for doing it is exactly the problem with chr1. Besides, it seems you are using the facets package directly in R rather than the wrapper cnv_facets?

Using the test data in this repository, the following works:

# Get sample of data, only header and chr7
zcat test/data/stomach_chr.csv.gz | grep -P 'Chr|chr7,' > sample.csv

# Run cnv_facets:
cnv_facets.R -p sample.csv -o tmp

[2019-07-18 21:54:44] Loading file sample.csv...
[2019-07-18 21:54:44] Plotting histogram of coverage...
[2019-07-18 21:54:46] Preprocessing sample...
[2019-07-18 21:54:49] Processing sample...
[2019-07-18 21:54:49] Fitting model...
[2019-07-18 21:54:49] Writing output
[2019-07-18 21:54:49] Plotting genome...
[2019-07-18 21:54:49] Plotting spider...
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
...

If you can share a sample of data causing the problem I can see if I can reproduce the issue.

stevekm commented 5 years ago

Thanks, looks like this might have been a case of reading two manuals at once, will try readSnpMatrix2 instead.