broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0
164 stars 88 forks source link

HMMcopy output format/version? #2

Closed chrisprobert closed 7 years ago

chrisprobert commented 7 years ago

Hi there -- I'm hitting a problem loading files produced by HMMcopy readCounter with ichor.

It seems like ichor is looking for a wiggle file with 3 columns (reads, gc, map), but the version of HMMcopy readCounter I'm using is only producing a wiggle file with one column (presumably reads).

The readCounter I'm using is the one distributed in HMMcopy bin/readCounter with HMMcopy v0.1.1 from http://compbio-bccrc.sites.olt.ubc.ca/files/2013/12/HMMcopy.zip. I tried using both the pre-compiled distributed binary, as well as recompiling readCounter from source, but both produce the same output.

I'm running readCounter as described in the ichor docs (though with different chrom names to match the assembly I'm using):

bin/readCounter --window 1000000 --quality 20 \
--chromosome "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX" \
/path/to/my.bam > /path/to/my.wig

And this is the error I see when running ichor with the wiggle file from readCounter:

[...]
Correcting Tumour
Loading required package: GenomeInfoDb
Removed 0 bins near centromeres.
Error in correctReadCounts(counts, chrNormalize = chrNormalize) :
  Missing one of required columns: reads, gc, map
Calls: loadReadCountsFromWig -> correctReadCounts
Execution halted

(looks like it's being raised here).

I've attached the wiggle file output from readCounter here, and also copied the first few lines below:

fixedStep chrom=chr1 start=1 step=1000000 span=1000000
26283
66885
77638
71066
94216

I'm wondering if perhaps I'm using the wrong version of HMMcopy?

gavinha commented 7 years ago

Hi Chris,

The wig file from readCounter looks fine.
It appears the error is occurring within loadReadCountsFromWig. If you are using scripts/runIchorCNA.R, then it will be at these lines:

https://github.com/broadinstitute/ichorCNA/blob/05c7f1f5849d04ffc4199a8a5cbb1f85c644c9c4/scripts/runIchorCNA.R#L168-L171

Notice that it requires a GC and mappability file that is compatible with same genome build as your data. We have included these gc and map wig files for hg19 (in NCBI chromosome naming convention, e.g. 1 instead of chr1). We have also just included these files for hg38 (in UCSC chromosome naming convention, e.g. chr1 instead of 1) as part of Issue #1 . Perhaps this is the problem?

https://github.com/broadinstitute/ichorCNA/blob/05c7f1f5849d04ffc4199a8a5cbb1f85c644c9c4/scripts/runIchorCNA.R#L152-L162

These will be automatically loaded unless you specify in the command line using --gcWig and --mapWig to point to your own files.

If you are using the snakemake workflow, you'll need to specify this directly in the config.yaml file: https://github.com/broadinstitute/ichorCNA/blob/05c7f1f5849d04ffc4199a8a5cbb1f85c644c9c4/scripts/snakemake/config/config.yaml#L10-L11

chrisprobert commented 7 years ago

Thanks Gavin, this really helped.

For any other folks hitting this error, one quick fix is just to re-name the chroms in the wiggle file from UCSC --> NCBI (i.e. drop the chr from chrom=chrN), which allowed me to use the provided reference data files without issue.