broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

Removal of chromosomal regions before calling CNVs / performing the analysis #83

Open samhitapn opened 3 years ago

samhitapn commented 3 years ago

Hello,

In the ichorCNA paper it is mentioned that "Centromeres are filtered based on chromosome gap coordinates obtained from UCSC for hg19, including one 1Mb bin up- and downstream of the gap."

My question is: do we have to use the mapWig option and provide the mapWig reference files (such as those provided in extdata) for this filtering to take place, or are these regions also removed when running without the mapWig option? How do I check whether these regions are indeed excluded from the analysis? And what about the telomeric regions? Are they accounted for as well?

More generally, what exactly is the function of the mapWig files, and what impact do they have?

Thank you

gavinha commented 3 years ago

Hi @samhitapn

For the centromere exclusion, if you provide a centromere gap file in the config.yaml, then it will perform the removal in the analysis. Otherwise, it will not.

https://github.com/broadinstitute/ichorCNA/blob/5bfc03ed854f0e93fe5b624c97c1290fa0053837/scripts/snakemake/config/config.yaml#L19

We provide a centromere gap file for both hg19 and hg38. See https://github.com/broadinstitute/ichorCNA/tree/master/inst/extdata
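For illustration, the centromere filter described in the paper (remove bins overlapping a gap, plus one 1Mb bin up- and downstream) can be sketched in Python as follows. This is a minimal sketch, not ichorCNA's R implementation: the `Interval` type, `make_bins`, `filter_centromeres`, and the gap coordinates in the example are all hypothetical, and it assumes fixed 1Mb bins in 0-based, half-open coordinates.

```python
from __future__ import annotations

from dataclasses import dataclass

BIN_SIZE = 1_000_000  # assumed fixed 1 Mb bins, as in the paper's description


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval in 0-based, half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def make_bins(chrom: str, chrom_length: int, bin_size: int = BIN_SIZE) -> list[Interval]:
    """Tile a chromosome into fixed-width bins; the last bin may be shorter."""
    return [Interval(chrom, s, min(s + bin_size, chrom_length))
            for s in range(0, chrom_length, bin_size)]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_centromeres(bins: list[Interval], gaps: list[Interval],
                       flank_bins: int = 1, bin_size: int = BIN_SIZE) -> list[Interval]:
    """Drop bins that overlap a gap padded by `flank_bins` bins on each side."""
    pad = flank_bins * bin_size
    padded = [Interval(g.chrom, max(0, g.start - pad), g.end + pad) for g in gaps]
    return [b for b in bins if not any(overlaps(b, p) for p in padded)]


if __name__ == "__main__":
    bins = make_bins("chr1", 10_000_000)
    # Made-up gap coordinates for illustration only (not the real chr1 centromere).
    gap = Interval("chr1", 4_500_000, 5_500_000)
    kept = filter_centromeres(bins, [gap])
    print([b.start // BIN_SIZE for b in kept])  # [0, 1, 2, 7, 8, 9]
```

Padding each gap by one bin width on both sides removes exactly one extra bin on each side of the bins the gap itself touches (here bins 4 and 5 overlap the gap, and bins 3 and 6 are the flanks).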

If you are not using snakemake and are running ichorCNA directly with the main R script, you can specify the corresponding arguments: https://github.com/broadinstitute/ichorCNA/blob/5bfc03ed854f0e93fe5b624c97c1290fa0053837/scripts/runIchorCNA.R#L25-L27

You will also see the --minMapScore argument there; the same setting appears in the config.yaml: https://github.com/broadinstitute/ichorCNA/blob/5bfc03ed854f0e93fe5b624c97c1290fa0053837/scripts/snakemake/config/config.yaml#L20

This specifies the minimum mappability score a bin must have to be included in the analysis; bins with lower scores are removed. You can think of this as a global filter that also acts as an additional centromere-specific filter, since centromeric regions tend to have low mappability. Telomeres are not explicitly removed.
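A minimal sketch of the --minMapScore idea, assuming each bin already carries a mean mappability score taken from the mappability wig file. The `Bin` layout, the `filter_by_mappability` name, and the 0.9 threshold in the example are illustrative assumptions, not ichorCNA's code.

```python
from __future__ import annotations

from typing import Optional, TypedDict


class Bin(TypedDict):
    """Hypothetical per-bin record."""
    chrom: str
    start: int
    map: Optional[float]  # mean mappability of the bin; None if missing


def filter_by_mappability(bins: list[Bin], min_map_score: float) -> list[Bin]:
    """Keep bins whose mappability is at least `min_map_score`; drop missing scores."""
    return [b for b in bins if b["map"] is not None and b["map"] >= min_map_score]


if __name__ == "__main__":
    bins: list[Bin] = [
        {"chrom": "chr1", "start": 0, "map": 0.98},
        {"chrom": "chr1", "start": 1_000_000, "map": 0.95},
        {"chrom": "chr1", "start": 2_000_000, "map": 0.40},  # e.g. a repetitive, centromere-adjacent bin
        {"chrom": "chr1", "start": 3_000_000, "map": None},
        {"chrom": "chr1", "start": 4_000_000, "map": 0.91},
    ]
    kept = filter_by_mappability(bins, min_map_score=0.9)  # illustrative threshold
    print([b["start"] for b in kept])  # [0, 1000000, 4000000]
```

Because the filter is applied genome-wide, any low-mappability bin is dropped, whether or not it lies near a centromere.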

The mappability wig file has a second purpose: we also use it to perform a genome-wide correction of mappability bias. See HMMcopy for more details.
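To give a rough idea of what a mappability bias correction does, here is a simplified sketch: fit the expected read count as a function of bin mappability, divide each bin's count by that expectation, and take log2. HMMcopy's `correctReadcount` does this with a lowess fit after a GC-content correction; this sketch swaps in an ordinary least-squares line and skips the GC step, so treat it as an illustration only. `fit_line` and `correct_mappability` are hypothetical names.

```python
from __future__ import annotations

import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y ~ a + b*x; needs at least two distinct x values."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def correct_mappability(counts: list[float], mappability: list[float]) -> list[float]:
    """Divide each count by the count expected at its mappability; return log2 ratios."""
    a, b = fit_line(mappability, counts)
    out = []
    for c, m in zip(counts, mappability):
        expected = a + b * m  # count predicted from mappability alone
        ratio = c / expected if expected > 0 else float("nan")
        out.append(math.log2(ratio) if ratio > 0 else float("nan"))
    return out


if __name__ == "__main__":
    mappability = [0.80, 0.85, 0.90, 0.95, 1.00, 0.90]
    counts = [80, 85, 90, 95, 100, 180]  # last bin: toy copy-number gain
    for m, r in zip(mappability, correct_mappability(counts, mappability)):
        print(f"map={m:.2f}  log2 ratio={r:+.2f}")
```

In this toy example the gained bin also pulls the fitted line upward slightly, which is why a real implementation needs the fit to be robust to outlier bins.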

Hope this helps.

Best, Gavin