Segmentation results missing expected gene

aleighbrown commented 5 years ago

Just wondering if you can shed some light on the segmentation results.

I know this sample has an amplification of EGFR (chr7:55086714-55324313). However, when I examine the .segs.txt file, I see the following segments:

chr7 | 15726066 | 50175681 | 164
chr7 | 51096036 | 53103673 | 4
chr7 | 56059591 | 56174140 | 6

The whole gene seems to have been excluded: no segment covers chr7:55086714-55324313, which falls in the gap between 53103673 and 56059591.
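One quick way to confirm that no segment overlaps EGFR is to intersect the gene interval with the segment coordinates. The following is a minimal Python sketch that uses only the three rows shown above. The helper names are hypothetical and are not part of TitanCNA. Coordinates are treated as inclusive on both ends, which is an assumption.

```python
# Segments from the .segs.txt excerpt above: (chromosome, start, end).
# The fourth column is dropped; only coordinates matter for this check.
segments = [
    ("chr7", 15726066, 50175681),
    ("chr7", 51096036, 53103673),
    ("chr7", 56059591, 56174140),
]

# EGFR coordinates as given in the question.
egfr = ("chr7", 55086714, 55324313)


def overlaps(a, b):
    """Return True if two (chrom, start, end) intervals share at least one base
    (assumes inclusive coordinates)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def covering_segments(region, segs):
    """List the segments that overlap the query region."""
    return [s for s in segs if overlaps(region, s)]


def flanking_gap(region, segs):
    """Return (end of the last segment before the region,
    start of the first segment after it) on the same chromosome."""
    same_chrom = [s for s in segs if s[0] == region[0]]
    before = max((s[2] for s in same_chrom if s[2] < region[1]), default=None)
    after = min((s[1] for s in same_chrom if s[1] > region[2]), default=None)
    return before, after


print(covering_segments(egfr, segments))  # []
print(flanking_gap(egfr, segments))       # (53103673, 56059591)
```

An empty overlap list together with a flanking gap that brackets the gene is the signature of a region where no SNPs survived to be segmented.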

Where do the start and stop come from for these segs?

Why are certain parts of the genome excluded?

gavinha commented 5 years ago

Hi @aleighbrown

Thanks for your question. The start and stop positions in the segment files correspond to the positions of the SNPs used in the analysis. There are several stages at which data may be excluded or removed:

1) Germline heterozygous SNPs are absent in this region. Double-check that there are some SNPs in this region by looking at the intermediate results in results/titan/hetPosns/.
2) SNPs may be filtered based on:
   a) Extremely high read depth. SNPs with read depth greater than 1000 are excluded. This could be the problem in your case, because an amplification such as EGFR raises read depth. You can change this setting for titanCNA.R with the argument --maxDepth.
3) The read coverage bin was excluded in the ichorCNA pre-processing analysis. Double-check that the bins corresponding to this region are present in results/ichorCNA/.
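Point 2 explains how an amplification can vanish from the segments. Amplified regions have elevated read depth, so their SNPs are the first to exceed the cutoff. The following minimal Python sketch applies the rule as described, dropping SNPs whose depth exceeds the threshold, to made-up SNP records. It is not TitanCNA's code. The `Snp` class and the helper functions are hypothetical. In practice the threshold is set through --maxDepth.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    chrom: str
    pos: int
    depth: int  # read depth at this SNP (hypothetical values below)


def filter_by_depth(snps, max_depth=1000):
    """Keep SNPs with depth <= max_depth, i.e. drop those above the cutoff.
    Illustrates the rule described above; not TitanCNA's implementation."""
    return [s for s in snps if s.depth <= max_depth]


def count_in_region(snps, chrom, start, end):
    """Count SNPs inside chrom:start-end (inclusive)."""
    return sum(1 for s in snps if s.chrom == chrom and start <= s.pos <= end)


# Made-up SNPs: two inside EGFR at very high depth (as in an amplification),
# one downstream of the gene at ordinary depth.
snps = [
    Snp("chr7", 55100000, 2400),
    Snp("chr7", 55200000, 1800),
    Snp("chr7", 56100000, 300),
]

egfr = ("chr7", 55086714, 55324313)
for max_depth in (1000, 5000):
    kept = filter_by_depth(snps, max_depth)
    print(max_depth, count_in_region(kept, *egfr))
# 1000 0
# 5000 2
```

With the default cutoff, every SNP in the amplified gene is removed, so no segment can span it. Raising the cutoff restores those SNPs. The same `count_in_region` check can also be pointed at the het positions from point 1 to see whether any SNPs existed there to begin with.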

Let me know if any of this is helpful.

Best, Gavin

ysbioinfo commented 5 years ago

Hi Gavin, I also want to ask about this. Do you think it's necessary to include more SNPs (not only those from HapMap) to avoid losing some genomic regions? Would this make the TitanCNA results better? Thanks, Yang

aleighbrown commented 5 years ago

Increasing the maxDepth argument worked in this case. I'll close the issue after snoopy gets a reply :)

gavinha commented 5 years ago

@snoopy-448

"I also want to ask about this, do you think it's necessary to include more SNPs (not only from HAPMAP) to avoid loss of some genomic regions? Will this make the result of TitanCNA better?"

TL;DR: You can try including more SNPs if you think it is necessary for what you want to achieve.

This is a good question. TITAN originally didn't filter any SNPs; it took all the SNPs from the matched normal for the analysis. However, this can sometimes lead to the inclusion of SNPs that are actually germline homozygous. I introduced a filter to keep only those SNPs that are also present in a database such as HapMap. This didn't remove as many SNPs as I had expected, so I kept it. The HapMap SNPs provide a clean set of SNPs for TitanCNA analysis, and the trade-off (losing SNPs that are not in the database) has been worth it because the TitanCNA results are also cleaner.
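The database filter described above amounts to a set intersection: keep a SNP from the matched normal only if the same site appears in the reference list. The following minimal Python sketch shows this with made-up data. `hapmap_like` is a hypothetical stand-in for a HapMap site list. Matching on chromosome and position alone is an assumption, because a real filter might also compare alleles.

```python
def filter_to_known_snps(normal_snps, database_sites):
    """Keep (chrom, pos) SNPs from the matched normal that are present in the
    database site list. Matches on chromosome and position only (assumption)."""
    known = set(database_sites)
    return [s for s in normal_snps if s in known]


# Made-up SNP calls from a matched normal.
normal_snps = [
    ("chr7", 55100000),
    ("chr7", 55200000),
    ("chr7", 55250000),  # not in the database, so it is dropped
    ("chr7", 56100000),
]

# Hypothetical stand-in for a HapMap site list.
hapmap_like = {("chr7", 55100000), ("chr7", 55200000), ("chr7", 56100000)}

kept = filter_to_known_snps(normal_snps, hapmap_like)
retained_fraction = len(kept) / len(normal_snps)
print(kept)
print(retained_fraction)  # 0.75
```

Computing the retained fraction on your own data is a quick way to judge the trade-off. If it stays high genome-wide but drops sharply in a region you care about, that region is where adding non-HapMap SNPs would matter.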

If you are concerned about missing regions, I would recommend going back to the ichorCNA intermediate results. In fact, for another pipeline, I have been combining ichorCNA and TitanCNA results for this exact reason, as well as for chrX in males (since TitanCNA excludes chrX). The script at https://github.com/gavinha/TitanCNA_10X_snakemake/blob/master/code/combineTITAN-ichor.R fills in the bin-level output but, unfortunately, not the segs.txt file. The filled-in bin-level output contains columns labeled Corrected_Copy_Number, so it can still help you identify very focal regions, down to an individual 10 kb bin.
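The fill-in idea can be pictured as a per-bin fallback: use the TitanCNA call where one exists, and otherwise use the ichorCNA copy number for that bin. The following Python sketch is a simplified, hypothetical illustration of that idea. It is not a translation of combineTITAN-ichor.R. The dictionaries, bin keys, and copy-number values are made up. It assumes that bins start at multiples of 10 kb and that the ichorCNA bins form the complete bin set.

```python
BIN_SIZE = 10_000  # 10 kb bins, as mentioned above


def bin_start(pos):
    """Map a position to the start of its bin (assumes bins start at
    multiples of BIN_SIZE; real bin coordinates may differ)."""
    return (pos // BIN_SIZE) * BIN_SIZE


def fill_in_bins(titan_cn, ichor_cn):
    """Combine per-bin copy numbers keyed by (chrom, bin_start).

    titan_cn: bin -> TitanCNA copy number (bins without SNPs are absent)
    ichor_cn: bin -> ichorCNA copy number (treated as the complete bin set)
    Returns bin -> (copy_number, source).
    """
    combined = {}
    for key, ichor_value in ichor_cn.items():
        if key in titan_cn:
            combined[key] = (titan_cn[key], "titan")
        else:
            combined[key] = (ichor_value, "ichor")
    return combined


# Made-up values: TitanCNA has no calls in the EGFR bins.
ichor = {
    ("chr7", 53_100_000): 2,
    ("chr7", 55_080_000): 12,
    ("chr7", 55_090_000): 11,
    ("chr7", 56_060_000): 3,
}
titan = {("chr7", 53_100_000): 2, ("chr7", 56_060_000): 3}

combined = fill_in_bins(titan, ichor)
egfr_first_bin = ("chr7", bin_start(55086714))
print(egfr_first_bin, combined[egfr_first_bin])
# ('chr7', 55080000) (12, 'ichor')
```

Keeping the source label on each bin makes it clear which calls come from allele-aware TitanCNA segments and which come from ichorCNA's read-depth-only estimate.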

ysbioinfo commented 5 years ago

Got it! This is a great help. Thank you!

gavinha / TitanCNA

Segmentation results missing expected gene #50