Combining reference - Githubissues

etal / cnvkit

Copy number variant detection from targeted DNA sequencing

Hi!

I just wanted to check a few things and get some suggestions on how we do CNV analyses. Currently, we are using CNVkit from SevenBridges. I hope the replies will give me a better understanding of the tool.

The scenario: we have around 40 normal BAMs from a bunch of breeds, some from FFPE samples (~16) and some from flash-frozen samples (~24), with each BAM being nearly 200Gb.

Unfortunately, due to memory and storage constraints on our instances, I have chosen to run CNVkit only on the flash-frozen datasets, using a Panel of Normals created from the flash-frozen normals.

Some breeds have only FFPE samples and no flash-frozen dataset. We also have a few matched tumor-normal pairs.

Questions:

1. Can I run CNVkit on the FFPE normals to create a reference Panel of Normals and merge it with the flash-frozen Panel of Normals? Is there a way to go about it?
2. Are there any caveats in identifying CNVs in FFPE samples using a flash-frozen Panel of Normals?
3. Would downsampling the BAMs be useful or recommended to capture the diversity in the Panel of Normals?
4. Since we have matched tumor-normal pairs, I compared a few of them and saw that using the matched normal produced many more regions with events than using the Panel of Normals. Which approach would be better?

Hope that these are not too trivial questions! Any help would be really useful!

Regards, Harish

Hi @harish0201,

No, your questions are not trivial at all! If I understand your situation correctly:

1. One way to achieve that is to generate the sample.{target,antitarget}coverage.cnn files for both your fresh and FFPE samples. For example, run batch -n *fresh_normal.bam, then batch -n FFPE_normal.bam (removing each set of BAMs from your main instance in between, to have enough storage). Then simply run reference *coverage.cnn -f ucsc.hg19.fa -o FFPE-and-fresh_reference.cnn
2. Be aware, though, that the CNVkit documentation says it is better for the PoN to match the sample type of your "tumor" input BAM. I do not know whether anyone has tested how such a "mixed type" PoN behaves (you are welcome to test it and tell us). Also, I think you have enough samples of each type to create two separate PoNs, one for the fresh-frozen samples and one for the FFPE ones.
3. What do you want to downsample: the input "tumor" BAM, or the "normal" BAMs prior to creating your PoN? Either way, I do not think this is a good idea!
4. The CNVkit documentation says it will perform better with a PoN than with a matched tumor-normal pair.
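A minimal sketch of the first point above, processing one BAM at a time so that each BAM can be removed once its coverage files exist. It calls cnvkit.py coverage directly instead of batch -n, which is a swapped-in variant of the same idea. The BED file names, the sample naming, and the DRY_RUN switch are assumptions made for illustration, and ucsc.hg19.fa is just the placeholder from the reply (use the genome your BAMs were aligned to). By default the script only prints the commands.

```shell
# Sketch only: file names below are hypothetical placeholders.
TARGETS=targets.bed            # assumed target BED
ANTITARGETS=antitargets.bed    # assumed antitarget BED
FASTA=ucsc.hg19.fa             # placeholder genome from the reply
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Compute target and antitarget coverage for a single normal BAM.
# The body runs in a subshell so its variables stay local.
coverage_for_bam() (
  bam=$1
  sample=$(basename "$bam" .bam)
  run cnvkit.py coverage "$bam" "$TARGETS" -o "${sample}.targetcoverage.cnn"
  run cnvkit.py coverage "$bam" "$ANTITARGETS" -o "${sample}.antitargetcoverage.cnn"
)

# Pool any number of *coverage.cnn files into one reference.
# First argument: output file; remaining arguments: coverage files.
build_reference() (
  out=$1
  shift
  run cnvkit.py reference "$@" -f "$FASTA" -o "$out"
)

# Example use (commented out so that sourcing this file has no side effects):
#   for bam in *fresh_normal.bam; do coverage_for_bam "$bam"; done
#   build_reference fresh_reference.cnn *fresh_normal.*coverage.cnn
```

Once the .cnn files for a set of normals are written, the corresponding BAMs are no longer needed for building the reference, so they can be moved off the instance before the next set is copied in.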

Regarding your storage limitations, I would add:

Your "normal" BAM files are only needed during reference creation (more precisely, when calculating coverage), a step that you only need to perform once.
After that, you simply give the reference.cnn file generated above (representing your PoN) as input to batch, for example: batch *fresh-Tumor.bam --reference fresh_reference.cnn (no need to specify "targets" or "antitargets"; they are deduced from your *reference.cnn).
Choose which reference.cnn to pass according to the sample type of each input "tumor".
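As a rough sketch of the last two points, the helper below picks a reference from the tumor BAM's file name and passes it to batch. The naming convention (FFPE or fresh somewhere in the file name), the reference file names, the results directory, and the DRY_RUN switch are all assumptions for illustration. By default it only prints the command.

```shell
# Sketch only: naming convention and file names are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Map a tumor BAM to the PoN built from the same sample type.
reference_for() {
  case $1 in
    *FFPE*)  echo FFPE_reference.cnn ;;
    *fresh*) echo fresh_reference.cnn ;;
    *)       echo "unknown sample type: $1" >&2; return 1 ;;
  esac
}

# Run batch on one tumor BAM against the matching reference.
# Targets and antitargets are taken from the reference itself.
call_tumor() (
  ref=$(reference_for "$1") || exit 1
  run cnvkit.py batch "$1" --reference "$ref" --output-dir results
)

# Example use (commented out so that sourcing this file has no side effects):
#   for bam in *Tumor.bam; do call_tumor "$bam"; done
```

Refusing to guess for unrecognised file names is deliberate here: silently falling back to the wrong PoN would be harder to notice than an error.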

Hope this helps! Have a nice day, Felix.

Combining reference #733