Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

filter-bed parameter / option #87

Open BenoitFiset opened 6 years ago

BenoitFiset commented 6 years ago

Hi Eric,

I have a question about --filter-bed in Tumor-normal-enrichment mode. I was using the filter13.bed file to filter out the centromeres and everything worked fine; I got results.

I then created a BED file (0-based, as it should be) to filter out everything that is not an exon, centromeres included, so that only exon regions are kept. I expected the resulting CNV files to be smaller than when I filter only the centromeres.

Sample of the BED file (33491 lines in total):

1   11868   31109
1   34553   36081
1   52472   53312
1   57597   64116
1   65418   71585
1   89294   134836
1   135140  135895
1   137681  137965
1   139789  140339
1   141473  173862
1   182695  184174
1   185216  195411
1   257863  522928
1   586070  859446
1   868070  877234
1   904833  915976
1   916864  921016
1   923927  959309
1   960586  965715
1   966496  982093
1   995965  998051

But no, many more CNVs are called, a lot more:

2465 lines in the CNV.vcf produced with filter13.bed (centromere filter)
16944 lines in the CNV.vcf produced with the non-exon filter BED file
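Raw line counts include the `#` header lines of the VCF, so the two totals above are only comparable if both files carry the same header. A minimal Python sketch for counting data records only; the function name and the example record are made up for illustration:

```python
def count_vcf_records(lines):
    """Count VCF data lines, skipping '#' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# In-memory example; the single data record below is invented.
example = [
    "##fileformat=VCFv4.1",
    "##EstimatedTumorPurity=0.80",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t11869\t.\tN\t<CNV>\t.\tPASS\tEND=31109",
]
print(count_vcf_records(example))  # prints 1
```

For a file on disk, an open file handle can be passed directly, for example `with open("CNV.vcf") as fh: count_vcf_records(fh)`.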

Another interesting thing: the EstimatedTumorPurity and OverallPloidy values are much better with the exon-only regions.

Results with filter13.bed (centromere filter):

##EstimatedTumorPurity=0.80
##PurityModelFit=0.0400
##InterModelDistance=0.7024
##LocalSDmetric=5.21
##Heterogeneity=0.00
##EstimatedChromosomeCount=49.86
##OverallPloidy=2.12

Results with the non-exon filter BED file:

##EstimatedTumorPurity=0.99
##PurityModelFit=0.0312
##InterModelDistance=0.0293
##LocalSDmetric=3.09
##Heterogeneity=0.00
##EstimatedChromosomeCount=76.14
##OverallPloidy=3.25
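To compare the two runs side by side, the `##Key=value` header lines can be parsed into dictionaries. This is a small sketch using the values quoted above; `parse_header_metrics` is a hypothetical helper, not something Canvas provides:

```python
def parse_header_metrics(lines):
    """Collect '##Key=value' header lines into a dict of floats."""
    metrics = {}
    for line in lines:
        if not line.startswith("##") or "=" not in line:
            continue
        key, value = line[2:].strip().split("=", 1)
        try:
            metrics[key] = float(value)
        except ValueError:
            pass  # skip non-numeric headers such as ##fileformat
    return metrics


centromere_only = parse_header_metrics([
    "##EstimatedTumorPurity=0.80",
    "##PurityModelFit=0.0400",
    "##InterModelDistance=0.7024",
    "##LocalSDmetric=5.21",
    "##Heterogeneity=0.00",
    "##EstimatedChromosomeCount=49.86",
    "##OverallPloidy=2.12",
])
non_exon_filtered = parse_header_metrics([
    "##EstimatedTumorPurity=0.99",
    "##PurityModelFit=0.0312",
    "##InterModelDistance=0.0293",
    "##LocalSDmetric=3.09",
    "##Heterogeneity=0.00",
    "##EstimatedChromosomeCount=76.14",
    "##OverallPloidy=3.25",
])

# Print each metric for both runs plus the signed difference.
for key, before in centromere_only.items():
    after = non_exon_filtered[key]
    print(f"{key}\t{before}\t{after}\t{after - before:+.4f}")
```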

Any thoughts on why more CNVs are called when I filter out more regions? Is my understanding of the --filter-bed option correct, namely that the file lists the regions you want to exclude?

Thanks.
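If --filter-bed does take the regions to exclude, as assumed in the question above, then a file meant to keep only exons has to contain the complement of the exon intervals. Here is a minimal Python sketch of that complement. `complement_bed` and the toy chromosome length are illustrative only and not part of Canvas:

```python
from collections import defaultdict


def complement_bed(intervals, chrom_lengths):
    """Return the 0-based, half-open regions NOT covered by `intervals`.

    intervals: iterable of (chrom, start, end) in BED coordinates.
    chrom_lengths: dict mapping chromosome name to its length.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    gaps = []
    for chrom, length in chrom_lengths.items():
        cursor = 0  # first position not yet covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            gaps.append((chrom, cursor, length))
    return gaps


# First three intervals from the sample above, treated as regions to keep.
keep = [("1", 11868, 31109), ("1", 34553, 36081), ("1", 52472, 53312)]
# Toy chromosome length, chosen only to keep the example short.
for chrom, start, end in complement_bed(keep, {"1": 60000}):
    print(chrom, start, end, sep="\t")
```

Sorting per chromosome and only ever moving the cursor forward means unsorted or overlapping input intervals are handled without a separate merge step.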

eroller commented 6 years ago

Instead of using the filter bed file to exclude regions in your manifest file, can't you just remove them from the manifest file? I don't know if that would make a difference, but the filter bed file is intended to be used globally for all samples and not adjusted per sample.

The reason you end up with a poorer model fit may be because you are limiting the number of regions. More data will give more points for fitting the model and result in a better fit.

You could always filter the regions after the VCF has been produced. Is there a reason you want to filter the regions during CNV calling? As long as you have valid read data in those regions I would use them for CNV calling unless there is a strong reason not to (e.g. runtime constraint, poor read quality in those regions).
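Filtering after the fact could look roughly like the Python sketch below. It assumes each record spans POS to an `END=` value in the INFO column (1-based, inclusive) and keeps records that overlap any region of interest. The helper names and the example records are hypothetical:

```python
def overlaps(start, end, regions):
    """True if half-open [start, end) overlaps any (start, end) in regions."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def filter_records(vcf_lines, keep_regions):
    """Yield header lines plus records overlapping keep_regions.

    keep_regions: dict of chrom -> list of 0-based, half-open (start, end).
    Assumption: records carry END=<pos> in INFO; otherwise END defaults to POS.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        end = pos
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        # VCF positions are 1-based; convert to a 0-based half-open interval.
        if overlaps(pos - 1, end, keep_regions.get(chrom, [])):
            yield line


# Invented records, for illustration only.
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12000\t.\tN\t<CNV>\t.\tPASS\tEND=30000",
    "1\t40000\t.\tN\t<CNV>\t.\tPASS\tEND=50000",
]
keep = {"1": [(11868, 31109), (34553, 36081)]}
kept = list(filter_records(vcf, keep))
print(len(kept))  # prints 2: the header line and the first record
```

Keeping any overlapping record is one possible policy; requiring full containment or a minimum overlap fraction would be equally reasonable, depending on the downstream use.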

BenoitFiset commented 6 years ago

Hi Eric,

To use Canvas for whole-genome Tumor-normal-enrichment, what should I use as a manifest file, since this option seems to be mandatory?

If I create my own manifest file covering all regions, will Canvas break?

[Header]
Manifest Version    1
ReferenceGenome Homo_sapiens\Ensembl\GRCh38\Sequence\WholeGenomeFASTA

[Regions]
Name    Chromosome    Start    End    Upstream Probe Length    Downstream Probe Length
CEX-1-1-248956422   1   1   248956422   0   0
CEX-2-1-242193529   2   1   242193529   0   0
CEX-3-1-198295559   3   1   198295559   0   0
CEX-4-1-190214555   4   1   190214555   0   0
CEX-5-1-181538259   5   1   181538259   0   0
CEX-6-1-170805979   6   1   170805979   0   0
CEX-7-1-159345973   7   1   159345973   0   0
CEX-8-1-145138636   8   1   145138636   0   0
CEX-9-1-138394717   9   1   138394717   0   0
CEX-10-1-133797422  10  1   133797422   0   0
CEX-11-1-135086622  11  1   135086622   0   0
CEX-12-1-133275309  12  1   133275309   0   0
CEX-13-1-114364328  13  1   114364328   0   0
CEX-14-1-107043718  14  1   107043718   0   0
CEX-15-1-101991189  15  1   101991189   0   0
CEX-16-1-90338345   16  1   90338345    0   0
CEX-17-1-83257441   17  1   83257441    0   0
CEX-18-1-80373285   18  1   80373285    0   0
CEX-19-1-58617616   19  1   58617616    0   0
CEX-20-1-64444167   20  1   64444167    0   0
CEX-21-1-46709983   21  1   46709983    0   0
CEX-22-1-50818468   22  1   50818468    0   0
CEX-MT-1-16569  MT  1   16569   0   0
CEX-X-1-156040895   X   1   156040895   0   0
CEX-Y-2781480-56887902  Y   2781480 56887902    0   0

Thanks

eroller commented 6 years ago

You should be able to take the manifest file and adjust the regions to match your sequencing data. Canvas should not break.
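Adjusting the regions by hand gets tedious for more than a few entries, so one option is to generate the manifest from a list of regions. The Python sketch below reproduces the layout and the `CEX-<chrom>-<start>-<end>` naming from the manifest posted above. Tab separation is an assumption, since the whitespace in the posted file is ambiguous, and `build_manifest` is a hypothetical helper:

```python
def build_manifest(regions, reference_genome):
    """Render a manifest in the layout shown in the thread.

    regions: iterable of (chrom, start, end) with coordinates as in the thread.
    Assumption: fields are tab-separated.
    """
    lines = [
        "[Header]",
        "Manifest Version\t1",
        f"ReferenceGenome\t{reference_genome}",
        "",
        "[Regions]",
        "Name\tChromosome\tStart\tEnd\tUpstream Probe Length\tDownstream Probe Length",
    ]
    for chrom, start, end in regions:
        # Probe lengths are set to 0, as in the posted manifest.
        lines.append(f"CEX-{chrom}-{start}-{end}\t{chrom}\t{start}\t{end}\t0\t0")
    return "\n".join(lines) + "\n"


regions = [("1", 1, 248956422), ("2", 1, 242193529)]
manifest = build_manifest(
    regions, r"Homo_sapiens\Ensembl\GRCh38\Sequence\WholeGenomeFASTA"
)
print(manifest)
```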

BenoitFiset commented 6 years ago

Thanks, I'll give it a go.