abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

multiple normal libraries as input #186

Closed lmanchon closed 12 months ago

lmanchon commented 1 year ago

Hi,

Can CNVpytor be used with a PON (panel of normal libraries) to compare against a single tumor library?

Thank you.

arpanda commented 1 year ago

Hi, can you provide more details about your query? Specifically, what type of comparison are you looking to make using the PON?

Thank you, Arijit

lmanchon commented 1 year ago

Hi,

I am trying to find CNVs in a tumor library, and I have 11 normal libraries to build a PON, as is usual (for example, as required by PureCN, CNVkit, or the GATK CNV caller). Does CNVpytor use only 2 libraries (tumor vs. normal), or is it possible to specify a PON as a parameter?

Thank you.

arpanda commented 1 year ago

To clarify, your goal is to identify and exclude the copy number variations (CNVs) that have already been detected in your normal sample(s). This can be achieved through a few approaches.

One approach is to process each sample individually and then look for CNV calls that appear in the tumor but not in the normals. Please have a look at this example, in case it helps: https://github.com/abyzovlab/CNVpytor/blob/master/examples/merging.md

Thank you, Arijit
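
A minimal sketch of the "call each sample, then compare" idea described above, written in plain Python rather than with CNVpytor's own API. It assumes each sample's CNV calls have already been exported as (chromosome, start, end, type) records; the names `Call`, `overlap_fraction`, `tumor_specific`, and the 50% overlap threshold are hypothetical choices for illustration, not part of CNVpytor.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """One CNV call; coordinates are 1-based and inclusive (an assumption)."""
    chrom: str
    start: int
    end: int
    kind: str  # e.g. "deletion" or "duplication"


def overlap_fraction(a: Call, b: Call) -> float:
    """Fraction of call `a` covered by call `b`; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(shared, 0) / (a.end - a.start + 1)


def tumor_specific(
    tumor: List[Call],
    normals: List[List[Call]],
    min_overlap: float = 0.5,
    max_normals: int = 0,
) -> List[Call]:
    """Keep tumor calls that are matched in at most `max_normals` normal samples."""
    kept = []
    for call in tumor:
        # Count how many normal samples contain a call overlapping this one.
        hits = sum(
            any(overlap_fraction(call, n) >= min_overlap for n in sample_calls)
            for sample_calls in normals
        )
        if hits <= max_normals:
            kept.append(call)
    return kept


# Toy data: one tumor deletion is also present in a normal, one duplication is not.
tumor_calls = [
    Call("chr1", 1000, 2000, "deletion"),
    Call("chr2", 5000, 9000, "duplication"),
]
normal_panel = [
    [Call("chr1", 900, 2100, "deletion")],  # normal sample 1
    [],                                     # normal sample 2 (no calls)
]
print(tumor_specific(tumor_calls, normal_panel))
```

Raising `max_normals` would tolerate calls that show up in a small number of normals, which may be useful if some normal libraries are noisy.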

lmanchon commented 1 year ago

Okay, I understand now: I need to compare each normal library against my tumor library and then merge the results. I work on targeted exome sequencing with 32000 probes, and my sequencing depth is 1000x. Do you have any advice on adjusting parameters to get the best results? Thank you.

arpanda commented 1 year ago

Great. For exome data, you can use VCF input to call CNVs. Please have a look at the snp2rd command and follow the steps; it uses the read depth information of the variants. I would suggest using a very large bin size for this purpose, maybe 100,000, 500,000, or 1,000,000.

lmanchon commented 1 year ago

I have only BAM files and the corresponding VCF files computed by Mutect2. Isn't it risky to use big bins? Small CNVs may not be detected.

arpanda commented 1 year ago

I would recommend using germline variants for the variant input. You have a targeted exome, right? It depends on the proportion of the genome that it covers.

lmanchon commented 1 year ago

How can I obtain germline variants? Proportion covered: 32800 probes of 300 base pairs each, spread across the genome (hg38).
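
For reference, the coverage figures above can be turned into rough numbers. The sketch below assumes non-overlapping probes spread uniformly across the genome and an approximate hg38 size of 3.1 Gb (both are simplifying assumptions; real panels cluster around genes). Under those assumptions the panel covers about 9.84 Mb, roughly 0.3% of the genome, and a 100,000 bp bin contains about one probe on average.

```python
N_PROBES = 32_800       # number of capture probes (from the thread)
PROBE_LEN = 300         # probe length in base pairs (from the thread)
GENOME_SIZE = 3.1e9     # approximate hg38 size in bp (assumption)

# Total targeted bases and the fraction of the genome they represent.
target_bp = N_PROBES * PROBE_LEN
fraction = target_bp / GENOME_SIZE


def expected_probes_per_bin(bin_size: int) -> float:
    """Average number of probes per bin, assuming uniformly spread probes."""
    return N_PROBES * bin_size / GENOME_SIZE


print(f"targeted bases: {target_bp:,} bp ({fraction:.2%} of the genome)")
for bin_size in (100_000, 500_000, 1_000_000):
    print(f"bin size {bin_size:>9,}: ~{expected_probes_per_bin(bin_size):.1f} probes per bin")
```

This is one way to see the trade-off raised earlier in the thread: smaller bins would often contain no probe at all, while larger bins pool more probes per bin at the cost of resolution for small CNVs.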