dariober / cnv_facets

Somatic copy variant caller (CNV) for next generation sequencing

How to generate Gistic2.0 input data from cnv_facets output #52

Open LileeGao opened 1 year ago

LileeGao commented 1 year ago

I got a VCF file from cnv_facets, run on WGS data. How can I get the segment data from the VCF, especially the "number of probes" column?

dariober commented 1 year ago

Hi, can you clarify your question, perhaps by adding an example of the data and what you want from it? In general, to parse VCF files you can use bcftools.

LileeGao commented 1 year ago

Thanks for your response. I will add an example. My raw data is from WGS. I ran bwa for mapping and picard for marking duplicates, then used cnv_facets to call CNVs. The VCF file I get from cnv_facets looks like this:

CHROM POS ID REF ALT QUAL FILTER INFO
chr1 13111 1 N . PASS SVTYPE=DUP;SVLEN=126819;END=139929;NUM_MARK=223;NHET=4;CNLR_MEDIAN=0.056;MAF_R=-0.098;SEGCLUST=42;CNLR_MEDIAN_CLUST=0.031;MAF_R_CLUST=.;CF_EM=0.342;TCN_EM=3;LCN_EM=.;CNV_ANN=.
chr1 158007 2 N . PASS SVTYPE=DUP;SVLEN=3774146;END=3932152;NUM_MARK=8926;NHET=1569;CNLR_MEDIAN=0.184;MAF_R=-0.021;SEGCLUST=50;CNLR_MEDIAN_CLUST=0.155;MAF_R_CLUST=-0.005;CF_EM=0.295;TCN_EM=4;LCN_EM=2;CNV_ANN=.
chr1 3932496 3 N . PASS SVTYPE=DUP;SVLEN=1524406;END=5456901;NUM_MARK=4385;NHET=1059;CNLR_MEDIAN=0.088;MAF_R=-0.015;SEGCLUST=43;CNLR_MEDIAN_CLUST=0.102;MAF_R_CLUST=-0.006;CF_EM=0.248;TCN_EM=4;LCN_EM=2;CNV_ANN=.

Next, I want to get segment data for Gistic2.0. I noticed that Gistic2.0 needs a segmentationfile.txt file like this: (image)

I understand the columns are "sample", "chromosome", "start position", "end position", "number of markers in segment", and "Seg.CN". I want to know how to get "number of markers in segment" and "Seg.CN" from the VCF file generated by cnv_facets.

I really appreciate your answer!

jamelee commented 1 year ago

Hi,

You can use the value of 'NUM_MARK' for 'number of markers in segment', and 'CNLR.MEDIAN - dipLogR' for 'Seg.CN'.

More details at https://github.com/mskcc/facets/issues/84.
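The mapping above could be sketched in Python roughly as follows. This is only an illustration, not part of cnv_facets: the function names `parse_info`, `vcf_to_gistic_rows` and `write_gistic_seg` are made up for this example, the input is assumed to be a standard tab-separated VCF with INFO in the 8th column, and `dip_log_r` has to be supplied by the caller.

```python
import csv


def parse_info(info):
    """Turn 'KEY=VAL;KEY2=VAL2' into a dict; keys without '=' map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def vcf_to_gistic_rows(lines, sample, dip_log_r):
    """Build (sample, chrom, start, end, num_markers, seg_cn) tuples.

    seg_cn is CNLR_MEDIAN - dip_log_r, as suggested in this thread.
    dip_log_r is an input here; this sketch does not look it up.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        seg_cn = round(float(info["CNLR_MEDIAN"]) - dip_log_r, 6)
        rows.append(
            (
                sample,
                fields[0],              # chromosome
                int(fields[1]),         # start position (POS)
                int(info["END"]),       # end position
                int(info["NUM_MARK"]),  # number of markers in segment
                seg_cn,
            )
        )
    return rows


def write_gistic_seg(rows, handle):
    """Write the rows as tab-separated text, one segment per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
```

For an uncompressed VCF, something like `write_gistic_seg(vcf_to_gistic_rows(open("sample.vcf"), "sample1", dip_log_r), sys.stdout)` would print the six columns (after `import sys`); the file name and sample name are placeholders.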

satishbioinfo commented 3 months ago

Hello @jamelee, I don't see a dipLogR field in the VCF file. Should we use 'CNLR.MEDIAN' as 'Seg.CN' for the Gistic input?
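One thing that may be worth checking is whether dipLogR is stored in the VCF header rather than in the per-record INFO field. The sketch below assumes a header line of the form `##dipLogR=<number>`; that format is an assumption and has not been confirmed in this thread, and `find_dip_log_r` is a hypothetical helper name. It returns `None` when no such line exists, so it does not guess a value.

```python
import re


def find_dip_log_r(lines):
    """Return the value of a '##dipLogR=<number>' header line, or None.

    The '##dipLogR=' header format is an assumption, not a confirmed
    feature of the cnv_facets output.
    """
    pattern = re.compile(r"^##dipLogR=(\S+)")
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines come first; stop at the column line or a record
        match = pattern.match(line)
        if match:
            return float(match.group(1))
    return None
```

If this returns `None` for a given file, the question above remains open for that file.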