BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data
153 stars 49 forks source link

Somatic cnv issue #134

Open awast opened 1 year ago

awast commented 1 year ago

Hi , I want to identify the somatic cnv from tumor-normal paired WES data. The config file i have used is :

First way

[general] BedGraphOutput=TRUE chrLenFile = path/data/hg38.fa.fai window = 0 ploidy = 2 outputDir =path to outputfile/freec-results bedtools = /usr/bin/bedtools samtools = /usr/bin/samtools forceGCcontentNormalization = 0 contaminationAdjustment=TRUE

sex=XY

breakPointType=4 chrFiles = path/genome-file-chr maxThreads=10 gemMappabilityFile = path/FREEC-11.6/out100m2_hg38.gem

noisyData=TRUE printNA=FALSE sambamba = path/bin/./sambamba readCountThreshold=50

[sample] mateFile =path/sample_1T_dedup_reads.bam miniPileup = path/pileup-files/sample_1T.pileup.gz inputFormat = BAM mateOrientation = FR

[control]

mateFile = path/sample_1N_dedup_reads.bam miniPileup =path/sample_1N.pileup.gz inputFormat = BAM mateOrientation = FR

[BAF] makePileup = path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.bed fastaFile = path/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz minimalCoveragePerPosition = 5 shiftInQuality = 33

[target]

captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed

captureRegions = path/final-overlapping-removed-sorted.bed

After running this script i got the output file for somatic cnv like this: 1 18921041 19198941 3 gain AAB 81.846 1 71007971 75160955 3 gain AAB 6.00045 1 78641024 84202811 3 gain AAB 10.0454 1 93278631 99740911 3 gain AAB 32.4437 1 100292566 108136556 3 gain AAB 56.9157 1 185296350 187641778 3 gain AAB 3.25507 1 190097632 198859138 3 gain - -1 1 220025096 220134605 1 loss A -1 1 227016028 227081001 3 gain AAB 68.6832 2 38783 45827 3 gain - -1 2 31963042 32433806 1 loss A 100 2 49017438 51029206 3 gain AAB 28.0357

Second way

using config file-only including pileup files:

[general] BedGraphOutput=TRUE chrLenFile = /path/hg38.fa.fai window = 0 ploidy = 2 outputDir = /path/pileup-basedresults bedtools = /usr/bin/bedtools samtools = /usr/bin/samtools forceGCcontentNormalization = 1 contaminationAdjustment=TRUE

sex=XY

breakPointType=4 chrFiles =path/genome-file-chr maxThreads=10 gemMappabilityFile = path/out100m2_hg38.gem

noisyData=TRUE printNA=FALSE sambamba = path/./sambamba readCountThreshold=50

[sample]

mateFile = /path/CaGB_10_T.pileup.gz inputFormat = pileup mateOrientation = FR

[control]

mateFile = path/CaGB_10_N.pileup.gz inputFormat = pileup mateOrientation = FR

[BAF] SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz minimalCoveragePerPosition = 5 shiftInQuality = 33

[target]

captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed

captureRegions = path/final-overlapping-removed-sorted.bed

output_file: again only 7 columns 1 27621445 29279653 1 loss A -1 1 77631847 77926905 1 loss A 47.3996 1 91642882 93079408 1 loss A 100 1 96777544 99709210 3 gain AAB 27.2263 1 173500937 174241659 1 loss A 11.0234 1 219915358 220147732 1 loss A 17.95 2 31531343 32388914 1 loss A -1 2 36378249 36555753 3 gain AAB 53.6888 2 38689775 38882527 1 loss A -1 2 46816758 48733146 1 loss A -1 2 54143223 54850294 3 gain AAB 24.374 2 60756240 62502550 1 loss A -1 2 62502552 69962188 2 neutral AA -1 2 69996591 70297189 1 loss A -1

only 7 columns are displayed i am not getting status column whether its germline or somatic. I want to acquire somatic alterations excluding germline .Additionally , i have overlapped these files with dgv database germline variants in order to remove the germline ones but i am getting good match for almost every region. So i am not sure whether these cnv is somatic or germline .how can i be sure that the cnvs in this file are somatic please suggest how can i acquire that column.