After running this script i got the output file for somatic cnv like this:
1 18921041 19198941 3 gain AAB 81.846
1 71007971 75160955 3 gain AAB 6.00045
1 78641024 84202811 3 gain AAB 10.0454
1 93278631 99740911 3 gain AAB 32.4437
1 100292566 108136556 3 gain AAB 56.9157
1 185296350 187641778 3 gain AAB 3.25507
1 190097632 198859138 3 gain - -1
1 220025096 220134605 1 loss A -1
1 227016028 227081001 3 gain AAB 68.6832
2 38783 45827 3 gain - -1
2 31963042 32433806 1 loss A 100
2 49017438 51029206 3 gain AAB 28.0357
output_file: again only 7 columns
1 27621445 29279653 1 loss A -1
1 77631847 77926905 1 loss A 47.3996
1 91642882 93079408 1 loss A 100
1 96777544 99709210 3 gain AAB 27.2263
1 173500937 174241659 1 loss A 11.0234
1 219915358 220147732 1 loss A 17.95
2 31531343 32388914 1 loss A -1
2 36378249 36555753 3 gain AAB 53.6888
2 38689775 38882527 1 loss A -1
2 46816758 48733146 1 loss A -1
2 54143223 54850294 3 gain AAB 24.374
2 60756240 62502550 1 loss A -1
2 62502552 69962188 2 neutral AA -1
2 69996591 70297189 1 loss A -1
only 7 columns are displayed i am not getting status column whether its germline or somatic. I want to acquire somatic alterations excluding germline .Additionally , i have overlapped these files with dgv database germline variants in order to remove the germline ones but i am getting good match for almost every region. So i am not sure whether these cnv is somatic or germline .how can i be sure that the cnvs in this file are somatic please suggest how can i acquire that column.
Hi , I want to identify the somatic cnv from tumor-normal paired WES data. The config file i have used is :
First way
[general] BedGraphOutput=TRUE chrLenFile = path/data/hg38.fa.fai window = 0 ploidy = 2 outputDir =path to outputfile/freec-results bedtools = /usr/bin/bedtools samtools = /usr/bin/samtools forceGCcontentNormalization = 0 contaminationAdjustment=TRUE
sex=XY
breakPointType=4 chrFiles = path/genome-file-chr maxThreads=10 gemMappabilityFile = path/FREEC-11.6/out100m2_hg38.gem
noisyData=TRUE printNA=FALSE sambamba = path/bin/./sambamba readCountThreshold=50
[sample] mateFile =path/sample_1T_dedup_reads.bam miniPileup = path/pileup-files/sample_1T.pileup.gz inputFormat = BAM mateOrientation = FR
[control]
mateFile = path/sample_1N_dedup_reads.bam miniPileup =path/sample_1N.pileup.gz inputFormat = BAM mateOrientation = FR
[BAF] makePileup = path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.bed fastaFile = path/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz minimalCoveragePerPosition = 5 shiftInQuality = 33
[target]
captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed
captureRegions = path/final-overlapping-removed-sorted.bed
After running this script i got the output file for somatic cnv like this: 1 18921041 19198941 3 gain AAB 81.846 1 71007971 75160955 3 gain AAB 6.00045 1 78641024 84202811 3 gain AAB 10.0454 1 93278631 99740911 3 gain AAB 32.4437 1 100292566 108136556 3 gain AAB 56.9157 1 185296350 187641778 3 gain AAB 3.25507 1 190097632 198859138 3 gain - -1 1 220025096 220134605 1 loss A -1 1 227016028 227081001 3 gain AAB 68.6832 2 38783 45827 3 gain - -1 2 31963042 32433806 1 loss A 100 2 49017438 51029206 3 gain AAB 28.0357
Second way
using config file-only including pileup files:
[general] BedGraphOutput=TRUE chrLenFile = /path/hg38.fa.fai window = 0 ploidy = 2 outputDir = /path/pileup-basedresults bedtools = /usr/bin/bedtools samtools = /usr/bin/samtools forceGCcontentNormalization = 1 contaminationAdjustment=TRUE
sex=XY
breakPointType=4 chrFiles =path/genome-file-chr maxThreads=10 gemMappabilityFile = path/out100m2_hg38.gem
noisyData=TRUE printNA=FALSE sambamba = path/./sambamba readCountThreshold=50
[sample]
mateFile = /path/CaGB_10_T.pileup.gz inputFormat = pileup mateOrientation = FR
[control]
mateFile = path/CaGB_10_N.pileup.gz inputFormat = pileup mateOrientation = FR
[BAF] SNPfile =path/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz minimalCoveragePerPosition = 5 shiftInQuality = 33
[target]
captureRegions = /EXOME-SEQ/TruSeq_exome_targeted_regions.hg19.bed
captureRegions = path/final-overlapping-removed-sorted.bed
output_file: again only 7 columns 1 27621445 29279653 1 loss A -1 1 77631847 77926905 1 loss A 47.3996 1 91642882 93079408 1 loss A 100 1 96777544 99709210 3 gain AAB 27.2263 1 173500937 174241659 1 loss A 11.0234 1 219915358 220147732 1 loss A 17.95 2 31531343 32388914 1 loss A -1 2 36378249 36555753 3 gain AAB 53.6888 2 38689775 38882527 1 loss A -1 2 46816758 48733146 1 loss A -1 2 54143223 54850294 3 gain AAB 24.374 2 60756240 62502550 1 loss A -1 2 62502552 69962188 2 neutral AA -1 2 69996591 70297189 1 loss A -1
only 7 columns are displayed i am not getting status column whether its germline or somatic. I want to acquire somatic alterations excluding germline .Additionally , i have overlapped these files with dgv database germline variants in order to remove the germline ones but i am getting good match for almost every region. So i am not sure whether these cnv is somatic or germline .how can i be sure that the cnvs in this file are somatic please suggest how can i acquire that column.