BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

empty pileup file produced #31

Closed: likelet closed this issue 6 years ago

likelet commented 6 years ago

I am not sure whether this is a bug in FREEC. My input files are two sorted, base-recalibrated BAM files, one used as the control and one as the sample. The config file is:

[general]
chrLenFile = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/genome.fa.fai
window = 0
ploidy = 2
outputDir = ./
sex=XX
breakPointType=4
chrFiles = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/chromosomes
bedtools = /HOME/sysu_rj_1/CLS/software/bedtools2/bin/bedtools
sambamba = ~/bin/sambamba
SambambaThreads = 23
samtools = samtools
maxThreads=23
breakPointThreshold=1.2
noisyData=TRUE
printNA=FALSE
readCountThreshold=50

[sample]
mateFile = '311252-S_sort_dedup_realigned_recal.bam'
inputFormat = BAM
mateOrientation = 0

[control]
mateFile = '311252-N-1_sort_dedup_realigned_recal.bam'
inputFormat = BAM
mateOrientation = 0

[BAF]
makePileup = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/hg19_snp142.SingleDiNucl.1based.bed
fastaFile = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/genome.fa
SNPfile = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/hg19_snp142.SingleDiNucl.1based.txt
minimalCoveragePerPosition = 5

[target]
captureRegions = /HOME/sysu_rj_1/CLS/database/hg19/freecLib/freec_nuohe_target_V6.bed

My samtools version is 1.3.1. Any suggestions?

valeu commented 6 years ago

Could you also share the output printed to the command line?

likelet commented 6 years ago

Attached: log.txt. The run was stuck at this file size, and no further log output was produced.

valeu commented 6 years ago

What log is this? It says 'Unable to open file ./311252-S_sort_dedup_realigned_recal.bam_NewCaptureRegions.bed. Exiting.'

likelet commented 6 years ago

Here is the folder tree. I also have no idea what that message is.

.
├── 311252-N-1_sort_dedup_realigned_recal.bam -> ../../run_gatk_sh_dir/311252-N-1_sort_dedup_realigned_recal.bam
├── 311252-N-1_sort_dedup_realigned_recal.bam_minipileup.pileup
├── 311252-S_freec_config.txt
├── 311252-S_freec.log
├── 311252-S_sort_dedup_realigned_recal.bam -> ../../run_gatk_sh_dir/311252-S_sort_dedup_realigned_recal.bam
├── 311252-S_sort_dedup_realigned_recal.bam_minipileup.pileup
├── 311252-S_sort_dedup_realigned_recal.bam_SNPinNewCaptureRegions.bed
├── run_freec_paired.sh
└── slurm-4032059.out

I also recorded the log file:

Control-FREEC v11.0 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
Multi-threading mode using 23 threads
..consider the sample being female
..Breakpoint threshold for segmentation of copy number profiles is 1.2
..telocenromeric set to 50000
..FREEC is not going to output normalized copy number profiles into a BedGraph file (for example, for visualization in the UCSC GB). Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Window = 0 was set
..Output directory: ./
..Directory with files containing chromosome sequences: /HOME/sysu_rj_1/CLS/database/hg19/freecLib/chromosomes
..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
..Sample file: '311252-S_sort_dedup_realigned_recal.bam'
..Sample input format: BAM
..will use this instance of sambamba: '~/bin/sambamba' to read BAM files
..Control file: '311252-N-1_sort_dedup_realigned_recal.bam'
..Input format for the control file: BAM
FREEC will create a pileup to compute BAF profile!
...File with SNPs : /HOME/sysu_rj_1/CLS/database/hg19/freecLib/hg19_snp142.SingleDiNucl.1based.bed
..forceGCcontentNormalization was set to 1: will use GC-content to normalize the read count data
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..Minimal CNA length (in windows) is 3
..File with chromosome lengths: /HOME/sysu_rj_1/CLS/database/hg19/freecLib/genome.fa.fai
..File /HOME/sysu_rj_1/CLS/database/hg19/freecLib/genome.fa.fai was read
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 4
..noisyData set to 1
..minimal number of reads per window in the control sample is set to 50
..Control-FREEC will not look for subclones
Creating Pileup file to compute BAF profile...
..If you have got an error at this step and a mini-pileup file is empty, check that you are using samtools v1.1 or later and provide a corresponding path in your config file
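The last line of the log suggests two checks: whether a mini-pileup file is empty, and whether samtools is v1.1 or later. A minimal Python sketch of both checks follows. The helper names `parse_samtools_version` and `empty_pileups` are hypothetical and not part of FREEC. The `*_minipileup.pileup` pattern is taken from the folder tree above, and the sketch assumes the first line of `samtools --version` has the form `samtools 1.3.1`.

```python
import re
from pathlib import Path
from typing import List, Tuple


def parse_samtools_version(version_output: str) -> Tuple[int, ...]:
    """Extract the numeric version from the text printed by `samtools --version`."""
    match = re.search(r"samtools\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("could not find a samtools version string")
    return tuple(int(part) for part in match.group(1).split("."))


def empty_pileups(directory: str) -> List[str]:
    """Return the names of zero-byte *_minipileup.pileup files in a directory."""
    return sorted(
        path.name
        for path in Path(directory).glob("*_minipileup.pileup")
        if path.stat().st_size == 0
    )
```

To use it, pass the captured output of `samtools --version` to `parse_samtools_version` and compare the result with `(1, 1)`, then call `empty_pileups` on the FREEC output directory. An empty list means no zero-byte mini-pileup files were found.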

likelet commented 6 years ago

Sorry for the confusing log file from the Slurm system. The FREEC log was written to a separate file, which I pasted above.

likelet commented 6 years ago

I figured it out: the error was triggered by the quotes around the file name in my config, e.g. mateFile = '311252-N-1_sort_dedup_realigned_recal.bam'. I removed the single quotes (') and it seems to work now.
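The fix described here (dropping the quotes around a config value) can be sketched in Python. This is only an illustration: `strip_quoted_values` is a hypothetical helper, not something shipped with FREEC, and it assumes a plain `key = value` layout with one option per line.

```python
import re

# Matches `key = 'value'` or `key = "value"` on a single line.
# Group 1 is the `key = ` prefix, group 3 is the value without its quotes.
QUOTED_VALUE = re.compile(r"""^(\s*[^=\[#]+?\s*=\s*)(['"])(.*)\2\s*$""")


def strip_quoted_values(config_text: str) -> str:
    """Return the config text with surrounding quotes removed from option values."""
    cleaned = []
    for line in config_text.splitlines():
        match = QUOTED_VALUE.match(line)
        # Section headers, comments and unquoted values are passed through unchanged.
        cleaned.append(match.group(1) + match.group(3) if match else line)
    return "\n".join(cleaned)
```

Applied to the [sample] section above, the line `mateFile = '311252-S_sort_dedup_realigned_recal.bam'` becomes `mateFile = 311252-S_sort_dedup_realigned_recal.bam`, while `inputFormat = BAM` is left as-is.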

keryruo commented 5 years ago

I encountered this problem when using tumor-control paired BAM files as input for CNV+LOH detection. When I switched to pileup files as input for CNV+LOH detection, the run completed successfully and gave the proper output. My input BAM files are sorted and base-recalibrated, so the problem is certainly not caused by improper input files.

The config file is

[general]
BedGraphOutput=TRUE
bedtools=/mnt/tools/bedtools2/bedtools2/bin/bedtools
breakPointThreshold=1.2
breakPointType=4
chrFiles=~/CRC/prepData/chrFiles/
chrLenFile=~/CRC/prepData/hg19.len
coefficientOfVariation=0.05
degree=1
forceGCcontentNormalization=1
minCNAlength=3
minMappabilityPerWindow=0.85
minimalSubclonePresence=0.3
maxThreads=6
noisyData=TRUE
outputDir=~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test
ploidy=2
printNA= FALSE
readCountThreshold=50
sambamba=~/tool/sambamba/bin/sambamba
sambambaThreads=8
samtools=/mnt/tools/samtools1.35/samtools
sex= XX
window=0

[sample]
mateFile=/home/data/bulk_bam/0515Tumor.co-realn.bam
inputFormat=BAM
mateOrientation=0

[control]
mateFile=/home/data/bulk_bam/0515Normal.co-realn.bam
inputFormat=BAM
mateOrientation=0

[BAF]
makePileup=~/CRC/prepData/hg19_snp142.SingleDiNucl.1based.bed
fastaFile=/mnt/ref/GRCh37/GRCh37.p13.genome_main.fa
SNPfile=~/CRC/prepData/hg19_snp142.SingleDiNucl.1based.txt
minimalCoveragePerPosition=5
minimalQualityPerPosition=5
shiftInQuality=33

[target]
captureRegions=~/CRC/prepData/exome_V4_region.bed

The run log is:

Control-FREEC v11.5 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
Multi-threading mode using 6 threads
..consider the sample being female
..Breakpoint threshold for segmentation of copy number profiles is 1.2
..telocenromeric set to 50000
..FREEC is not going to adjust profiles for a possible contamination by normal cells
Warning: the number of thread to use with Sambamba (option "SambambaThreads" in [general] has been set to 6
..in the config file, you can set SambambaThreads = 2 to use 2 threads
..Note, the Coefficient Of Variation won't be used since "window" = 0 was set
..Output directory: ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test
..Directory with files containing chromosome sequences: /~/CRC/prepData/chrFiles/
..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
..will use a quality threshold of 5 to select nucleotides used in calculation of beta allel frequency (BAF) values
..will shift qualities by 33 when selecting nucleotides used in calculation of beta allel frequency (BAF) values
..Note, use shiftInQuality=33 for Sanger or Illumina 1.8+ format; shiftInQuality=64 for Illumina 1.3+
..Sample file: /home/data/bulk_bam/0515Tumor.co-realn.bam
..Sample input format: BAM
..will use this instance of sambamba: '~/tool/sambamba/bin/sambamba' to read BAM files
..Control file: /home/data/bulk_bam/0515Normal.co-realn.bam
..Input format for the control file: BAM
FREEC will create a pileup to compute BAF profile!
...File with SNPs : ~/CRC/prepData/hg19_snp142.SingleDiNucl.1based.bed
..forceGCcontentNormalization was set to 1: will use GC-content to normalize the read count data
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..Polynomial degree for "ReadCount ~ GC-content" is 1
Warning: minimal recommended polynomial degree for "ReadCount ~ GC-content" is 3
Comment or remove the corresponding line in the config file to try both degree==3 and degree==4
..Minimal CNA length (in windows) is 3
..File with chromosome lengths: ~/CRC/prepData/hg19.len
..File ~/CRC/prepData/hg19.len was read
..Using the minimal mappability of: 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 4
..noisyData set to 1
..minimal number of reads per window in the control sample is set to 50
..Control-FREEC will look for subclones present in at least 30% of cell population
Creating Pileup file to compute BAF profile...
samtools mpileup options: -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed
[executing] /usr/local/bin/samtools mpileup /tmp/sambamba-pid22858-kqvb/1 -l /tmp/sambamba-pid22858-kqvb/1.bed -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed | ~/tool/sambamba/bin/sambamba lz4compress
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.12.0 / DMD v2.082.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (1.12.0)
[executing] /usr/local/bin/samtools mpileup /tmp/sambamba-pid22858-kqvb/2 -l /tmp/sambamba-pid22858-kqvb/2.bed -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed| ~/tool/sambamba/bin/sambamba strip_bcf_header --vcf| ~/tool/sambamba/bin/sambamba lz4compress
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.12.0 / DMD v2.082.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (1.12.0)
[executing] /usr/local/bin/samtools mpileup /tmp/sambamba-pid22858-kqvb/3 -l /tmp/sambamba-pid22858-kqvb/3.bed -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed| ~/tool/sambamba/bin/sambamba strip_bcf_header --vcf| ~/tool/sambamba/bin/sambamba lz4compress
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.12.0 / DMD v2.082.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (1.12.0)
[executing] /usr/local/bin/samtools mpileup /tmp/sambamba-pid22858-kqvb/4 -l /tmp/sambamba-pid22858-kqvb/4.bed -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed| ~/tool/sambamba/bin/sambamba strip_bcf_header --vcf| ~/tool/sambamba/bin/sambamba lz4compress
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.12.0 / DMD v2.082.1 / LLVM7.0.0 / bootstrap LDC - the LLVM D compiler (1.12.0)
[opened FIFO for writing] /tmp/sambamba-pid22858-kqvb/1
[opened FIFO for writing] /tmp/sambamba-pid22858-kqvb/2
[E::hts_open_format] fail to open file '1'
[mpileup] failed to open 1: No such file or directory
[E::hts_open_format] fail to open file '1'
[mpileup] failed to open 1: No such file or directory
[opened FIFO for writing] /tmp/sambamba-pid22858-kqvb/3
[E::hts_open_format] fail to open file '1'
[mpileup] failed to open 1: No such file or directory
[executing] /usr/local/bin/samtools mpileup /tmp/sambamba-pid22858-kqvb/5 -l /tmp/sambamba-pid22858-kqvb/5.bed -f /mnt/ref/GRCh37/GRCh37.p13.genome_main.fa -d 8000 -Q 38 1 -l ~/CRC/GATK4/somatic/cnv/controlfreec/0515T_test/0515Tumor.co-realn.bam_SNPinNewCaptureRegions.bed| ~/tool/sambamba/bin/sambamba strip_bcf_header --vcf| ~/tool/sambamba/bin/sambamba lz4compress
[opened FIFO for writing] /tmp/sambamba-pid22858-kqvb/4
[E::hts_open_format] fail to open file '1'
[mpileup] failed to open 1: No such file or directory
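One possible reading of this log, not a confirmed diagnosis: the logged mpileup options contain `-Q 38 1`, and samtools then reports that it failed to open a file named `1`. That would be consistent with the bare `1` being treated as an input file name. Where that `1` comes from is not established in this thread. The Python sketch below only shows how such a stray token can be spotted in an options string. `stray_tokens` is a hypothetical helper, and the flag set covers just the four flags seen in the log (`-f`, `-d`, `-Q`, `-l`), each of which takes one value in samtools mpileup.

```python
import shlex
from typing import List

# Flags that appear in the logged command; each one consumes exactly one value.
FLAGS_WITH_VALUE = {"-f", "-d", "-Q", "-l"}


def stray_tokens(options: str) -> List[str]:
    """Return tokens that are neither a known flag nor the value of one."""
    tokens = shlex.split(options)
    stray = []
    i = 0
    while i < len(tokens):
        if tokens[i] in FLAGS_WITH_VALUE:
            i += 2  # skip the flag together with its value
        else:
            stray.append(tokens[i])  # anything else is reported for inspection
            i += 1
    return stray
```

Run on the `samtools mpileup options:` line from the log, the only token reported is `1`. Flags outside the four listed would also be reported, so the result is a prompt for inspection rather than proof of an error.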