BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Control-FREEC v11.6 failed #96

Open lmanchon opened 2 years ago

lmanchon commented 2 years ago

Hello,

I am running Control-FREEC on a single chromosome with these parameters: `freec -conf config_WGS.txt`

```
############# config_WGS.txt ###################
[general]

# parameters chrLenFile and ploidy are required.

chrLenFile = res_chr20.fa
ploidy = 2

# Parameter "breakPointThreshold" specifies the maximal slope of the slope of residual sum of squares.
# This should be a positive value. The closer it is to Zero, the more breakpoints will be called. Its recommended value is between 0.01 and 0.08.

breakPointThreshold = 0.6

# Either coefficientOfVariation or window must be specified for whole genome sequencing data. Set window=0 for exome sequencing data.

coefficientOfVariation = 0.06
window = 5000
step=10000

# Either chrFiles or GCcontentProfile must be specified too if no control dataset is available.
# If you provide a path to chromosome files, Control-FREEC will look for the following fasta files in your directory (in this order):
# 1, 1.fa, 1.fasta, chr1.fa, chr1.fasta; 2, 2.fa, etc.
# Please ensure that you don't have other files but sequences having the listed names in this directory.

chrFiles = CONTROL_FREEC/
GCcontentProfile = test/GC_profile_50kb.cnp

# if you are working with something non-human, we may need to modify these parameters:

minExpectedGC = 0.35
maxExpectedGC = 0.55

readCountThreshold=10

maxThreads = 10
numberOfProcesses = 4

outputDir = test

contaminationAdjustment = TRUE
contamination = 0

minMappabilityPerWindow = 0.95

# If the parameter gemMappabilityFile is not specified, then the fraction of non-N nucleotides per window is used as Mappability.

gemMappabilityFile = /GEM_mappability/out76.gem

breakPointType = 2
forceGCcontentNormalization = 1
sex=XY

# set BedGraphOutput=TRUE if you want to create a BedGraph track for visualization in the UCSC genome browser:

BedGraphOutput=TRUE

[sample]

mateFile = chr20.bam
mateCopyNumberFile = test/sample.cpn

inputFormat = BAM
mateOrientation = RF

# use "mateOrientation=0" for sorted .SAM and .BAM

[control]

mateFile = /path/control.pileup.gz
mateCopyNumberFile = path/control.cpn

inputFormat = pileup
mateOrientation = RF

[BAF]

# use the following options to calculate B allele frequency profiles and genotype status. This option can only be used if "inputFormat=pileup"

SNPfile = /bioinfo/users/vboeva/Desktop/annotations/hg19_snp131.SingleDiNucl.1based.txt
minimalCoveragePerPosition = 5

# use "minimalQualityPerPosition" and "shiftInQuality" to consider only high quality position in calculation of allelic frequencies (this option significantly slows down reading of .pileup)

minimalQualityPerPosition = 5
shiftInQuality = 33

[target]

# use a tab-delimited .BED file to specify capture regions (control dataset is needed to use this option):

captureRegions = /bioinfo/users/vboeva/Desktop/testChr19/capture.bed

##########################################################################################
```
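The comments in this config state several rules (required parameters, either/or pairs, a recommended breakPointThreshold range) that can be checked before launching a run that takes several minutes. The sketch below is a hypothetical pre-flight check, not part of Control-FREEC: `load_config`, `check_config` and `chr_file_candidates` are illustrative names. It assumes one `key = value` pair per line with `#` comment lines, which Python's `configparser` can read. The "looks like a FASTA" test is a simple file-extension heuristic, and stripping a leading `chr` in `chr_file_candidates` is a guess extrapolated from the `1, 1.fa, ..., chr1.fasta` order quoted in the comments.

```python
import configparser


def load_config(text):
    """Parse a FREEC-style config (assumes one 'key = value' per line, '#' comments)."""
    cfg = configparser.ConfigParser(
        comment_prefixes=("#",),
        interpolation=None,  # treat values literally; no %-style substitution
        strict=False,        # tolerate repeated keys instead of raising
    )
    cfg.optionxform = str    # keep camelCase names such as chrLenFile as written
    cfg.read_string(text)
    return cfg


def check_config(cfg):
    """Return problems found using rules quoted in the config comments (hypothetical helper)."""
    problems = []
    general = cfg["general"] if cfg.has_section("general") else {}
    for key in ("chrLenFile", "ploidy"):
        if key not in general:
            problems.append(f"[general] {key} is required")
    if "coefficientOfVariation" not in general and "window" not in general:
        problems.append("[general] set coefficientOfVariation or window (window=0 for exome)")
    has_control = cfg.has_option("control", "mateFile")
    if not has_control and "chrFiles" not in general and "GCcontentProfile" not in general:
        problems.append("[general] without a control, set chrFiles or GCcontentProfile")
    # Heuristic (assumption): a .fa/.fasta path is probably a sequence, not a lengths file.
    if general.get("chrLenFile", "").endswith((".fa", ".fasta")):
        problems.append("[general] chrLenFile looks like a FASTA file, not a chromosome lengths file")
    if "breakPointThreshold" in general:
        bpt = float(general["breakPointThreshold"])
        if not 0.01 <= bpt <= 0.08:
            problems.append(f"[general] breakPointThreshold={bpt} is outside the commented 0.01-0.08 range")
    fmt = cfg.get("sample", "inputFormat", fallback="")
    if cfg.has_option("BAF", "SNPfile") and fmt.lower() != "pileup":
        problems.append("[BAF] the comment says SNPfile needs inputFormat=pileup")
    if cfg.has_option("target", "captureRegions") and not has_control:
        problems.append("[target] captureRegions needs a control dataset")
    return problems


def chr_file_candidates(chrom):
    """File names tried in chrFiles, in the order given by the config comment.

    Stripping a leading 'chr' to get the bare name is an assumption.
    """
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    return [bare, f"{bare}.fa", f"{bare}.fasta", f"chr{bare}.fa", f"chr{bare}.fasta"]
```

Applied to the parameters above, these checks would flag the FASTA given as `chrLenFile`, `breakPointThreshold = 0.6`, and the `[BAF]` SNPfile paired with `inputFormat = BAM`. They only inspect what is written in the file, so they cannot tell whether a path exists or whether a line was commented out in the original.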

And it failed with this output:

```
Control-FREEC v11.6 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
Multi-threading mode using 10 threads
..consider the sample being male
..Breakpoint threshold for segmentation of copy number profiles is 0.6
..telocenromeric set to 50000
..FREEC is going to adjust profiles for a possible contamination by normal cells
..set contaminationAdjustment=FALSE if you don't want to use this option because you think that there is no contamiantion of your tumor sample by normal cells (e.g., it is a cell line, or it non-cancer DNA used without a control sample)
..FREEC is going to evaluate contamination by normal cells
..Coefficient Of Variation set equal to 0.06
..it will be used to evaluate window size
..Output directory: .
..Directory with files containing chromosome sequences: CONTROL_FREEC/
..Sample file: chr20.bam
..Sample input format: BAM
..will use this instance of samtools: 'samtools' to read BAM files
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..Polynomial degree for "ReadCount ~ GC-content" normalization is 3 or 4: will try both
..Minimal CNA length (in windows) is 1
..File with chromosome lengths: res_chr20.fa
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 2
..noisyData set to 0
..Control-FREEC will not look for subclones
..File res_chr20.fa was read
total genome size: 6.44442e+07
..samtools should be installed to be able to read BAM files
read number: 85295976
coefficientOfVariation: 0.06
evaluated window size: 210
..[genomecopynumber] Starting reading chr20.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view -@ 10 chr20.bam
..finished reading chr20.bam
PROFILING [tid=140489947232064]: chr20.bam read in 466 seconds [fillMyHash]
85295976 lines read..
5900 reads used to compute copy number profile
printing counts into ./chr20.bam_sample.cpn
..Window size: 210
..using GC-content to normalize copy number profiles
CG-content printed into ./GC_profile.210bp.cnp
..Running FREEC with ploidy set to 2
Error: zero reads in windows with the GC-content around 0.45 with interval 0.01, will try again with 0.04
ERROR: there was a problem in the initial guess of the polynomial. Please contact the support team of change your input parameters. Exit.
```
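The `evaluated window size: 210` line can be reproduced with a back-of-the-envelope Poisson argument: if read counts per window are Poisson, their coefficient of variation is 1/sqrt(mean count), and the mean count is read number × window / genome size. The sketch below solves that for the window. It is a reconstruction that happens to match the logged value, not code taken from Control-FREEC, and `evaluated_window` and `mean_reads_per_window` are hypothetical names.

```python
def evaluated_window(genome_size, read_count, cv):
    """Window (bp) at which Poisson read counts would have the requested CV.

    CV = 1/sqrt(mu) and mu = read_count * window / genome_size,
    so window = genome_size / (read_count * cv**2). Assumed model, not FREEC's source.
    """
    return round(genome_size / (read_count * cv ** 2))


def mean_reads_per_window(genome_size, reads, window):
    """Expected reads per window if `reads` are spread uniformly over the genome."""
    return reads * window / genome_size


genome_size = 6.44442e7  # "total genome size" from the log
window = evaluated_window(genome_size, 85295976, 0.06)  # "read number" and CV from the log
print(window)  # 210
print(f"{mean_reads_per_window(genome_size, 5900, window):.3f}")  # 0.019, using "5900 reads used"
```

The window was sized for about 85 million reads, but if only the 5900 reads reported as used are counted, a 210 bp window holds about 0.019 reads on average, so almost every window is empty. That is at least consistent with the `zero reads in windows with the GC-content around 0.45` error; the log itself does not say why so few reads were used.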

I don't know what's wrong.

Thank you.

valeu commented 2 years ago

`chrLenFile = res_chr20.fa` is wrong: this should be a file with chromosome lengths. It is not recommended to run FREEC on just one chromosome, but if you insist, you should provide a chromosome length file with just the length of chr20, e.g. `chr20 64444167`.
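Following this suggestion, a chromosome length file can be derived from the FASTA itself. A minimal sketch, assuming the two-column, tab-separated `name length` layout of the example line; `fasta_lengths` and `write_chr_len_file` are hypothetical helpers, and the record name is taken as the first word of each `>` header. If a samtools `.fai` index of the FASTA already exists, its first two columns hold the same name and length.

```python
def fasta_lengths(fasta_path):
    """Yield (name, length) for each record of a FASTA file; sequences may span many lines."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Record name = first word after '>' (e.g. ">chr20 desc" -> "chr20").
                name, length = (line[1:].split() or [""])[0], 0
            elif line:
                length += len(line)  # count every residue, including N
    if name is not None:
        yield name, length


def write_chr_len_file(fasta_path, out_path, keep=None):
    """Write 'name<TAB>length' lines; `keep` optionally restricts to some chromosome names."""
    with open(out_path, "w") as out:
        for name, length in fasta_lengths(fasta_path):
            if keep is None or name in keep:
                out.write(f"{name}\t{length}\n")
```

For the single-chromosome test here, a call such as `write_chr_len_file("res_chr20.fa", "length.txt", keep={"chr20"})` would write one line, which can then be set as `chrLenFile = length.txt`. Check the FREEC documentation for the exact layout your version expects.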

lmanchon commented 2 years ago

Thank you. I didn't see this stupid mistake. I am working with only one chromosome to run tests on detecting the level of mosaicism by varying the sequencing depth at the CNV positions. I am comparing different tools to test their sensitivity in detecting mosaicism.

lmanchon commented 2 years ago

Again the same error, this time with a correct chromosome length file:

```
File with chromosome lengths: length.txt
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 2
..noisyData set to 0
..Control-FREEC will not look for subclones
..File length.txt was read
total genome size: 6.44442e+07
..samtools should be installed to be able to read BAM files
read number: 85295976
coefficientOfVariation: 0.06
evaluated window size: 210
..[genomecopynumber] Starting reading chr20.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view -@ 24 chr20.bam
..finished reading chr20.bam
PROFILING [tid=139856019896128]: chr20.bam read in 408 seconds [fillMyHash]
85295976 lines read..
5900 reads used to compute copy number profile
printing counts into ./chr20.bam_sample.cpn
..Window size: 210
..using GC-content to normalize copy number profiles
CG-content printed into ./GC_profile.210bp.cnp
..Running FREEC with ploidy set to 2
Error: zero reads in windows with the GC-content around 0.45 with interval 0.01, will try again with 0.04
ERROR: there was a problem in the initial guess of the polynomial. Please contact the support team of change your input parameters. Exit.
```
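Both runs print identical counts (same read number, same window, same 5900 reads used). For the depth-variation tests described above, it can help to pull these numbers out of each saved log and compare runs side by side. A hypothetical helper, `summarize_freec_log`, using regular expressions; the patterns copy the exact wording of these v11.6 log lines and may need adjusting for other versions.

```python
import re

# Patterns copied from the wording of the v11.6 log lines above.
PATTERNS = {
    "genome_size": r"total genome size: ([0-9.e+]+)",
    "read_number": r"read number: (\d+)",
    "window": r"evaluated window size: (\d+)",
    "reads_used": r"(\d+) reads used to compute copy number profile",
}


def summarize_freec_log(text):
    """Extract a few numbers from a Control-FREEC log and the fraction of reads used."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[key] = float(match.group(1))
    # Only compute the ratio when both counts are present and the denominator is non-zero.
    if found.get("read_number") and "reads_used" in found:
        found["fraction_used"] = found["reads_used"] / found["read_number"]
    return found


log = ("..File length.txt was read total genome size: 6.44442e+07 "
       "read number: 85295976 coefficientOfVariation: 0.06 evaluated window size: 210 "
       "[fillMyHash] 85295976 lines read.. 5900 reads used to compute copy number profile")
print(summarize_freec_log(log))
```

Because the patterns search the whole text, the helper works on both the original single-line paste and a log with its line breaks intact.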