BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Help with ratio.txt file #100

Closed by merckey 2 years ago

merckey commented 2 years ago

Hi, I'm learning FREEC and trying to interpret the ratio.txt file generated for matched whole-exome tumor and normal samples. Below is an example of the output for two exon regions in this file. I have the following questions and really hope you can help explain them.

Chromosome Start Ratio MedianRatio CopyNumber BAF estimatedBAF Genotype UncertaintyOfGT Gene
1 930155 0.642677 0.587773 1 -1 2 A -1 1:930154-930336
1 955923 1.08682 0.529579 1 -1 1 A -1 1:955922-956013

The IGV bam and cpn tracks are included.

  1. How is the Ratio calculated? My understanding is that Ratio is tumor_read_count/normal_read_count. Does this take into account the difference in sequencing depth between the two samples (T: 140M reads, N: 70M reads)? Looking at the pileups, the raw T/N read count ratio is around 4. When each sample is normalized by its own total read depth, the ratio is around 2. So why is the reported Ratio 0.64? I think I am missing some important information but can't figure out what.
  2. How is the CopyNumber calculated? If the ploidy is 2, should I multiply it by the Ratio or by the MedianRatio?
  3. In the tumor and normal cpn files, the "raw copy number profile" values are 360 and 111, respectively. Are they the raw read counts for a specific position?

I really look forward to your reply. Thank you! Jin

[Image: IGV BAM and cpn tracks]

Here is the config file

[general]
BedGraphOutput = FALSE
chrFiles = /project/reference/BI_hg38/chromosomes/
chrLenFile = /project/reference/BI_hg38/Homo_sapiens_assembly38.chr24.fasta.fai
forceGCcontentNormalization = 1
maxThreads = 10
ploidy = 2
sex = XY
outputDir = freec/T_N
min_subclone = 30
readCountThreshold = 50
breakPointThreshold = 1.2
breakPointType = 4
window = 0
gemMappabilityFile = /project/reference/BI_hg38/out100m2_hg38.gem
noisyData = True

[control]
inputFormat = pileup
mateFile = freec/pileup/N.pileup
mateOrientation = FR

[sample]
inputFormat = pileup
mateFile = freec/pileup/T.pileup
mateOrientation = FR

[BAF]
SNPfile = /project/cwa236_uksr/reference/BI_hg38/dbsnp_138.hg38.vcf.gz
minimalCoveragePerPosition = 5
shiftInQuality = 33

[target]
captureRegions = /project/reference/exome_kit/targets_sorted_merged.bed

valeu commented 2 years ago

Hi,

  1. You can check these two publications, which explain the FREEC algorithm. In short, only with degree=0 are the ratios plain ratios between Test and Control (normalized for the total read count).

     Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. V. Boeva, T. Popova, K. Bleakley, P. Chiche, I. Janoueix-Lerosey, O. Delattre and E. Barillot. Bioinformatics, 2012, 28(3):423-5. PMID: 22155870.

     CNA detection part of Control-FREEC (simply FREEC):
     Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. V. Boeva, A. Zinovyev, K. Bleakley, J.-P. Vert, I. Janoueix-Lerosey, O. Delattre and E. Barillot. Bioinformatics, 2011, 27(2):268-9. PMID: 21081509.

     LOH detection part of Control-FREEC

  2. CopyNumber = round(MedianRatio * ploidy)
  3. They are the read counts for this particular exon in your case (plus the flanking region).
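
As a rough illustration of the two formulas mentioned in this reply, here is a minimal Python sketch using the numbers quoted in the thread. The function names are hypothetical, and `depth_normalized_ratio` is only a naive library-size normalization (an assumption for illustration), not FREEC's actual implementation.

```python
def depth_normalized_ratio(tumor_count, normal_count, tumor_total, normal_total):
    """Naive tumor/normal ratio after scaling each count by its library size.

    Illustration only: this is an assumed, simplified normalization and does
    not reproduce what FREEC computes internally.
    """
    return (tumor_count / tumor_total) / (normal_count / normal_total)


def copy_number(median_ratio, ploidy):
    """Copy number as round(MedianRatio * ploidy), as stated in the reply."""
    return round(median_ratio * ploidy)


# Values quoted in this thread: cpn counts of 360 (T) and 111 (N),
# library sizes of about 140M (T) and 70M (N) reads.
print(round(depth_normalized_ratio(360, 111, 140e6, 70e6), 2))  # 1.62

# MedianRatio values from the two ratio.txt rows in the question, ploidy = 2.
print(copy_number(0.587773, 2))  # 1
print(copy_number(0.529579, 2))  # 1
```

Both rows reproduce the reported CopyNumber of 1. The naive ratio of about 1.62 does not match the reported Ratio of 0.64, which fits the point that the Ratio column is a plain normalized T/N ratio only with degree=0; the config in this thread sets forceGCcontentNormalization = 1, so further normalization is presumably applied.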
merckey commented 2 years ago

Thank you for the explanation. I will check the 2 papers.