BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Understanding _ratio.txt and _BAF.txt for counting informative markers #146

Open kmavrommatis opened 4 months ago

kmavrommatis commented 4 months ago

Hi, I am trying to understand the contents of the files FREEC produces and to count the number of SNPs used as markers for each segment. My intention is to pass the output of FREEC to GISTIC for a cohort analysis.

Based on my understanding, the file _BAF.txt contains the information for each of the SNPs found in the public SNP database, e.g. dbSNP. The file _ratio.txt contains the information for each exon region (target interval). I noticed that the script FREEC_ratio2Absolute.R seems to use the _ratio.txt file to produce an output similar to Absolute, but the column Num_probes corresponds to the number of exon regions, not the number of SNPs.

If we want to count the number of SNPs (markers) that were used for each segment call, we should count the number of informative SNPs in the _BAF.txt file, i.e. the SNPs that have uncertainty > -1. Is this correct?
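A minimal sketch of the count described above, not something taken from the FREEC scripts. It assumes the _BAF.txt rows have been parsed into dicts with columns named `Chromosome`, `Position` and `uncertainty` (please check these against the header of your own file), and that segments are given as hypothetical `(chrom, start, end)` tuples with inclusive coordinates. The function name and the example values are made up for illustration.

```python
from collections import Counter


def count_informative_snps(baf_rows, segments):
    """Count SNPs with uncertainty > -1 inside each (chrom, start, end) segment.

    baf_rows: iterable of dicts with keys "Chromosome", "Position", "uncertainty"
              (assumed column names, verify against your _BAF.txt header).
    segments: list of (chrom, start, end) tuples, coordinates inclusive.
    """
    counts = Counter()
    for row in baf_rows:
        # Skip SNPs flagged as uninformative (uncertainty of -1 or lower).
        if float(row["uncertainty"]) <= -1:
            continue
        chrom = str(row["Chromosome"])
        pos = int(row["Position"])
        for seg in segments:
            seg_chrom, start, end = seg
            if chrom == str(seg_chrom) and start <= pos <= end:
                counts[seg] += 1
                break  # assume segments do not overlap
    # Report every segment, including those with zero informative SNPs.
    return {seg: counts[seg] for seg in segments}


# Made-up rows, only to show the expected input shape.
example_rows = [
    {"Chromosome": "1", "Position": "150", "uncertainty": "12.5"},
    {"Chromosome": "1", "Position": "250", "uncertainty": "-1"},
    {"Chromosome": "1", "Position": "300", "uncertainty": "4.0"},
    {"Chromosome": "1", "Position": "900", "uncertainty": "0"},
    {"Chromosome": "2", "Position": "150", "uncertainty": "3.1"},
]
example_segments = [("1", 100, 500), ("1", 501, 1000), ("2", 100, 500), ("2", 501, 1000)]
snp_counts = count_informative_snps(example_rows, example_segments)
```

For a real file, the rows could come from `csv.DictReader(handle, delimiter="\t")`, assuming the file is tab-separated with a header line.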

Thanks in advance for your help

valeu commented 4 months ago

Hi, SNPs are only used for BAF estimates, not to call CNAs. For CNA calling, each exon (or each window for WGS) is used as one data point, not each SNP. This is why I output the number of exons to Absolute. And to me, all your points seem to be correct.
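To illustrate what "one point per exon or window" means for a probe count, here is a rough sketch that is not the logic of FREEC_ratio2Absolute.R. It assumes the _ratio.txt rows are already sorted by position and parsed into dicts with columns named `Chromosome`, `Start` and `CopyNumber` (check these against your file), and it treats a run of consecutive windows with the same chromosome and copy number as one segment, which is a simplification. The function name and the `LastWindowStart` key are hypothetical.

```python
from itertools import groupby


def windows_per_segment(ratio_rows):
    """Group consecutive windows with equal (Chromosome, CopyNumber) and count them.

    ratio_rows: position-sorted iterable of dicts with keys "Chromosome",
                "Start", "CopyNumber" (assumed column names).
    Returns one dict per run; "Num_probes" is the number of exons/windows in it.
    """
    segments = []
    by_chrom_and_cn = lambda r: (str(r["Chromosome"]), int(r["CopyNumber"]))
    for (chrom, copy_number), group in groupby(ratio_rows, key=by_chrom_and_cn):
        rows = list(group)
        segments.append({
            "Chromosome": chrom,
            "Start": int(rows[0]["Start"]),
            "LastWindowStart": int(rows[-1]["Start"]),  # start of the last window, not its end
            "CopyNumber": copy_number,
            "Num_probes": len(rows),  # one point per exon or window, not per SNP
        })
    return segments


# Made-up windows, only to show the expected input shape.
example_ratio_rows = [
    {"Chromosome": "1", "Start": "1", "CopyNumber": "2"},
    {"Chromosome": "1", "Start": "1001", "CopyNumber": "2"},
    {"Chromosome": "1", "Start": "2001", "CopyNumber": "3"},
    {"Chromosome": "2", "Start": "1", "CopyNumber": "3"},
    {"Chromosome": "2", "Start": "1001", "CopyNumber": "3"},
    {"Chromosome": "2", "Start": "2001", "CopyNumber": "3"},
]
example_window_segments = windows_per_segment(example_ratio_rows)
```

A per-segment SNP count from _BAF.txt would be a separate number from `Num_probes` here, since the two files describe different units.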