ChristofferFlensburg / superFreq

Analysis pipeline for cancer sequencing data
MIT License

Understanding columns in the CNA segment output file #88

Closed jsha129 closed 2 months ago

jsha129 commented 2 years ago

Dear Christoffer and team, Thank you for developing superFreq. It does pretty much everything we want from WES analysis, and all the annotations are great :) I have been looking at CNAsegments*.tsv. Could you please explain what each column refers to, and which filters one should use to identify significant CNVs? I also don't know what var, flag, pbq, pmq and psr refer to in somaticVariants.csv. Is this VEP output? Many thanks

Columns of CNVsegments:

chr: chromosome
start: start
end: end
x1:
x2:
M:
width:
df: degree of freedom
var:
cov:
Nsnps:
pHet:
pAlt:
odsHet:
f:
stat:
nullStat:
altStat:
nullStatErr:
altStatErr:
postHet:
ferr:
call:
clonality:
clonalityError:
sigma:
pCall:
subclonality:
subclonalityError:
genes:
COSMIC_genes:

ChristofferFlensburg commented 2 years ago

Hi!

Yep, the CNVsegment file is a recurring question so I finally made my first entry in the wiki! https://github.com/ChristofferFlensburg/superFreq/wiki

For variants, "var" is the number of variant-supporting reads ("ref" is the number of reference-supporting reads and "cov" is the total read depth). "flag" lists quality issues (an empty string means the variant is fine). The last three, "pbq", "pmq" and "psr", are p-values for base quality, mapping quality and strand ratio respectively, where the null hypothesis is that variant and reference reads have the same base qualities, mapping qualities and strand ratios. superFreq no longer runs VEP (it did a few years ago).
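
As a rough illustration of how these columns could be used downstream, here is a minimal Python sketch that keeps variants with an empty "flag" and adds a variant allele frequency computed as var / cov. The column names come from the description above. The helper name `summarize_variants`, the toy rows, and the flag value `someFlag` are made up for the example and are not superFreq output. It also assumes an unflagged variant has an empty flag field in the CSV, so check how your file encodes this before relying on it.

```python
import csv
import io


def summarize_variants(rows):
    """Keep unflagged variants and add a variant allele frequency (var / cov).

    `rows` is any iterable of dicts with at least "var", "cov" and "flag" keys,
    e.g. a csv.DictReader over somaticVariants.csv (hypothetical helper).
    """
    kept = []
    for row in rows:
        # A non-empty flag marks a quality issue, so skip those rows.
        if (row["flag"] or "").strip() != "":
            continue
        var = float(row["var"])
        cov = float(row["cov"])
        if cov == 0:
            continue  # no reads at this position, VAF is undefined
        kept.append({**row, "vaf": var / cov})
    return kept


# Toy data for illustration only; values and the flag name are invented.
example_csv = io.StringIO(
    "var,ref,cov,flag,pbq,pmq,psr\n"
    "12,28,40,,0.8,0.6,0.5\n"
    "3,57,60,someFlag,0.4,0.001,0.3\n"
)

unflagged = summarize_variants(csv.DictReader(example_csv))
for v in unflagged:
    print(v["var"], v["cov"], round(v["vaf"], 3))
```

To run this on a real file, replace `example_csv` with `open("somaticVariants.csv", newline="")`. Any cut-off on pbq, pmq or psr is left out on purpose, since no thresholds are given here.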