PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite
Other
255 stars 71 forks source link

Tumor Varcount, Tumor/normal depth counts #426

Closed alhafidzhamdan closed 4 years ago

alhafidzhamdan commented 4 years ago

Hi there,

I am trying to run palimpsest https://github.com/FunGeST/Palimpsest. Their SV requirement is

Sample: Sample identifier. Any alphanumeric string.
Type: Type of structural variant: INV/DEL/DUP/BND.
CHROM_1: Chromosome of the first breakpoint. Between chr1 and chr22 or the chrX or chrY ('chr' prefix required).
POS_1: Position of the first breakpoint. A positive integer.
CHROM_2: Chromosome of the second breakpoint. Between chr1 and chr22 or the chrX or chrY ('chr' prefix required).
POS_2: Position of the second breakpoint. A positive integer.
Tumor_Varcount: Column for variant allele count information.
Tumor_Depth: Column for tumour sequencing depth information.
Normal_Depth: Column for normal sequencing depth information.
Driver: OPTIONAL column indicating the driver events to be annotated in tumour history plots.

I can use your handy simpleEventType() function to convert BND types. However, and I know this has been asked before in various forms in this forum, but it's better to ask directly from you rather than inferring it myself, how do I generate Tumor_Varcount, Tumor_Depth and Normal_Depth?

Many thanks Al

d-cameron commented 4 years ago

how do I generate Tumor_Varcount

This is the VF field for breakpoints, and the BVF field for single breakends.

GRIDSS reports total support in the INFO column, as well as a per-sample breakdown.

Tumor_Depth and Normal_Depth

These fields don't make sense to me as it is unclear what 'depth' is, nor why they only want 2 depths and not 4.

GRIDSS reports REF and REFPAIR which are the number of reference-supporting reads spanning the breakpoint, and the number of reference-supporting read pairs spanning the breakpoint. Again, GRIDSS reports totals in the INFO column, as well as a per-sample breakdown so for these you'll want a per-sample calculation for normal reads and the same calculation on the tumour reads.

The big issue I have is that those depths are not per-breakpoint. Depth is calculated as variant reads + reference reads. The problem is reference reads can be completely different on each side of the breakpoints. A breakpoints can be homozyous on one side and heterozygous on the other. I don't know what their depth fields actually mean. Maybe average it with (REF1+REF2+REFPAIR1+REFPAIR2)/2+VF? I'm dubious of how good the result of running the tool will be if the inputs are in a format that doesn't make sense.

Note that for small variants, REFPAIR read pairs may actually be from the variant haplotype. A 50bp DEL with a 500bp fragment will have reads containing the DEL that appear to still support the reference (e.g. a fragment that's actually 500bp long that look 550bp since it spans the DEL). If you don't account for this, you'll underestimate the tumour allele fraction for small events. libgridss.R has some helper functions that accounts for this in the VAF reporting functions.

If you're wanting to do somatic SV interpretation and analysis, I strongly recommend trying our GRIDSS/PURPLE/LINX pipline (https://www.biorxiv.org/content/10.1101/781013v1) as the LINX output gives a much much more comprehensive picture of the somatic rearrangement landscape than any other tools currently available.

d-cameron commented 4 years ago

Alternatively, depth might be the read depth between the breakpoints for dup/del/inv events.

d-cameron commented 4 years ago

INV is also problematic as there are way more fold-back inversions than actual balanced (two-breakpoint) inversion in cancer genomes.

d-cameron commented 4 years ago

See https://github.com/hartwigmedical/hmftools/tree/master/sv-linx#key-concepts-in-linx for the model we use.

alhafidzhamdan commented 4 years ago

Thanks for this! Appreciate the caveat explanations. I share your concerns re: ambiguity in "depth" info. I have run GRIDSS-PURPLE-LINX but I don't think LINX provides clonality information of the breakends? Unless I'm missing something.

d-cameron commented 4 years ago

I don't think LINX provides clonality information of the breakends?

It does purity-adjusted VAF and infers total SV copy number but no, it doesn't explicitly estimate clonality. Anything with low inferred SV copy number is clearly subclonal but it won't tell you a CN=1 SV is clonal or a CN=2 at 50% clonality.