Closed brucemoran closed 4 years ago
This column is not extant
It was renamed to sourceId as StructuralVariantAnnotation supports loading from other sources as well (e.g. BEDPE).
there is no mention of extracting per-sample ranges in the documentation.
A VCF doesn't have per-sample events. It has an all-sample set of events, and a breakdown of the per-sample support for those events is defined in the FORMAT fields (i.e. geno(vcf)).
Can you please specify how I can get individual GRanges objects per-sample from a multisample VCF, with multiple tumour?
You get the GRanges for all calls, then subset it to only the variants supported by that sample. How you define support depends on your use case and model. If you've got independent samples then you'd probably want to use the high-confidence threshold used by gridss_somatic_filter.R:
library(StructuralVariantAnnotation)
vcf = readVcf("colo829.vcf")
bpgr = breakpointRanges(vcf)
sample_name="COLO829T.bam"
sample_specific_bpgr = bpgr[geno(vcf[bpgr$sourceId])$QUAL[,sample_name] >= 350]
If your tumour samples are related (e.g. multiple mets), then you'll want a much lower threshold so you can look at subclones common across samples. If a 5% subclone in one met is found at 100% in another, you'd want to keep those 5% SVs even if they don't pass the high-confidence threshold. In these cases, you might want a lower threshold of, say, 2 supporting fragments: bpgr[geno(vcf[bpgr$sourceId])$VF[,"COLO829T.bam"] >= 2].
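To apply the same subsetting to every sample at once, something like the following sketch may help. supported_ids_by_sample is a hypothetical helper, not part of StructuralVariantAnnotation, and it assumes the support matrix has variant IDs as row names and sample names as column names, as geno(vcf)$VF and geno(vcf)$QUAL do. The toy matrix and its sample names are made up so the logic can be run without a VCF.

```r
# Hypothetical helper (not part of StructuralVariantAnnotation).
# support:   numeric matrix, one row per variant and one column per sample,
#            e.g. geno(vcf)$VF or geno(vcf)$QUAL
# threshold: minimum value for a variant to count as supported in a sample
# Returns a named list mapping each sample name to the supported variant IDs.
supported_ids_by_sample <- function(support, threshold) {
  stopifnot(is.matrix(support),
            !is.null(rownames(support)),
            !is.null(colnames(support)))
  ids <- lapply(colnames(support), function(s) {
    keep <- !is.na(support[, s]) & support[, s] >= threshold
    rownames(support)[keep]
  })
  names(ids) <- colnames(support)
  ids
}

# Toy data standing in for geno(vcf)$VF (made-up IDs and sample names)
vf <- matrix(c(0, 12, 3,
               5,  0, 2,
               1,  1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("sv1o", "sv2o", "sv3o"),
                             c("N.bam", "T1.bam", "T2.bam")))
ids <- supported_ids_by_sample(vf, threshold = 2)

# With real data (assumes the vcf and bpgr objects from the snippet above):
# ids <- supported_ids_by_sample(geno(vcf)$VF, threshold = 2)
# per_sample_bpgr <- lapply(ids, function(x) {
#   gr <- bpgr[bpgr$sourceId %in% x]
#   gr[gr$partner %in% names(gr)]  # drop breakends whose partner was filtered out
# })
```

The result is a list with one entry per sample, so per_sample_bpgr[["COLO829T.bam"]] would be the GRanges for that sample. The threshold and the choice of field (VF or QUAL) remain yours to set, as described above.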
Another option is to use gridss_somatic_filter.R, included in GRIDSS, to convert the raw GRIDSS output into a somatic VCF. This can be done per-sample using the --tumourordinal command line parameter for that script.
Thank you very much Daniel for your response, I had indeed considered your second solution but glad to see how to separate out samples from the GRanges object.
And thanks also for your ideas on filtering when working on subclones, this is relevant for our work.
All the best, Bruce
In that case, some of the functions in the libgridss.R library that does the heavy lifting for gridss_somatic_filter.R may also be relevant to you, for example gridss_bp_af and gridss_be_af, which calculate tumour variant allele fraction. For small events, fragments from the variant haplotype can entirely span the allele, so they need a slightly different VAF calculation than large events.
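As a rough illustration of why event size matters, here is a minimal sketch of one way such a VAF could be computed. It is not the libgridss.R implementation. It assumes per-sample counts of variant fragments (VF), reference-supporting reads (REF) and reference-supporting read pairs (REFPAIR), and it assumes that for small events the read-pair count is left out of the denominator, since pairs from the variant haplotype can span a small event and look like reference support. The function name and the 1000 bp cutoff are placeholders chosen for the example.

```r
# Hypothetical sketch, not the libgridss.R implementation.
# vf, ref, refpair: per-variant counts for a single sample
# event_size:       event length in bp (NA when there is no meaningful size,
#                   e.g. inter-chromosomal breakpoints)
# small_event_bp:   illustrative cutoff below which read pairs are ignored
sketch_vaf <- function(vf, ref, refpair, event_size, small_event_bp = 1000) {
  is_small <- !is.na(event_size) & abs(event_size) <= small_event_bp
  # For small events, count only reads as reference support
  ref_support <- ref + ifelse(is_small, 0, refpair)
  total <- vf + ref_support
  # No coverage at all gives NA rather than a division by zero
  ifelse(total > 0, vf / total, NA_real_)
}

# Same counts, different event sizes: 300 bp versus 50 kb
vaf <- sketch_vaf(vf = c(10, 10), ref = c(10, 10), refpair = c(20, 20),
                  event_size = c(300, 50000))
```

With identical counts, the 300 bp event gets a higher VAF than the 50 kb event because its read pairs are excluded. For real analyses the functions in libgridss.R are the better reference.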
Hi,
I have an issue with multisample VCF input to the package.
Best practice in GRIDSS specifies joint calling of all samples creating this input.
I have read a few other issues, #10 and #11, which reference a vcfid column in GRanges from breakpointRanges(vcf); these are older so understandable if things have changed.
This column is not extant, and there is no mention of extracting per-sample ranges in the documentation.
Can you please specify how I can get individual GRanges objects per-sample from a multisample VCF, with multiple tumour?
Thanks,
Bruce