PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Document integration with StructuralVariantAnnotation #34

Closed jasper1918 closed 5 years ago

jasper1918 commented 8 years ago

Hi Daniel, Just getting started using gridss after talking with you at ISMB. It wasn't obvious from the readme how to run the VcfBreakendToBedpe script. I did eventually find how you referenced the class path to call it but thought it would be helpful to show this on the readme for other newcomers. -Jeff Jasper

d-cameron commented 8 years ago

I am in the process of deprecating VcfBreakendToBedpe in favour of a more powerful solution. The problem with VcfBreakendToBedpe is twofold: firstly, most pipelines are only interested in a subset of the data and will be applying some sort of filtering criteria, and secondly, the conversion to bedpe removes some of the information needed for filtering (e.g. somatic/normal).

The approach I currently recommend is to use my StructuralVariantAnnotation package that parses GRIDSS VCF files (as well as VCFs from about 8 other variant callers), and converts the mess that is VCF into a nice R data frame which can be trivially filtered or output to VCF.

The package is still in development and I haven't got around to documenting the API yet, but have a look at https://github.com/PapenfussLab/gridss/blob/master/example/somatic.R. It does what VcfBreakendToBedpe does in just a few lines but makes filtering the data down to the variants of interest much, much easier.
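To illustrate why filtering should happen before the bedpe conversion, here is a minimal Python sketch. It is not the StructuralVariantAnnotation API or the contents of somatic.R; the `Breakpoint` class, its `somatic` flag, and the `min_qual` threshold are all hypothetical names chosen for the example. The point is only that the standard ten bedpe columns have no slot for the annotation used to filter, so the filter has to be applied first.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    """One breakpoint call. All field names here are hypothetical."""
    name: str
    chrom1: str
    pos1: int       # 1-based position, as in VCF
    strand1: str
    chrom2: str
    pos2: int       # 1-based position, as in VCF
    strand2: str
    qual: float
    somatic: bool   # assumed annotation; not representable in the 10 bedpe columns


def to_bedpe_row(bp: Breakpoint) -> str:
    """Format a breakpoint as a 10-column bedpe line (0-based, half-open starts)."""
    fields = (
        bp.chrom1, bp.pos1 - 1, bp.pos1,
        bp.chrom2, bp.pos2 - 1, bp.pos2,
        bp.name, bp.qual, bp.strand1, bp.strand2,
    )
    return "\t".join(str(f) for f in fields)


def filter_then_export(breakpoints, min_qual):
    """Keep somatic calls at or above min_qual, then convert them to bedpe lines."""
    return [
        to_bedpe_row(bp)
        for bp in breakpoints
        if bp.somatic and bp.qual >= min_qual
    ]
```

Calling `filter_then_export(calls, min_qual=500)` on a list of `Breakpoint` objects returns only the bedpe lines for the calls that pass both criteria. Doing the conversion first would leave no `somatic` column to filter on.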

d-cameron commented 8 years ago

Renaming issue to reflect outstanding documentation work required

jasper1918 commented 8 years ago

Thanks for pointing out the R package. I've been using bedpe as a means to evaluate sensitivity/specificity of different SV callers and it sounds like your package will be a great alternative.

d-cameron commented 8 years ago

If you're doing SV caller comparisons, you may be interested in the code I used to generate the benchmarking figures for both my GRIDSS presentation and my ISMB benchmarking poster. In particular, the findMatchingBreakpoints function of https://github.com/d-cameron/sv_benchmark/blob/master/R/sv_benchmark.R has some useful matching logic for intra-chromosomal events, and the findBreakpointOverlaps function of StructuralVariantAnnotation was designed to be the breakpoint equivalent of the Bioconductor GenomicRanges findOverlaps function.
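For readers unfamiliar with the idea of a breakpoint-level overlap, here is a rough Python sketch of the concept. It is not the implementation of findBreakpointOverlaps or findMatchingBreakpoints. The function names, the tuple representation, and the `max_gap` parameter are assumptions made for illustration, and strand is ignored for brevity. The idea shown is that two breakpoints match only when both of their breakends agree, not just one.

```python
def breakends_close(a, b, max_gap):
    """True if two (chrom, pos) breakends share a chromosome and lie within max_gap bp."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= max_gap


def find_breakpoint_overlaps(query, subject, max_gap=0):
    """Return (query_index, subject_index) pairs of matching breakpoints.

    Each breakpoint is a pair of (chrom, pos) breakends. A match requires
    BOTH breakends to agree, in either order, since callers may report
    the two sides of the same event in opposite order (an assumption of
    this sketch).
    """
    hits = []
    for qi, (q1, q2) in enumerate(query):
        for si, (s1, s2) in enumerate(subject):
            direct = breakends_close(q1, s1, max_gap) and breakends_close(q2, s2, max_gap)
            swapped = breakends_close(q1, s2, max_gap) and breakends_close(q2, s1, max_gap)
            if direct or swapped:
                hits.append((qi, si))
    return hits
```

With a truth set as `subject` and a caller's output as `query`, the returned index pairs can be used to count true positives when estimating sensitivity and specificity. The nested loop is quadratic, which is fine for a sketch but would need an interval index for real call sets.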

d-cameron commented 5 years ago

StructuralVariantAnnotation is now in Bioconductor, with vignette documentation and an example SV comparison plot.