EichlerLab / smrtsv2

Structural variant caller
MIT License

smrtsv genotype for an arbitrary VCF #39

Closed AyushSaxena closed 5 years ago

AyushSaxena commented 5 years ago

Hello,

Can the genotype function be used for an arbitrary VCF file, for example one containing randomly generated SVs or SVs from a different caller? I'm asking because the manual specifies that the positional argument 'genotyped_variants' should be a "VCF of SMRT SV variant genotypes for the given sample-level BAMs."

Ayush

paudano commented 5 years ago

The genotyper requires a set of assembled contigs with known SV breakpoints. It uses reference breakpoints and breakpoints on contigs to compare read mapping between the reference and the alternate allele to make a genotype call. SMRT-SV generates contigs, calls SVs from them, and tracks SV breakpoints.

The VCF INFO field contains CONTIG, CONTIG_START, and CONTIG_END entries, which allow the genotyper to find the location of the SV in an assembly. For example, a 300 bp insertion will have CONTIG_START at the beginning of the SV and CONTIG_END 300 bp downstream of CONTIG_START (using BED coordinates).
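As a rough way to check whether a VCF record carries these annotations, here is a minimal sketch in plain Python. It is not part of SMRT-SV: the helper names (`parse_info`, `contig_span`) and the example record with its contig name and coordinates are made up for illustration. Only the three INFO keys come from the description above.

```python
# Hypothetical helpers, not part of SMRT-SV. Only the INFO key names
# (CONTIG, CONTIG_START, CONTIG_END) come from the genotyper's requirements.
REQUIRED_KEYS = ("CONTIG", "CONTIG_START", "CONTIG_END")


def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def contig_span(vcf_line):
    """Return (contig, start, end) for a VCF data line, or None if the
    record lacks any of the contig annotations."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        return None  # not a complete VCF data line
    info = parse_info(columns[7])
    # Each required key must be present with a value, not as a bare flag.
    if any(not isinstance(info.get(key), str) for key in REQUIRED_KEYS):
        return None
    return info["CONTIG"], int(info["CONTIG_START"]), int(info["CONTIG_END"])


# Made-up record for a 300 bp insertion (all values are illustrative).
example = "\t".join([
    "chr1", "1000", ".", "A", "<INS>", ".", "PASS",
    "SVTYPE=INS;SVLEN=300;CONTIG=contig_1;CONTIG_START=5000;CONTIG_END=5300",
])

contig, start, end = contig_span(example)
print(contig, end - start)  # contig_1 300
```

A record from a caller that does not track contig coordinates would return `None` here, which is the situation described below.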

So an arbitrary VCF will not work, because most tools (e.g. NGS callers like Lumpy, Delly, WhamG, and GenomeSTRiP) do not call SVs from contigs this way. SMRT-SV and Phased-SV both do, and any assembly mapped to the reference can be used to find SVs this way. Any of those approaches can provide the contigs and SV calls annotated with contig coordinates (some may require additional work to format the VCF), and the SMRT-SV genotyper could use them. Both the VCF and a BAM of the contig alignments are required for the SMRT-SV genotyper.