Closed by AyushSaxena 5 years ago.
The genotyper requires a set of assembled contigs with known SV breakpoints. It uses the breakpoints on the reference and on the contigs to compare how reads map to the reference allele versus the alternate allele, and it makes a genotype call from that comparison. SMRT-SV generates contigs, calls SVs from them, and tracks the SV breakpoints.
In the VCF `INFO` field, the `CONTIG`, `CONTIG_START`, and `CONTIG_END` fields allow the genotyper to find the location of the SV in an assembly. For example, a 300 bp insertion will have `CONTIG_START` at the beginning of the SV and `CONTIG_END` 300 bp downstream of `CONTIG_START` (using BED coordinates).
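As a rough illustration of how these coordinates can be read, the Python sketch below parses an `INFO` string and recovers the SV's span on its contig. It assumes the keys are spelled exactly `CONTIG`, `CONTIG_START`, and `CONTIG_END` and that the values follow the BED convention (0-based, half-open), so the SV length on the contig is `CONTIG_END - CONTIG_START`. The helper names and the example record are hypothetical and not part of SMRT-SV.

```python
from typing import NamedTuple


class ContigSpan(NamedTuple):
    """Location of an SV on its assembled contig (assumed BED-style: 0-based, half-open)."""
    contig: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # With half-open BED coordinates the span length is simply end - start.
        return self.end - self.start


def parse_info(info: str) -> dict:
    """Split a VCF INFO column (e.g. 'A=1;B=x;FLAG') into a dict; flag keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def contig_span(info: str) -> ContigSpan:
    """Pull the (assumed) CONTIG, CONTIG_START and CONTIG_END keys out of an INFO string."""
    fields = parse_info(info)
    return ContigSpan(
        contig=fields["CONTIG"],
        start=int(fields["CONTIG_START"]),
        end=int(fields["CONTIG_END"]),
    )


# Hypothetical record for a 300 bp insertion; the contig name and positions are made up.
span = contig_span("SVTYPE=INS;CONTIG=ctg_0001;CONTIG_START=1200;CONTIG_END=1500")
print(span.contig, span.start, span.end, span.length)  # ctg_0001 1200 1500 300
```

Using half-open coordinates means the insertion length falls out as a plain subtraction, with no off-by-one adjustment.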
So an arbitrary VCF will not work, because few tools call SVs from contigs this way; NGS callers such as Lumpy, Delly, WhamG, and GenomeSTRiP do not. SMRT-SV and Phased-SV both do, and SVs can be found this way from any assembly mapped to the reference. Any of these approaches can provide contigs and SV calls annotated with contig coordinates (some may require additional work to format the VCF), and the SMRT-SV genotyper could then use them. The SMRT-SV genotyper requires both the VCF and a BAM of the contig alignments.
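To check whether a VCF from another caller carries these annotations at all, a minimal pure-Python sketch like the one below could flag records that lack them. It only checks that the three `INFO` keys are present (key names assumed from the description of the `INFO` field); it does not validate their values, and it cannot tell whether a matching BAM of contig alignments exists. The function names and sample records are hypothetical.

```python
# Key names assumed from the description of the INFO field; not taken from SMRT-SV code.
REQUIRED_INFO = ("CONTIG", "CONTIG_START", "CONTIG_END")


def has_contig_annotation(vcf_line: str) -> bool:
    """True if a VCF data line carries all of the contig-coordinate INFO keys."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        return False  # malformed line: no INFO column
    keys = {item.partition("=")[0] for item in columns[7].split(";")}
    return all(key in keys for key in REQUIRED_INFO)


def unannotated_records(vcf_lines):
    """Yield (CHROM, POS) for data lines that lack contig coordinates."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        if not has_contig_annotation(line):
            cols = line.split("\t")
            yield cols[0], int(cols[1])


# Hypothetical VCF lines: one contig-annotated insertion, one deletion from a read-based caller.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10000\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS;CONTIG=ctg_0001;CONTIG_START=1200;CONTIG_END=1500\n",
    "chr1\t20000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=20500\n",
]
print(list(unannotated_records(example)))  # [('chr1', 20000)]
```

Any record this reports would need contigs and contig coordinates (for example, from an assembly mapped to the reference) before the genotyper could use it.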
Hello,
Can the genotype function be used with an arbitrary VCF file, for example one containing randomly generated SVs or SVs from a different caller? I'm asking because the manual specifies that the positional argument 'genotyped_variants' should be a "VCF of SMRT SV variant genotypes for the given sample-level BAMs."
Ayush