Genotype SVs in multiple samples sequenced with short reads, using a catalog of SVs
This uses vg: https://github.com/vgteam/vg
Input data are:
For this study on whitefish we used:
Script 4a reformats the VCF from long reads as needed.
Script 4b joins the 3 SV catalogs using Jasmine (https://github.com/mkirsche/Jasmine).
Script 4c filters and formats the SV VCF.
Script 5 builds the graph indexes with vg autoindex --workflow giraffe.
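As a rough illustration of the indexing step, the sketch below assembles a vg autoindex call for the Giraffe workflow. The file names and the function name autoindex_command are placeholders for illustration, not the ones used in Script 5, and the thread count is arbitrary.

```python
import shlex

def autoindex_command(reference_fasta, sv_vcf, prefix, threads=8):
    """Build the vg autoindex call for the Giraffe workflow as an argument list."""
    return [
        "vg", "autoindex",
        "--workflow", "giraffe",  # build the indexes that vg giraffe needs
        "-r", reference_fasta,    # linear reference genome
        "-v", sv_vcf,             # SV catalog (bgzipped, tabix-indexed VCF)
        "-p", prefix,             # prefix for the output index files
        "-t", str(threads),
    ]

# Placeholder file names (assumptions), not the paths used in the study.
cmd = autoindex_command("genome.fasta", "sv_catalog.vcf.gz", "graph/sv_graph")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or a job script; passing cmd to subprocess.run(cmd, check=True) would run it directly.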
Scripts 6 loop over samples to align short reads to the graph and pack the alignments. There are two scripts because I split the 32 individuals into 2 loops of 16 to make it faster. This should be parallelized much better (by individual) to save time.
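One possible way to parallelize by individual is sketched below: each individual gets its own vg giraffe and vg pack commands, and a small worker pool runs several individuals at once. This is only a sketch under assumptions: the fastq naming (sample_1.fq.gz, sample_2.fq.gz), the index file names (which depend on the vg version), the -Q 5 mapping-quality cutoff and all function names are hypothetical and do not come from Scripts 6. With dry_run=True nothing is executed and the commands are only returned.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def sample_commands(sample, index_prefix, fastq_dir, out_dir, threads=4):
    """Return the align and pack shell commands for one individual."""
    gbz = f"{index_prefix}.giraffe.gbz"  # index names are assumptions
    gam = f"{out_dir}/{sample}.gam"
    pack = f"{out_dir}/{sample}.pack"
    align = (
        f"vg giraffe -Z {gbz} -m {index_prefix}.min -d {index_prefix}.dist "
        f"-f {fastq_dir}/{sample}_1.fq.gz -f {fastq_dir}/{sample}_2.fq.gz "
        f"-t {threads} > {gam}"
    )
    pack_cmd = f"vg pack -x {gbz} -g {gam} -o {pack} -Q 5 -t {threads}"
    return [align, pack_cmd]

def run_sample(sample, dry_run=True, **paths):
    """Run (or, in a dry run, just return) the commands for one individual."""
    cmds = sample_commands(sample, **paths)
    if not dry_run:
        for cmd in cmds:
            # shell=True because the align command redirects stdout to a file
            subprocess.run(cmd, shell=True, check=True)
    return cmds

def run_all(samples, jobs=4, dry_run=True, **paths):
    """Process up to `jobs` individuals at the same time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = pool.map(lambda s: run_sample(s, dry_run=dry_run, **paths), samples)
        return dict(zip(samples, results))

# Dry run on two hypothetical individuals.
plan = run_all(["ind01", "ind02"], jobs=2, index_prefix="graph/sv_graph",
               fastq_dir="fastq", out_dir="aligned")
```

Threads are enough here because the heavy work happens in the vg subprocesses; jobs times threads should stay within the cores available. On a cluster, one job per individual submitted through the scheduler would serve the same purpose.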
Script 7a loops over individuals to call variants based on the alignments in the graph. This could be parallelized by individual.
Script 7b filters the individual VCFs. Filters can be adjusted to the data; here we are quite tolerant because the data is low-to-medium coverage and we plan to work with genotype likelihoods. It also merges all VCFs into a single file with GL for all samples.
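For the calling step, a minimal sketch of a per-individual vg call invocation is given below. It assumes that genotypes are wanted at every site in the graph (the -a option) and that the pack file comes from the previous step. The paths and function names are hypothetical, and the exact options used in Script 7a may differ.

```python
import subprocess

def call_command(graph, pack, sample, threads=4):
    """Build the vg call argument list for one individual."""
    return [
        "vg", "call", graph,
        "-k", pack,      # read support packed from this individual's alignments
        "-a",            # genotype every snarl, including reference calls
        "-s", sample,    # sample name written to the VCF header
        "-t", str(threads),
    ]

def call_sample(graph, pack, sample, out_vcf, threads=4, dry_run=True):
    """Write one VCF per individual; in a dry run only return the command."""
    cmd = call_command(graph, pack, sample, threads)
    if not dry_run:
        with open(out_vcf, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)
    return cmd

# Dry run with placeholder paths (assumptions).
cmd = call_sample("graph/sv_graph.giraffe.gbz", "aligned/ind01.pack",
                  "ind01", "calls/ind01.vcf")
```

Because each individual is called independently, call_sample can be mapped over individuals in the same way as the alignment step.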