clairemerot / genotyping_SV

Genotype SV in multiple samples sequenced with short-reads, using a catalog of SV
2 stars 1 forks source link

genotyping_SV

Genotype SV in multiple samples sequenced with short-reads, using a catalog of SV

2/3- Gather input files and install vg

This uses vg: https://github.com/vgteam/vg

input data are:

4- Format the catalog of SVs

For this study on whitefish we used:

Script 4a reformat the vcf from long-reads as wanted Script 4b join using Jasmine the 3 catalog of SV (https://github.com/mkirsche/Jasmine ) Script 4c filter and format the vcf of SV

5- Build genome graph with reference and SV variantion, then index it

Script 5 uses vg autoindex --giraffe

6- Align short-reads on the genome graph

Scripts 6 loop over samples to align short-reads on the graph and pack the alignments. There are two scripts as I splitted the 32 individuals into 2 loops of 16 to make it faster. This needs to be parallelize much better (by individuals) to save time.

7- Call variants, filter and merge

Script 7a loop over individuals to call variants based on the alignment in the graph. this could be parallelized by individuals Script 7b filters individuals vcfs - filters can be adjuste to data. Here we are quite tolerant as data is low-medium coverage and we plan on working with genotype likelihoods.It also merge all vcfs into a single file with GL for all samples.