How to merge multi-sample SVs and obtain breakpoints for genotyping a population

Hi @pkrusche @traxexx @egor-dolzhenko

Thanks for your contribution. I obtained multi-sample SV sets from a pan-genome. Graph-based population genotyping is not a good choice for my large plant genome because the computational cost is very high. Paragraph is an accurate genotyper for population short-read sequencing data, so I would like to use it to genotype the SVs that have already been discovered. Inaccurate breakpoints are known to hurt genotyping performance, although genotyping with breakpoint deviations of 1-18 bp still performs well (~0.9). However, I still have a question, described below.

An example:

| Chromosome | Left breakpoint | Right breakpoint | Samples |
| --- | --- | --- | --- |
| Chr1 | 101508 | 101750 | sample1 |
| Chr1 | 101510 | 101770 | sample2 |
| Chr1 | 101510 | 101771 | sample3 |
| Chr1 | 101512 | 101773 | sample4 |
| Chr1 | 101512 | 101776 | sample5, sample14 |
| Chr1 | 101510 | 101777 | sample6 |
| Chr1 | 101514 | 101779 | sample7, sample16 |
| Chr1 | 101515 | 101780 | sample8 |
| Chr1 | 101515 | 101781 | sample9, sample15, sample17 |
| Chr1 | 101515 | 101784 | sample10 |
| Chr1 | 101515 | 101785 | sample11 |
| Chr1 | 101515 | 101786 | sample12 |
| Chr1 | 101518 | 101789 | sample13 |

The left breakpoint falls within 101508-101518 (11 bp) and the right breakpoint within 101750-101789 (40 bp). How should I choose a single pair of breakpoints for genotyping the population so that performance stays relatively high?
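
To make the question concrete, here is a minimal Python sketch of two candidate heuristics applied to the calls listed above: a sample-weighted median of each breakpoint, and the single most frequently supported breakpoint pair. This is only an illustration of the merging problem, not Paragraph's method or a recommended procedure. The function names (`median_breakpoints`, `most_supported_pair`, `far_calls`) are hypothetical, and the 18 bp tolerance is an assumption borrowed from the deviation range mentioned earlier.

```python
from collections import Counter
from statistics import median_low

# (chromosome, left breakpoint, right breakpoint, supporting samples),
# copied from the example calls in this post.
calls = [
    ("Chr1", 101508, 101750, ["sample1"]),
    ("Chr1", 101510, 101770, ["sample2"]),
    ("Chr1", 101510, 101771, ["sample3"]),
    ("Chr1", 101512, 101773, ["sample4"]),
    ("Chr1", 101512, 101776, ["sample5", "sample14"]),
    ("Chr1", 101510, 101777, ["sample6"]),
    ("Chr1", 101514, 101779, ["sample7", "sample16"]),
    ("Chr1", 101515, 101780, ["sample8"]),
    ("Chr1", 101515, 101781, ["sample9", "sample15", "sample17"]),
    ("Chr1", 101515, 101784, ["sample10"]),
    ("Chr1", 101515, 101785, ["sample11"]),
    ("Chr1", 101515, 101786, ["sample12"]),
    ("Chr1", 101518, 101789, ["sample13"]),
]


def median_breakpoints(calls):
    """Sample-weighted median of left and right breakpoints.

    Each call counts once per supporting sample; median_low ensures the
    result is a position that was actually observed.
    """
    lefts = sorted(c[1] for c in calls for _ in c[3])
    rights = sorted(c[2] for c in calls for _ in c[3])
    return median_low(lefts), median_low(rights)


def most_supported_pair(calls):
    """Breakpoint pair reported by the largest number of samples."""
    support = Counter()
    for _, left, right, samples in calls:
        support[(left, right)] += len(samples)
    return support.most_common(1)[0][0]


def far_calls(calls, left, right, tolerance=18):
    """Samples whose breakpoints deviate from (left, right) by more than
    `tolerance` bp on either side (tolerance is an assumed cutoff)."""
    far = []
    for _, l, r, samples in calls:
        if abs(l - left) > tolerance or abs(r - right) > tolerance:
            far.extend(samples)
    return far


left, right = median_breakpoints(calls)
print("median pair:", left, right)
print("most supported pair:", most_supported_pair(calls))
print("calls beyond tolerance of the median:", far_calls(calls, left, right))
```

Whether either choice actually gives better genotyping performance than the other is exactly what I am unsure about; a call far from the chosen pair might also need to be treated as a separate event rather than merged.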

Could you give me some advice? Thanks!

Sincerely, Zhiliang

Illumina / paragraph

How to merge multi-sample SVs and obtain breakpoints for genotyping a population #71