MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

Issue with coordinates of variants #44

Closed: MartuGio closed this issue 7 months ago

MartuGio commented 2 years ago

Hello, thanks for the great tool and for your support.

I aligned two unscaffolded assemblies of Phaseolus vulgaris (about 500 Mb) with minimap2 using this command: minimap2 --MD -L -2 -ax asm5 -f 0.005 -t 10 -I 4G -K 500M -o

I converted the alignment file (SAM format) to delta format using the script available at: https://github.com/malonge/RaGOO/blob/master/sam2delta.py
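The core of a SAM-to-delta conversion is turning each CIGAR string into MUMmer's list of delta distances. The minimal Python sketch below is not the code of sam2delta.py. The function name `cigar_to_delta` is hypothetical, and the sketch assumes MUMmer's convention that a positive distance marks a reference base aligned to a gap in the query and a negative distance marks a query base aligned to a gap in the reference. It produces only the distance list, not the delta header or the alignment coordinate line.

```python
import re

# One (length, operation) pair per CIGAR operation.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_delta(cigar):
    """Convert a SAM CIGAR string into MUMmer-style delta distances.

    Assumed convention: D (reference base against a query gap) becomes a
    positive distance, I (query base against a reference gap) becomes a
    negative one. Clipping (S/H) and padding (P) add no alignment columns.
    """
    deltas = []
    since_last = 0  # aligned columns since the previous indel
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            since_last += n
        elif op in "DI":
            sign = 1 if op == "D" else -1
            # Each gapped base gets its own distance; consecutive gaps are 1 apart.
            for _ in range(n):
                deltas.append(sign * (since_last + 1))
                since_last = 0
        elif op == "N":
            raise ValueError("N operations have no delta equivalent")
        # S, H and P are skipped.
    deltas.append(0)  # 0 terminates the delta record
    return deltas


print(cigar_to_delta("2M1D1M"))    # [3, 0]
print(cigar_to_delta("2M1I2M"))    # [-3, 0]
```

Running it on a few CIGAR strings from the SAM file and comparing the output with the corresponding records in the converted delta file is one way to spot-check the conversion.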

I used both the online version of Assemblytics and version 1.2.1 available at https://github.com/MariaNattestad/Assemblytics, with a unique sequence length of 10,000, a maximum variant size of 100,000 and a minimum variant size of 50.

When viewing the minimap2 alignment in IGV, I noticed that the coordinates of the variants called by Assemblytics sometimes do not match the alignment, especially for variants called within an alignment. Below is an example of the mismatched coordinates:

[IGV screenshot: example]

Do you have an explanation for these discrepancies?

Thanks for your help,

Giovanni

MariaNattestad commented 2 years ago

Hi Giovanni

Have you tried other scripts for converting SAM to delta? Assemblytics calls variants largely by parsing the delta file directly, and I verified this extensively against MUMmer's show-aligns to confirm that both produce the same results, so I'm surprised you saw this problem. Do you have other ways of verifying that the sam2delta.py script is accurate? I'm guessing the IGV screenshot above shows the SAM file, but perhaps there is a way to visualize the delta file itself to make sure it matches what is in the SAM file?

Maria
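One way to inspect a delta file without IGV is to walk each record's distances and list the reference coordinates of the indels they encode. The Python sketch below uses a hypothetical helper, `delta_indels`, and is not Assemblytics' own implementation. It assumes the same MUMmer sign convention (positive: reference base against a query gap; negative: query base against a reference gap), takes the record's 1-based reference start as input, and reports only indels inside a single alignment, merging adjacent indels of the same kind.

```python
def delta_indels(ref_start, deltas):
    """Return (kind, ref_pos, size) indel events for one delta record.

    ref_start: 1-based reference start of the alignment.
    deltas: the record's distances, terminated by 0.
    For "DEL", ref_pos is the first reference base missing from the query.
    For "INS", ref_pos is the reference base just before the inserted bases.
    """
    events = []
    ref_pos = ref_start  # next reference base not yet consumed
    for d in deltas:
        if d == 0:  # 0 terminates the record
            break
        ref_pos += abs(d) - 1  # step over aligned bases before the indel
        kind = "DEL" if d > 0 else "INS"
        # A distance of 1 right after an indel of the same kind extends it.
        if abs(d) == 1 and events and events[-1][0] == kind:
            k, pos, size = events[-1]
            events[-1] = (k, pos, size + 1)
        elif kind == "DEL":
            events.append((kind, ref_pos, 1))
        else:
            events.append((kind, ref_pos - 1, 1))
        if kind == "DEL":
            ref_pos += 1  # a deleted reference base is consumed
    return events


print(delta_indels(1, [3, 1, 0]))     # [('DEL', 3, 2)]
print(delta_indels(100, [-2, 1, 0]))  # [('INS', 100, 1), ('DEL', 101, 1)]
```

IGV also displays 1-based positions, so the printed events can be compared with what IGV shows for the same alignment in the SAM file; a consistent mismatch would suggest that the two files disagree.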