MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

Assemblytics within-alignments SVs #18

Closed GaoLei-bio closed 5 years ago

GaoLei-bio commented 5 years ago

Hi Maria,

Thanks for your great SV detection tool, Assemblytics! It's very helpful for our project. When I looked at the results of a genome comparison, I found that some within-alignment SVs might be incorrect when they are detected in reverse alignments, as shown in the attached slides. I used the Assemblytics version downloaded from your GitHub repository. I also tested the version integrated in RaGOO. Both gave the same results.

I would really appreciate it if you could take a look at this issue.

Thanks for your time, Lei Gao

Assemblytics question.pptx

MariaNattestad commented 5 years ago

Hi Lei

Thanks for your question. I hope this doesn't point to a larger issue. We did test Assemblytics thoroughly for the paper, so if this is a large systematic issue, it would have to be something that we didn't see back then, for example because a different version of Nucmer was used. Can you attach the relevant piece of the delta file this came from? And do you have the version number of MUMmer (nucmer) that you used for this?

Thanks!

From the delta file, the chunk would look something like this:

>ref_chromosome_x query_contig_y 4641652 4636831
1 1097583 3695453 2597877 7 7 0
9904
6
62923
139980
42360
811666
0
1096961 1873039 2598318 1822242 5 5 0
152832
344463
0
1873031 1972855 1821473 1721649 0 0 0
0
1978502 3700157 1721649 1 25 25 0
138239
485276
160630
7001
102456
118819
186586
-67627
117724
-61376
4
-215642
57484
1068
249
207
230
-32
-59
329
-79
104
-7
-75
-219
0
3700158 4296271 4636831 4040722 5 5 0
8223
168
59214
106063
0
4295948 4641652 4041158 3695454 12 12 0
-434
-1
309713
3242
0
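
To pull one such chunk apart when inspecting it, a small Python sketch like the following may help. It is only an illustration, not part of Assemblytics: the names `Alignment` and `parse_delta_chunk` are hypothetical, and it assumes the usual layout in which a 7-field line opens an alignment (reference start, reference end, query start, query end, then three counts), single-number lines are delta values, and a lone 0 closes the alignment. A query start larger than the query end is taken to mean a reverse alignment.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int
    deltas: list = field(default_factory=list)

    @property
    def is_reverse(self):
        # Assumption: reverse alignments list query coordinates high-to-low.
        return self.query_start > self.query_end


def parse_delta_chunk(text):
    """Parse one '>ref query ...' chunk of a delta file into Alignment objects."""
    alignments = []
    current = None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or line.startswith(">"):
            continue  # skip blank lines and the sequence-pair header
        if len(fields) == 7:
            # Alignment header: keep only the four coordinates.
            current = Alignment(*map(int, fields[:4]))
            alignments.append(current)
        elif len(fields) == 1 and current is not None:
            value = int(fields[0])
            if value != 0:  # a lone 0 terminates the delta list
                current.deltas.append(value)
    return alignments


# Two alignments copied from the example chunk above.
sample = """>ref_chromosome_x query_contig_y 4641652 4636831
1873031 1972855 1821473 1721649 0 0 0
0
4295948 4641652 4041158 3695454 12 12 0
-434
-1
309713
3242
0
"""

for aln in parse_delta_chunk(sample):
    print(aln.ref_start, aln.ref_end, "reverse" if aln.is_reverse else "forward", len(aln.deltas))
```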

GaoLei-bio commented 5 years ago

Hi Maria,

Thank you very much for your reply.

Attached please find the piece of delta file for that SV.

Please note that my delta file was not generated by MUMmer. The alignment was performed with minimap2, and the resulting SAM file was then converted to a delta file by sam2delta.py from RaGOO (https://github.com/malonge/RaGOO).

Lei.delta.gz

Best, Lei

MariaNattestad commented 5 years ago

Hi Lei

Looking closely at the delta file you attached, I can see the deletion at the position Assemblytics reported. Reading delta files is really not intuitive, but if you go to line 13194 (found by searching for 4524752), your deletion starts at line 13353: 296 lines of 1s translates to a deletion of 297 bp.
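
The counting of consecutive 1s can be sketched in Python. This is a minimal illustration and not code from Assemblytics; the function name `collapse_indels` and the toy input are hypothetical. It assumes the MUMmer delta convention in which a value of 1 (or -1) directly after an indel of the same sign extends that indel by one base, and in which positive values mark bases present in the reference but missing from the query.

```python
def collapse_indels(deltas):
    """Merge runs of +1 / -1 delta values into single indel events.

    Returns a list of (sign, length) tuples, where sign is +1 for bases
    present only in the reference and -1 for bases present only in the query.
    """
    events = []
    for d in deltas:
        if d == 0:
            break  # 0 marks the end of an alignment's delta list
        sign = 1 if d > 0 else -1
        if abs(d) == 1 and events and events[-1][0] == sign:
            # No matching bases since the previous indel: extend it.
            events[-1][1] += 1
        else:
            events.append([sign, 1])
    return [(sign, length) for sign, length in events]


# Toy input: one positive value followed by 296 lines of 1.
toy = [50] + [1] * 296
print(collapse_indels(toy))
```

With this toy input the single merged event has length 297, matching the "296 lines of 1s" arithmetic described above.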

If you want to check this yourself, the rules for calculating the exact basepair distance between the beginning of the alignment and where that deletion starts are a bit complicated (see all changes to current_reference_position in Assemblytics_within_alignment.py), but you can get an approximate distance by calculating the cumulative sum of the absolute values. This gives 14979, which is a lot closer to 14832 than to the other end of the alignment. If you want it more exact, subtract one for each line, and add 1 every time the number is positive.
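
As a rough sketch of that arithmetic in Python (the function names are hypothetical, and this follows only the rule of thumb described here, not the actual logic in Assemblytics_within_alignment.py):

```python
def approximate_reference_offset(deltas):
    """Cumulative sum of the absolute delta values."""
    return sum(abs(d) for d in deltas)


def exact_reference_offset(deltas):
    """Apply the correction described above: subtract one for each line,
    then add one back for every positive value."""
    positives = sum(1 for d in deltas if d > 0)
    return approximate_reference_offset(deltas) - len(deltas) + positives


# Toy delta values, not taken from the attached file.
toy = [5, -3, 1, 1]
print(approximate_reference_offset(toy), exact_reference_offset(toy))
```

The correction amounts to subtracting one for each negative value, since only those lines do not consume an extra reference base. The input should be the delta values up to the point of interest, without the terminating 0.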

This doesn't look like a problem with Assemblytics, since the deletion is in the delta file, so if I were you I'd look upstream and see if you can trace the deletion into the sam file too.

I hope that helps you track it down. Good luck! Maria

GaoLei-bio commented 5 years ago

Hi Maria,

Thank you very much.

Attached please find the screenshots of the expected deletion region. Assemblytics question2.pptx

I think the deletion is 14979 or 14832 bp from the other end of the alignment, because it is on a reverse alignment. The reference genome and the query sequence run in opposite directions.

Thank you very much for your help and sorry for bothering you again.

Lei

MariaNattestad commented 5 years ago

Right, but the order of the delta file should represent the reference order, consistent with all the examples I’ve seen from MUMmer. Have you checked this delta file with any other tools or visualized it? You can use the built-in MUMmer tools like show-coords or show-aligns to see a visual representation of what the alignment looks like in the delta file.
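
One way to script that check from Python is sketched below. It assumes MUMmer's show-coords is installed and on the PATH; the helper names and the file name `Lei.delta` are hypothetical, and the flags used are `-r` (sort by reference), `-c` (include percent coverage) and `-l` (include sequence lengths).

```python
import shutil
import subprocess


def show_coords_command(delta_path):
    """Build a show-coords command line for a delta file."""
    return ["show-coords", "-r", "-c", "-l", delta_path]


def run_show_coords(delta_path):
    """Run show-coords and return its text output, or None if it is not installed."""
    if shutil.which("show-coords") is None:
        return None
    result = subprocess.run(
        show_coords_command(delta_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Print the command that would be run for an uncompressed delta file.
print(" ".join(show_coords_command("Lei.delta")))
```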

GaoLei-bio commented 5 years ago

Hi Maria,

Many thanks for the prompt reply. Your suggestions are greatly appreciated.

Best wishes, Lei