MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

Why do some "insertion"s reported by Assemblytics have a length of zero? #52

Closed mariaelf97 closed 1 year ago

mariaelf97 commented 1 year ago

Hello,

I have been running Assemblytics on some assemblies against a reference genome, and I noticed that in some isolates with multiple SVs in one region, an insertion is reported with a length of zero. Have you run into the same issue?

These are the steps I used:

nucmer -maxmatch -l 100 -c 500 final.fasta ref.fasta -prefix isolate_name
Assemblytics  isolate_name.delta OUT 10000 50 10000

An insertion is reported as:

1 3932775 3932775 Assemblytics_w_4 75 + Insertion 0 75 1|quiver|quiver|quiver:3936094-3936169:+ within_alignment

The length of the insertion is zero.

MariaNattestad commented 1 year ago

Is it length zero only in the reference coordinates (that’s expected) or also in the query coordinates?

mariaelf97 commented 1 year ago

I thought the .bed output uses the coordinates of whichever genome is passed to nucmer first. In our case that's final.fasta, which is the query genome.

MariaNattestad commented 1 year ago

(Sorry, I was reading your question from the email, which didn't include your edit showing the .bed entry, but now I can see it.) The first FASTA passed to nucmer is the reference, as shown in the code snippet on assemblytics.com: nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT. "Assembly" here is the "query". The .bed coordinates always refer to the reference, so all insertions have length 0 in reference coordinates, and that is on purpose.
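To make the reference-versus-query distinction concrete, here is a minimal Python sketch that parses the .bed row quoted earlier in this thread and compares the two spans. The field names (`ref_gap_size`, `query_gap_size`, and so on) are assumptions inferred from that single row, not a documented schema, and `parse_row` and `query_span` are hypothetical helpers written for illustration only.

```python
from typing import NamedTuple


class SVRecord(NamedTuple):
    # Field names are assumed from the row shown in this thread.
    ref_name: str
    ref_start: int
    ref_stop: int
    sv_id: str
    size: int
    strand: str
    sv_type: str
    ref_gap_size: int
    query_gap_size: int
    query_coords: str
    method: str


def parse_row(line: str) -> SVRecord:
    """Split one whitespace- or tab-separated row into typed fields."""
    f = line.split()
    return SVRecord(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                    f[6], int(f[7]), int(f[8]), f[9], f[10])


def query_span(rec: SVRecord) -> int:
    """Length of the variant in query coordinates ("name:start-stop:strand")."""
    # rsplit from the right so a sequence name containing ':' would survive.
    _name, interval, _strand = rec.query_coords.rsplit(":", 2)
    start, stop = (int(x) for x in interval.split("-"))
    return abs(stop - start)


row = ("1 3932775 3932775 Assemblytics_w_4 75 + Insertion 0 75 "
       "1|quiver|quiver|quiver:3936094-3936169:+ within_alignment")
rec = parse_row(row)

ref_span = rec.ref_stop - rec.ref_start
print(ref_span, query_span(rec), rec.size)  # prints: 0 75 75
```

For this row the reference interval is empty while the query interval covers 75 bases, which matches the reported size: the inserted sequence exists only in the query, so it occupies a single point in the reference.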

mariaelf97 commented 1 year ago

That makes sense. Thank you!