Closed ChriKub closed 5 years ago
Thanks for sharing this. What commands did you use?
I thought that there was a check for path integrity that will blow up when the sequence is altered. There could be some other kind of corruption. If it happens at the point of input then this wouldn't be caught.
On Thu, Aug 15, 2019, 11:03 ChriKub notifications@github.com wrote:
I use seqwish to align 17 sequences of varying length (169bp - 30082bp), but after the graph inducement the (path) length of sequence >5 changes from 30082bp to 30805bp, thus changing the sequence content. minimap2 bubbles_sub.fasta bubbles_sub.fasta -c -X > bubbles_sub.paf seqwish -p bubbles_sub.paf -s bubbles_sub.fasta -b bubbles_sub.graph -g bubbles_sub.gfa -r 21
Fasta Download https://owncloud.tuebingen.mpg.de/index.php/s/RbWjZM3nQG5kaNc
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ekg/seqwish/issues/20?email_source=notifications&email_token=AABDQEPDHGTIBY25U2KFVVDQEULXLA5CNFSM4IL4NH62YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HFMNEYQ, or mute the thread https://github.com/notifications/unsubscribe-auth/AABDQEO7MQJEQQOM7Q5RTPTQEULXLANCNFSM4IL4NH6Q .
I ran these commands, using both the latest version of minimap2 and seqwish. I then loaded the gfa into vg(v.1.14) and used vg paths to check the path length:
minimap2 bubbles_sub.fasta bubbles_sub.fasta -c -X > bubbles_sub.paf
seqwish -p bubbles_sub.paf -s bubbles_sub.fasta -b bubbles_sub.graph -g bubbles_sub.gfa -r 21
vg view -F -v bubbles_sub.gfa > bubbles_sub.vg
vg paths -E -v bubbles_sub.vg
Do you get the same problem without -r21?
On Thu, Aug 15, 2019, 11:23 ChriKub notifications@github.com wrote:
I ran these commands, using both the latest version of minimap2 and seqwish. I then loaded the gfa into vg(v.1.14) and used vg paths to check the path length:
minimap2 bubbles_sub.fasta bubbles_sub.fasta -c -X > bubbles_sub.paf seqwish -p bubbles_sub.paf -s bubbles_sub.fasta -b bubbles_sub.graph -g bubbles_sub.gfa -r 21 vg view -F -v bubbles_sub.gfa > bubbles_sub.vg vg paths -E -v bubbles_sub.vg
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ekg/seqwish/issues/20?email_source=notifications&email_token=AABDQEKSQQDAK7ITSFVSZI3QEUOCDA5CNFSM4IL4NH62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4LKGEI#issuecomment-521577233, or mute the thread https://github.com/notifications/unsubscribe-auth/AABDQEPLSHCBUFOKAEJV5TDQEUOCDANCNFSM4IL4NH6Q .
Yes. Same result for the path length.
I uploaded the fasta files to reproduce this in the first post.
Thanks, I've reproduced the problem. I'll see what's up.
On Fri, Aug 16, 2019 at 8:35 AM ChriKub notifications@github.com wrote:
I uploaded the fasta files to reproduce this in the first post.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ekg/seqwish/issues/20?email_source=notifications&email_token=AABDQEPH34E7DZDKYUCD4N3QEZDBXA5CNFSM4IL4NH62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4NZUFI#issuecomment-521902613, or mute the thread https://github.com/notifications/unsubscribe-auth/AABDQENMRZV7NHYRCI4JUA3QEZDBXANCNFSM4IL4NH6Q .
It looks like there's been a problem related to this for some time (since 87b6a85 at least):
time minimap2 -c -x asm20 -X -t 4 DRB1-3123.fa DRB1-3123.fa | fpa drop -l 3000 | gzip >DRB1-3123.paf.gz && time ~/seqwish/bin/seqwish-87b6a85 -s DRB1-3123.fa -p DRB1-3123.paf.gz -b DRB1-3123.work -g DRB1-3123.gfa-87b6a85 && vg view -Fv DRB1-3123.gfa-87b6a85 | vg paths -v - -F >DRB1-3123.gfa-87b6a85.fasta && samtools faidx DRB1-3123.gfa-87b6a85.fasta
But when comparing with the current state, it's not exactly the same issue:
erik@foldy [05:36:26 PM] [~/seqwish/test] [path-bug]
-> % paste <(sort -V DRB1-3123.fa.fai) <(sort -V DRB1-3123.gfa.fasta.fai ) | awk '{ print $7 - $2 }'
0
0
0
1
-44
0
0
0
0
0
0
0
erik@foldy [05:36:35 PM] [~/seqwish/test] [path-bug]
-> % paste <(sort -V DRB1-3123.fa.fai) <(sort -V DRB1-3123.gfa-87b6a85.fasta.fai ) | awk '{ print $7 - $2 }'
0
0
0
46
2
0
0
0
0
0
0
0
This should have been tested before, sorry about this.
I found the problem. I wasn't splitting nodes at the start and end of paths. A fix should be in soon.
I use seqwish to align 17 sequences of varying length (169bp - 30082bp), but after the graph inducement the (path) length of sequence >5 changes from 30082bp to 30805bp, thus changing the sequence content.
minimap2 bubbles_sub.fasta bubbles_sub.fasta -c -X > bubbles_sub.paf
seqwish -p bubbles_sub.paf -s bubbles_sub.fasta -b bubbles_sub.graph -g bubbles_sub.gfa -r 21
Fasta Download