ekg / seqwish

alignment to variation graph inducer
MIT License

seqwish crashes on chrom21 for 3 hgsvc samples #18

Open glennhickey opened 5 years ago

glennhickey commented 5 years ago

This ran through fine on one sample (HG00514), but when I scaled up to 3 it crashed. The input sequences can be found here:

https://transfer.sh/SZ5pU/hgsvc-chr21-seqs.tar.gz

# runs in 40min
./pan-minimap2 hg38_chr21.fa HG00514_chr21_0.fa HG00514_chr21_1.fa HG00733_chr21_0.fa HG00733_chr21_1.fa NA19240_chr21_0.fa NA19240_chr21_1.fa | fpa drop -l 10000 > hgsvc_seqwish_fpa10000.paf

# (hgsvc_chr21.fa is the above sequences catted together with hg38 first)
seqwish -s hgsvc_chr21.fa -p hgsvc_seqwish_fpa10000.paf -t 16 -b work/x -g hgsvc_seqwish_fpa10000.gfa

# crashes after 7.5 hours
seqwish: /ebs1/seqwish/src/links.cpp:23: void seqwish::derive_links(seqwish::seqindex_t&, size_t, mmmulti::map<long unsigned int, long unsigned int>&, mmmulti::map<long unsigned int, long unsigned int>&, mmmulti::map<long unsigned int, long unsigned int>&): Assertion `v1.size() == v2.size() == 1' failed.
Command terminated by signal 6

Is it possible that 126G of RAM is not enough?
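The concatenation step described in the comments above (hg38 first, then the six haplotype files) might look like the following. This is a sketch, not the exact command used in the thread, and the helper name `concat_fasta` is hypothetical. It also appends a newline to any input that lacks a final one, because a plain `cat` would otherwise glue the next header onto the last sequence line of the previous file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate FASTA files into one output file, in the
# order given, making sure every file ends with a newline so that the next
# header always starts on its own line.
concat_fasta() {
  local out="$1"
  shift
  : > "$out"                              # create or truncate the output
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then                # missing or empty input
      echo "concat_fasta: missing or empty input: $f" >&2
      return 1
    fi
    cat "$f" >> "$out"
    # $(tail -c 1 ...) is empty when the last byte is a newline.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$out"
    fi
  done
}

# Usage, with the reference first:
# concat_fasta hgsvc_chr21.fa hg38_chr21.fa \
#   HG00514_chr21_0.fa HG00514_chr21_1.fa \
#   HG00733_chr21_0.fa HG00733_chr21_1.fa \
#   NA19240_chr21_0.fa NA19240_chr21_1.fa
```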

ekg commented 5 years ago

It should be more than enough. I wonder if you ran out of disk space, though. Is this the test case you sent?
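One way to check the disk-space hypothesis is to look at free space on the filesystem that holds the `-b` prefix (`work/x` in the command above), assuming that prefix is where seqwish writes its temporary files. The helper name `free_kb_for_prefix` is hypothetical; it relies only on the POSIX output format of `df -P`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space (in 1K blocks) on the filesystem
# that holds the directory part of a path prefix such as work/x.
free_kb_for_prefix() {
  local dir
  dir=$(dirname "$1")
  # POSIX output: one header line, then one line per filesystem with columns
  # Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
  df -P -k "$dir" | awk 'NR == 2 { print $4 }'
}

# Usage: free_kb_for_prefix work/x
```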

On Mon, Jul 8, 2019, 15:23 Glenn Hickey notifications@github.com wrote:


ekg commented 5 years ago

You did the name prefixing awk thing to make sure the sequences are all uniquely named?
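The exact awk command is not given in the thread. A hypothetical version that tags every header in a file with a per-file prefix, so that names from different inputs cannot collide after concatenation, could look like this; the function name `prefix_headers` and the `.` separator are arbitrary choices, not seqwish conventions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every header ">name ..." in a FASTA file as
# ">TAG.name ...", leaving sequence lines untouched. Output goes to stdout.
prefix_headers() {
  local tag="$1" file="$2"
  awk -v tag="$tag" '/^>/ { $0 = ">" tag "." substr($0, 2) } { print }' "$file"
}

# Usage (one call per input file, then concatenate the results):
# prefix_headers HG00514 HG00514_chr21_0.fa > HG00514_chr21_0.prefixed.fa
```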

On Mon, Jul 8, 2019, 15:25 Erik Garrison erik.garrison@gmail.com wrote:


glennhickey commented 5 years ago

The test case I sent the other day was just one sample (hg38 + 2 sequences). This one (I put a new link to the data above) contains those plus another 4 sequences. I'm working on a disk with 1.6 TB of free space.

I don't do any particular awk processing, but my sequences have unique names:

grep '>' *.fa
HG00514_chr21_0.fa:>HG00514_chr21_0_0
HG00514_chr21_0.fa:>HG00514_chr21_0_1
HG00514_chr21_0.fa:>HG00514_chr21_0_2
HG00514_chr21_1.fa:>HG00514_chr21_1_0
HG00514_chr21_1.fa:>HG00514_chr21_1_1
HG00733_chr21_0.fa:>HG00733_chr21_0_0
HG00733_chr21_1.fa:>HG00733_chr21_1_0
hg38_chr21.fa:>chr21
NA19240_chr21_0.fa:>NA19240_chr21_0_0
NA19240_chr21_1.fa:>NA19240_chr21_1_0
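Unique file names do not by themselves guarantee unique sequence names, so a quick check over the headers can help rule this out. The sketch below (hypothetical helper `duplicate_names`) prints every name that appears more than once across the given files, taking the name as the header text up to the first whitespace, which is how many FASTA tools read it. It prints nothing when all names are unique.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each sequence name that occurs more than once
# across the given FASTA files. The name is the header text after ">" up to
# the first whitespace. Prints nothing when all names are unique.
duplicate_names() {
  awk '/^>/ {
         name = substr($1, 2)
         if (seen[name]++ == 1) print name   # report on the second sighting only
       }' "$@"
}

# Usage: duplicate_names *.fa    (or: duplicate_names hgsvc_chr21.fa)
```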

ekg commented 5 years ago

@glennhickey I'm not sure the FASTA reader is going to be OK with sequences named that way, though I can't be sure that this is the problem. I'll see if I can reproduce it with a simpler test.