bcgsc / LINKS

⛓ Long Interval Nucleotide K-mer Scaffolder
GNU General Public License v3.0

Attempt to free unreferenced scalar #21

Closed jdmontenegro closed 5 years ago

jdmontenegro commented 6 years ago

Hi guys,

I am really interested in using this tool for scaffolding with 10X Chromium reads. I have cloned this repository and installed the Bloom filter with swig/3.0.12 with no problem. There are a few differences between the actual directory structure and the one assumed in the instructions. Anyway, inside the LINKS directory there is a bin directory with LINKS in it. I created a lib directory inside it and cloned the bloomfilter module there. The test ran correctly:

$ ./writeBloom_rolling.pl -f test.fasta

Running:./writeBloom_rolling.pl -f test.fasta -k 15 -p 0.0001

Checking sequence target file test.fasta...ok
Wed Jun 27 16:09:47 AEST 2018:Estimating number of elements from file size
*****
Bloom filter specs
elements=58086
FPR=0.0001
size (bits)=1113536
hash functions=13
*****
Wed Jun 27 16:09:47 AEST 2018:Shredding supplied sequence file (-f test.fasta) into 15-mers..
Contigs processed k=15:
35
Wed Jun 27 16:09:47 AEST 2018:Writing Bloom filter to disk (test.fasta_k15_p0.0001_rolling.bf)
Storing filter. Filter is 139192 bytes.

Wed Jun 27 16:09:47 AEST 2018:./writeBloom_rolling.pl executed normally

but when running the real case I get the following error:

./LINKS -f contigs_ge500.fasta -s empty.fof -b contigs_ge500.fasta.scaff_s98_c5_l0_d0_e15000_r0.05_original.tigpair_checkpoint

Running: ./LINKS [v1.8.6]
-f contigs_ge500.fasta
-s empty.fof
-m
-d 4000
-k 15
-e 0.1
-l 5
-a 0.3
-t 2
-o 0
-z 500
-b contigs_ge500.fasta.scaff_s98_c5_l0_d0_e15000_r0.05_original.tigpair_checkpoint
-r
-p 0.001
-x 0

----------------- Verifying files -----------------

Checking sequence target file contigs_ge500.fasta...ok

=>Reading contig/sequence assembly file : Wed Jun 27 16:10:33 AEST 2018
Building a Bloom filter using 15-mers derived from sequences in -f contigs_ge500.fasta...
Attempt to free unreferenced scalar: SV 0xd76fb8, Perl interpreter: 0xd55010 at /data/Bioinfo/bioinfo-proj-jmontenegro/Programs/LINKS/bin/./lib/bloomfilter/swig/BloomFilter.pm line 118.
*****
Bloom filter specs
elements=2913009959
FPR=0.001
size (bits)=41882055808
hash functions=9
*****
Contigs (>= 500 bp) processed k=15:
1
Something went wrong running ./LINKS Wed Jun 27 16:10:38 AEST 2018
RuntimeError Usage: insertSeq(bloom,seq,numHashes,k); at ./LINKS line 807, <IN> line 20.

So it seems something is broken. I have no idea what could be going wrong here. Could you help me sort this out?

Kind regards,

warrenlr commented 6 years ago

Hmm, I have never seen this error before. @JustinChu: any idea?

Some things you can try:

Try running ./writeBloom_rolling.pl -f contigs_ge500.fasta -k 15 -p 0.0001. Does this work? If it does, try passing the resulting Bloom filter (.bloom) to LINKS via the -r option.

***Keep in mind that the Bloom filter functionality is NOT necessary for running LINKS with either ARCS or ARKS. You can turn off the Bloom filter with -x 1.

Also, all the checks will be skipped if a checkpoint for your run exists in your directory:

-b contigs_ge500.fasta.scaff_s98_c5_l0_d0_e15000_r0.05_original

(Only specify the base name, not the full checkpoint file name; LINKS will look for the extension it needs.)
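To make the base-name advice concrete, here is a minimal Python sketch that derives the value for -b from a checkpoint file name. It assumes the checkpoint file ends in `.tigpair_checkpoint.tsv`, which is the extension reported later in this thread; the helper name `checkpoint_base` is hypothetical and not part of LINKS.

```python
# Assumption: LINKS checkpoint files end with this suffix (as reported in this thread).
CHECKPOINT_SUFFIX = ".tigpair_checkpoint.tsv"


def checkpoint_base(path: str) -> str:
    """Return the base name to pass to -b for a checkpoint file name (hypothetical helper)."""
    if path.endswith(CHECKPOINT_SUFFIX):
        return path[: -len(CHECKPOINT_SUFFIX)]
    # Also accept the name without the trailing ".tsv", as used in the failing command.
    truncated = CHECKPOINT_SUFFIX[: -len(".tsv")]
    if path.endswith(truncated):
        return path[: -len(truncated)]
    # Nothing to strip: assume the caller already passed a base name.
    return path


name = "contigs_ge500.fasta.scaff_s98_c5_l0_d0_e15000_r0.05_original.tigpair_checkpoint"
print(checkpoint_base(name))
```

For the file name from the original command, this prints the same base name suggested above, ending in `_original`.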

JustinChu commented 6 years ago

I'm not sure. This error is basically Perl saying that we are trying to free an object twice, or something to that effect. Worst case, it is some memory corruption in the C++ Bloom filter code. I'd probably need a way to reproduce the error locally in order to fix it, however.

Note that -p is set to 0.0001 in the test, but LINKS appears to use -p 0.001 by default.

Test both:

 ./writeBloom_rolling.pl -f contigs_ge500.fasta -k 15 -p 0.0001
 ./writeBloom_rolling.pl -f contigs_ge500.fasta -k 15 -p 0.001

jdmontenegro commented 6 years ago

Interestingly enough, increasing the memory available in the scheduler did the trick. I am using 150 GB now and it seems to be working correctly. It just did not produce any scaffolds. I think that is because I was using the wrong extension for the checkpoint file: ${base}_original.tigpair_checkpoint instead of ${base}_original.tigpair_checkpoint.tsv. I am running it again and will check the results.

Cheers,
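
For readers who hit the same error, the "Bloom filter specs" in the logs give a rough idea of how much memory the filter alone needs. The sketch below uses the textbook Bloom filter sizing formulas. Rounding the size up to a multiple of 64 bits and rounding the hash count down are assumptions, chosen because they reproduce the values logged for test.fasta; whether LINKS computes them exactly this way is not confirmed in this thread. The function name `bloom_specs` is hypothetical.

```python
import math


def bloom_specs(elements: int, fpr: float, word_bits: int = 64):
    """Return (size_bits, num_hashes) for a Bloom filter (hypothetical helper)."""
    # Textbook optimal bit count: m = -n * ln(p) / (ln 2)^2
    raw_bits = -elements * math.log(fpr) / (math.log(2) ** 2)
    # Assumption: the size is rounded up to a whole number of 64-bit words.
    size_bits = math.ceil(raw_bits / word_bits) * word_bits
    # Assumption: the hash count k = (m / n) * ln 2 is rounded down.
    num_hashes = int(size_bits / elements * math.log(2))
    return size_bits, num_hashes


# Small test run from the thread: elements=58086, FPR=0.0001
print(bloom_specs(58086, 0.0001))  # (1113536, 13)

# Real run from the thread: elements=2913009959, FPR=0.001
bits, hashes = bloom_specs(2913009959, 0.001)
print(f"{bits / 8 / 1e9:.1f} GB, {hashes} hash functions")
```

The small case matches the logged 1113536 bits (139192 bytes) and 13 hash functions. For the real run, the filter comes to roughly 5 GB before any other data LINKS holds in memory, which is consistent with the job failing under a tight scheduler memory limit.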