bcgsc / LINKS

⛓ Long Interval Nucleotide K-mer Scaffolder
GNU General Public License v3.0

LINKS terminated with no error message when I used hybrid reads to scaffold. #12

Closed YiweiNiu closed 7 years ago

YiweiNiu commented 7 years ago

Hi Warren,

I'm using LINKS v1.8.5 to scaffold the contigs produced by MEGAHIT. I used hybrid reads including PE, MPE, and PacBio long reads. The PE and MPE reads were fed to LINKS according to the manual, but LINKS stopped with no error message. The following is the log file:

Running: /home/software/links_v1.8.5/LINKS [v1.8.5]
-f /megahit_out/final.contigs.fa
-s PE.MPE.Pacbio.fof
-m 
-d 4000
-k 15
-e 0.1
-l 5
-a 0.3
-t 2
-o 0
-z 500
-b MEGAHIT.scaffold.LINKS
-r 
-p 0.001
-x 0

----------------- Verifying files -----------------

Checking 270B_R1.fasta_paired.fa...ok
Checking 500B_R1.fasta_paired.fa...ok
Checking 800B_R1.fasta_paired.fa...ok
Checking 3k_1_R1.fasta_paired.fa...ok
Checking 5k-1_R1.fasta_paired.fa...ok
Checking 5k-2_R1.fasta_paired.fa...ok
Checking 10k_R1.fasta_paired.fa...ok
Checking av_20k.fasta...ok
Checking sequence target file /megahit_out/final.contigs.fa...ok

=>Reading contig/sequence assembly file: Sat Jul 15 20:54:23 CST 2017
Building a Bloom filter using 15-mers derived from sequences in -f /megahit_out/final.contigs.fa...
*****
Bloom filter specs
elements=810918104
FPR=0.001
size (bits)=11659046080
hash functions=9
*****
Contigs (>= 500 bp) processed k=15:
392101

=>Writing Bloom filter to disk (MEGAHIT.scaffold.LINKS.bloom): Sat Jul 15 21:00:30 CST 2017

=>Reading long reads, building hash table: Sat Jul 15 21:00:32 CST 2017
Reads processed k=15, dist=4000, offset=0 nt, sliding step=2 nt:

Reads processed from file 1/8, 270B_R1.fasta_paired.fa:
146753470

Reads processed from file 2/8, 500B_R1.fasta_paired.fa:
228144635

Reads processed from file 3/8, 800B_R1.fasta_paired.fa:
280845230

Reads processed from file 4/8, 3k_1_R1.fasta_paired.fa:
280845396

Reads processed from file 5/8, 5k-1_R1.fasta_paired.fa:
380208084

Reads processed from file 6/8, 5k-2_R1.fasta_paired.fa:
380208494

Reads processed from file 7/8, 10k_R1.fasta_paired.fa:
438639581

Reads processed from file 8/8, av_20k.fasta:

When it started to process the PacBio long reads, the program stopped. I've tried twice and got the same result.

Any suggestions would be appreciated.

Yiwei Niu

warrenlr commented 7 years ago

I think you are running out of memory. It looks like you have a lot of PacBio reads, and LINKS is extracting too many k-mer pairs. Consider increasing -t to 200.
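To illustrate why a larger -t eases memory pressure, here is a rough back-of-the-envelope sketch in Python. It assumes a read yields one k-mer pair every t bases for as long as a window spanning d bases still fits in the read; that is an approximation for illustration, not the exact LINKS extraction loop. The 20,000 nt read length is a hypothetical value, and `kmer_pairs_per_read` is a made-up helper name.

```python
def kmer_pairs_per_read(read_len: int, d: int, t: int) -> int:
    """Approximate number of k-mer pairs extracted from one read.

    Assumption (not taken from the LINKS source): one pair is taken every
    t bases while a window of span d still fits in the read, which gives
    floor((read_len - d) / t) + 1 pairs.
    """
    if read_len < d:
        return 0  # read is too short to hold a pair at distance d
    return (read_len - d) // t + 1


if __name__ == "__main__":
    read_len = 20_000  # hypothetical long-read length, for illustration only
    d = 4000           # -d value from the log above
    for t in (2, 20, 200):
        pairs = kmer_pairs_per_read(read_len, d, t)
        print(f"-t {t:>3}: ~{pairs} pairs per read")
```

Under these assumptions, moving from -t 2 to -t 200 cuts the estimate from 8001 to 81 pairs for such a read, so the number of pairs (and the hash table that stores them) shrinks roughly in proportion to t.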

Have you tried the tests included in the distribution? Do they work as expected?

Also, I recommend running LINKS iteratively (see the recipes provided for spruce and A. thaliana):

ftp://ftp.bcgsc.ca/supplementary/LINKS/GS/LINKSrecipe_athaliana_raw.sh
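As a minimal sketch of what an iterative run could look like (the recipe linked above is the authoritative version), the Python helper below only builds the command lines and does not run them. It uses the flags shown in the log (-f, -s, -k, -d, -t, -b). The distance list, the -k and -t values, the file names, and the function name are all hypothetical, and the sketch assumes each round writes its scaffolds to `<base>.scaffolds.fa`; check that against your own output before relying on it.

```python
def iterative_links_commands(contigs, fof, distances, k=21, t=200, prefix="links"):
    """Build one LINKS command per distance, chaining each round's output.

    Assumption: a run with `-b BASE` writes its scaffolds to
    BASE.scaffolds.fa, which becomes the -f input of the next round.
    """
    commands = []
    current = contigs
    for i, d in enumerate(distances, start=1):
        base = f"{prefix}{i}"
        commands.append([
            "LINKS",
            "-f", current,
            "-s", fof,
            "-k", str(k),
            "-d", str(d),
            "-t", str(t),
            "-b", base,
        ])
        current = f"{base}.scaffolds.fa"  # assumed output name for the next round
    return commands


if __name__ == "__main__":
    # Hypothetical inputs and distances, for illustration only.
    for cmd in iterative_links_commands("final.contigs.fa", "pacbio.fof", [1000, 2500, 5000]):
        print(" ".join(cmd))
```

Running one distance per round keeps only one set of k-mer pairs in memory at a time, which is the reason to prefer this over a single large run.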

Please read the documentation fully to guide you in choosing -d and -t. Also, you may increase -k to 21 when working with PacBio reads.

Finally, to check that LINKS is working, run a small test with a single PacBio file and monitor your memory usage. Good luck, Rene
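One possible way to monitor memory during such a small test is sketched below in Python, using only the standard library on a Unix-like system. The helper name and the file names in the comment are hypothetical. The script runs any command and then reports the peak resident set size of finished child processes. A negative exit code means the process was killed by a signal, which may be how an out-of-memory kill shows up when no error message is printed.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    ru_maxrss is the largest resident set size among finished child
    processes: kilobytes on Linux, bytes on macOS.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    # Hypothetical usage (file names are placeholders):
    #   python peak_mem.py LINKS -f contigs.fa -s one_pacbio.fof -k 21 -d 4000 -t 200 -b test
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    code, peak = run_and_report_peak_memory(cmd)
    print(f"exit code {code}, peak RSS {peak} (KB on Linux)")
```

Comparing the reported peak against the machine's available RAM for a single small file gives a rough idea of whether the full read set can fit.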

YiweiNiu commented 7 years ago

Hi Rene,

Thanks for your reply! I tried -t 20 with different -d values and got satisfactory results.

Best, Yiwei Niu