bcgsc / LINKS

⛓ Long Interval Nucleotide K-mer Scaffolder
GNU General Public License v3.0

LINKS is not finding any kmer pairs in my ONT reads #26

Closed milw closed 6 years ago

milw commented 6 years ago

Hi, I've tried this with default settings and some variations on -d and -t, but no luck: LINKS does not find any k-mer pairs! My average read length is about 10 kb and the read N50 is 25 kb. Here's the stdout for the current run with default settings: `Contigs (>= 500 bp) processed k=15: 248522

=>Writing Bloom filter to disk (links-default/links-def.bloom) : Mon Jul 23 14:43:31 CDT 2018 Storing filter. Filter is 1303720144bytes. Writting header... magic: BlOOMFXX hlen: 72 size: 10429761152 nhash: 9 kmer: 15 dFPR: 0 aFPR: 0 rFPR: 0 nEntry: 0 tEntry: 0

=>Reading long reads, building hash table : Mon Jul 23 14:43:32 CDT 2018 Reads processed k=15, dist=4000, offset=0 nt, sliding step=2 nt:

Reads processed from file 1/1, /media/bigdata2/zeh/nanopore_all_cell1.fastq: 341656 Extracted 0 15-mer pairs at -d 4000, from all 341656 sequences provided in /media/bigdata2/zeh/reads1.txt

Extracted 0 15-mer pairs overall. This is the set that will be used for scaffolding`

warrenlr commented 6 years ago

What is the N50 of your reads? This could happen if:

- the ONT reads are very short (< -d)
- the format isn't FASTA/FASTQ
- there is no k-mer match between the draft assembly and the ONT reads

I doubt it is the last one, but it could happen with a very high ONT base error rate or a species mismatch.
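The first cause (reads shorter than -d) can be ruled out quickly by computing the read N50 and counting the reads that are at least -d bases long. Below is a minimal Python sketch of that check, not part of LINKS. The helper names `n50` and `reads_spanning` are hypothetical, the read lengths are made-up toy values, and treating "length >= d" as the cutoff is an assumption rather than LINKS's exact rule.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def reads_spanning(lengths, d):
    """Count reads at least d bases long (hypothetical helper;
    the exact cutoff LINKS applies is an assumption here)."""
    return sum(1 for length in lengths if length >= d)


# Toy read lengths, made up for illustration only.
lengths = [2000, 3000, 5000, 10000, 25000]
print("N50:", n50(lengths))
print("reads >= 4000:", reads_spanning(lengths, 4000))
```

If almost no reads reach the -d value in use, a smaller -d is worth trying before looking at the other causes.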

milw commented 6 years ago

It appears to be the FASTQ format. I extracted the same reads as plain FASTA, and now get a more reasonable number of pairs extracted: `Reads processed from file 1/1, /media/bigdata2/zeh/nanopore_all_cell1.fasta: 85414 Extracted 248243759 15-mer pairs at -d 4000, from all 85414 sequences provided in /media/bigdata2/zeh/reads1.txt`
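Note that the FASTQ run reported 341656 sequences, exactly 4 × 85414, which would be consistent with each line of a 4-line FASTQ record being counted as a separate sequence. For anyone hitting the same problem, here is a minimal Python sketch of the FASTQ-to-FASTA conversion. It assumes strict 4-line FASTQ records (no wrapped sequence lines); the function name `fastq_to_fasta` and the demo reads are hypothetical and not part of LINKS.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert strict 4-line FASTQ records to FASTA; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        sequence = fastq_handle.readline()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality string, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n")
        fasta_handle.write(sequence.rstrip("\n") + "\n")
        count += 1
    return count


# Demo on two made-up reads held in memory.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
out = io.StringIO()
n = fastq_to_fasta(demo, out)
print(n, "records converted")
print(out.getvalue(), end="")
```

With real data, the same function can be called on file handles opened with `open(path)` and `open(path, "w")`; the number of records it returns should match the number of sequences LINKS reports for the FASTA file.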

warrenlr commented 6 years ago

great, problem solved?

milw commented 6 years ago

Solved! Sorry it took so long to answer!