epruesse / SINA

SINA - Reference based multiple sequence alignment
https://sina.readthedocs.io
GNU General Public License v3.0

Failing to load db index #101

Open MPourjam opened 3 years ago

MPourjam commented 3 years ago

Although the index for the database has been built and exists in the same directory as the database file, SINA throws the error 'Failed to load "path/to/index" - rebuilding'.
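The logs later in this thread suggest that SINA looks for the index next to the reference database, under the database's file name with its last extension replaced by .sidx (SILVA_138.1_SSURef_NR99.arb is paired with SILVA_138.1_SSURef_NR99.sidx). The Python sketch below checks whether a file exists at that location. The naming rule is inferred from those logs, not taken from SINA's documentation, and the helper names expected_index_path and describe_index are hypothetical.

```python
import sys
from pathlib import Path


def expected_index_path(db_path: str) -> Path:
    """Guess where SINA looks for the k-mer index of a reference database.

    Assumption (inferred from the logs in this thread): the index sits next
    to the database, with the last extension replaced by ".sidx".
    """
    return Path(db_path).with_suffix(".sidx")


def describe_index(db_path: str) -> str:
    """Report whether the guessed index file exists and how large it is."""
    idx = expected_index_path(db_path)
    if not idx.is_file():
        return f"{idx}: missing"
    return f"{idx}: present, {idx.stat().st_size} bytes"


if __name__ == "__main__":
    # Usage: python check_index.py /path/to/ref1.arb /path/to/ref2.arb ...
    for db in sys.argv[1:]:
        print(describe_index(db))
```

Note that a file being present does not mean SINA will accept it: in the reports here the file exists and is still rebuilt.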

epruesse commented 3 years ago

Odd. Can you elaborate?

Some conditions under which this would occur

MPourjam commented 3 years ago

I am using SINA in a pipeline, where it is run once per iteration with the arguments below:

sina-1.7 --in {} --search --meta-fmt csv --threads {} --lca-fields tax_slv --turn --search-max-result 100 --db {the absolute path to SILVA ARB SSURef99 } --out {The path to output file} --search-min-sim 0.9 --lca-quorum 0.7 --search-no-fast > /dev/null
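For readers reproducing this setup, here is a minimal Python sketch of such a loop, assuming one SINA process per query file. The executable name and every flag are copied from the command above; the function names sina_search_args and run_pipeline, and the choice to write each output under the query's file name in a separate directory, are hypothetical.

```python
import subprocess
from pathlib import Path


def sina_search_args(query: Path, out: Path, db: Path, threads: int) -> list[str]:
    """Argument vector for one iteration, mirroring the command above."""
    return [
        "sina-1.7",
        "--in", str(query),
        "--search",
        "--meta-fmt", "csv",
        "--threads", str(threads),
        "--lca-fields", "tax_slv",
        "--turn",
        "--search-max-result", "100",
        "--db", str(db),
        "--out", str(out),
        "--search-min-sim", "0.9",
        "--lca-quorum", "0.7",
        "--search-no-fast",
    ]


def run_pipeline(queries: list[Path], out_dir: Path, db: Path, threads: int = 4) -> None:
    """Run SINA once per query file; each call is a separate SINA process."""
    for query in queries:
        # Hypothetical naming: same file name as the query, in out_dir.
        out = out_dir / query.name
        # stdout is discarded, as with "> /dev/null" in the original command;
        # stderr is not redirected.
        subprocess.run(sina_search_args(query, out, db, threads),
                       stdout=subprocess.DEVNULL, check=True)
```

Because every iteration starts a new process, any index that fails to load is rebuilt once per iteration, which is what makes this issue costly in a pipeline.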

Even after deleting the index and letting SINA rebuild and store it, it still throws the error:

[Search (internal)] argument index
14:23:42 [Search (internal)] Failed to load "/path/to/db/SILVA-RefNR/SILVA138_SSURefNR99_120620.sidx" - rebuilding

The data files are regular FASTA files.

Thank you very much.

ilnamkang commented 3 years ago

I've experienced the same problem.

(1) I downloaded "sina-1.7.2-linux.tar.gz" from this repository, decompressed it, and used the binary in "sina-1.7.2-linux/bin".

(2) My command was as follows. I had used the same command for a previous run, changing only the input and output files:

sina -i input.fasta -r SILVA_138.1_SSURef_NR99_12_06_20_opt.arb -o output.fasta -o output.csv --turn --search --fs-full-len=1300 --lca-fields=tax_slv --show-conf --intype=fasta --preserve-order --fasta-write-dna --fasta-write-dots --csv-crlf --overhang=remove --lowercase=unaligned --insertion=forbid --pen-gapext=2 --calc-idty --fs-kmer-no-fast --search-iupac=pessimistic

(3) My disk has more than 20 TB of free space, so I don't think the index was corrupted.

(4) I used the same source database and didn't change the k-mer size (k).

Anyway, SINA worked fine after rebuilding the index, which required 15-20 min on my machine (Ubuntu 18.04).

Below, I attach the stdout of the run.

-----
19:21:54 [SINA] This is SINA 1.7.2. Effective parameters:
add-relatives = 0
auto-filter-field =
auto-filter-threshold = 0.8
calc-idty = 1
colors = 0
csv-crlf = 1
csv-id = name
csv-sep =
db = "SILVA_138.1_SSURef_NR99_12_06_20_opt.arb"
debug-graph = 0
fasta-block = 0
fasta-idx = 0
fasta-write-dna = 1
fasta-write-dots = 1
filter =
fs-cover-gene = 0
fs-engine = internal
fs-full-len = 1300
fs-kmer-len = 10
fs-kmer-mm = 0
fs-kmer-no-fast = 1
fs-kmer-norel = 0
fs-leave-query-out = 0
fs-max = 40
fs-min = 40
fs-min-len = 150
fs-msc = 0.7
fs-msc-max = 2
fs-no-graph = 0
fs-oldmatch = 0
fs-req = 1
fs-req-full = 1
fs-req-gaps = 10
fs-weight = 1
gene-end = 0
gene-start = 0
in = "input.fasta"
insertion = forbid
intype = FASTA
lca-fields = tax_slv
lca-quorum = 0.7
line-length = 0
lowercase = unaligned
markaligned = 0
markcopied = 0
match-score = 2
max-in-flight = 160
meta-fmt = none
min-idty = 0
mismatch-score = -1
no-align = 0
num-pts = 80
out = "output.fasta" "output.csv"
overhang = remove
pen-gap = 5
pen-gapext = 2
prealigned = 0
preserve-order = 1
prot-level = 4
ptport = :/tmp/sina_pt_81559
realign = 0
search = 1
search-all = 0
search-copy-fields =
search-correction = none
search-cover = query
search-filter-lowercase = 0
search-ignore-super = 0
search-iupac = pessimistic
search-kmer-candidates = 1000
search-kmer-len = 10
search-kmer-mm = 0
search-kmer-norel = 0
search-max-result = 10
search-min-sim = 0.7
search-no-fast = 0
search-port = :/tmp/sina_pt2_81559
select-file =
select-skip = 0
select-step = 1
show-conf =
show-diff = 0
show-dist = 0
threads = 4294967295
turn = revcomp
use-subst-matrix = 0
write-used-rels = 0

Processing: 0 [00:00:04]
19:21:58 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:31:20 [famfinder] Using internal engine for reference search
Processing: 0 [00:09:26]██████████████████████████████████████████████████████████████████████████████████████| 1048576/1048576 [00:09:18 / 00:00:00]
19:31:20 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:39:58 [SINA] Aligner ready. Processing sequences
19:39:59 [SINA] Took 0.721s to align 24 sequences (33.2428 sequences/s)
19:39:59 [SINA] SINA finished.
19:39:59 [ARB I/O] Closing ARB database '"SILVA_138.1_SSURef_NR99_12_06_20_opt.arb"' ...
-----
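This log reports "Failed to load ... - rebuilding" twice in a single run, and its effective parameters combine fs-kmer-no-fast = 1 with search-no-fast = 0. One possible explanation, not confirmed in this thread, is that the two k-mer searches expect differently built indexes but share one .sidx path, so each rebuild replaces the other's. The Python sketch below pulls the rebuild events and the two *-no-fast settings out of a saved log so runs can be compared. It assumes each "name = value" parameter is on its own line, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches rebuild messages such as:
# 19:21:58 [Search (internal)] Failed to load "X.sidx" - rebuilding
REBUILD_RE = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}) \[Search \(internal\)\] '
    r'Failed to load "(?P<path>[^"]+)" - rebuilding'
)

# Matches one "name = value" line of the "Effective parameters" block;
# the value may be empty (e.g. "filter =").
PARAM_RE = re.compile(r"^(?P<key>[a-z][a-z0-9-]*) =\s*(?P<value>.*)$")


def rebuild_events(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, index path) for every 'Failed to load' message."""
    return [(m["time"], m["path"]) for m in REBUILD_RE.finditer(log_text)]


def effective_params(log_text: str) -> dict[str, str]:
    """Collect parameters, assuming one 'name = value' pair per line."""
    params: dict[str, str] = {}
    for line in log_text.splitlines():
        m = PARAM_RE.match(line.strip())
        if m:
            params[m["key"]] = m["value"].strip()
    return params


def no_fast_settings(params: dict[str, str]) -> dict[str, Optional[str]]:
    """The two *-no-fast switches discussed in this thread (None if absent)."""
    return {k: params.get(k) for k in ("fs-kmer-no-fast", "search-no-fast")}
```

Typical use: read a saved log with open(path).read(), then print rebuild_events(text) and no_fast_settings(effective_params(text)). rebuild_events also works on a log flattened onto one line, because it scans the whole text rather than individual lines.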

Lorcaserin commented 2 years ago

Hello,

I have a similar issue, which seems to be linked to the --search-no-fast and --fs-kmer-no-fast parameters. I am trying to replicate the default parameters of the online SILVA tool. My database (SSU Ref NR99) works with these parameters off, but when I use --search-no-fast or --fs-kmer-no-fast, SINA rebuilds the index, and it has to rebuild it every time one of those parameters is switched on. After rebuilding, I believe it yields almost the same result as with the parameter off (I compared it with the online SILVA tool's output).

Below is my command (my input is a FASTA file):

sina -i $query -o $output -r SILVA_138.1_SSURef_NR99.arb --outtype=csv --search --search-db SILVA_138.1_SSURef_NR99.arb --lca-quorum=0.8 --min-idty=0.9 --lca-fields tax_slv --fields align_quality_slv,lca_tax_slv --preserve-order --show-conf --search-no-fast

Below is the stdout of the run:


08:23:14 [SINA] This is SINA 1.7.2. Effective parameters:
add-relatives = 0
auto-filter-field =
auto-filter-threshold = 0.8
calc-idty = 0
colors = 0
csv-crlf = 0
csv-id = name
csv-sep =
db = "/Raw_data/SILVA_138.1_SSURef_NR99.arb"
debug-graph = 0
fasta-block = 0
fasta-idx = 0
fasta-write-dna = 0
fasta-write-dots = 0
fields = align_quality_slv,lca_tax_slv
filter =
fs-cover-gene = 0
fs-engine = internal
fs-full-len = 1400
fs-kmer-len = 10
fs-kmer-mm = 0
fs-kmer-no-fast = 0
fs-kmer-norel = 0
fs-leave-query-out = 0
fs-max = 40
fs-min = 40
fs-min-len = 150
fs-msc = 0.7
fs-msc-max = 2
fs-no-graph = 0
fs-oldmatch = 0
fs-req = 1
fs-req-full = 1
fs-req-gaps = 10
fs-weight = 1
gene-end = 0
gene-start = 0
in = "/Raw_data/input.fasta"
insertion = shift
intype = AUTO
lca-fields = tax_slv
lca-quorum = 0.8
line-length = 0
lowercase = none
markaligned = 0
markcopied = 0
match-score = 2
max-in-flight = 20
meta-fmt = none
min-idty = 0.9
mismatch-score = -1
no-align = 0
num-pts = 10
out = "/Results/result.csv"
outtype = CSV
overhang = attach
pen-gap = 5
pen-gapext = 2
prealigned = 0
preserve-order = 1
prot-level = 4
ptport = :/tmp/sina_pt_188530
realign = 0
search = 1
search-all = 0
search-copy-fields =
search-correction = none
search-cover = query
search-db = "/Raw_data/SILVA_138.1_SSURef_NR99.arb"
search-filter-lowercase = 0
search-ignore-super = 0
search-iupac = optimistic
search-kmer-candidates = 1000
search-kmer-len = 10
search-kmer-mm = 0
search-kmer-norel = 0
search-max-result = 10
search-min-sim = 0.7
search-no-fast = 1
search-port = :/tmp/sina_pt2_188530
select-file =
select-skip = 0
select-step = 1
show-conf =
show-diff = 0
show-dist = 0
threads = 4294967295
turn = none
use-subst-matrix = 0
write-used-rels = 0

08:23:20 [famfinder] Using internal engine for reference search
Processing: 0 [00:00:05]███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510508/510508 [00:00:00 / 00:00:00]
08:23:20 [Search (internal)] Failed to load "/Raw_data/SILVA_138.1_SSURef_NR99.sidx" - rebuilding
08:29:10 [SINA] Aligner ready. Processing sequences
08:29:44 [SINA] Took 34.455s to align 1258 sequences (36.5108 sequences/s)
08:29:44 [SINA] SINA finished.
08:29:44 [ARB I/O] Closing ARB database '"/Raw_data/SILVA_138.1_SSURef_NR99.arb"' ...
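The observation that toggling --search-no-fast or --fs-kmer-no-fast triggers a rebuild suggests a quick check: save the "Effective parameters" block from a run that loaded the index without complaint and from one that rebuilt it, then list the settings that differ. The Python sketch below does that. INDEX_HINTS is a hypothetical shortlist of name fragments for settings that plausibly affect how the index is built; which parameters SINA actually records in the .sidx file is not stated in this thread.

```python
import re

# One "name = value" line from SINA's "Effective parameters" block.
PARAM_RE = re.compile(r"^(?P<key>[a-z][a-z0-9-]*) =\s*(?P<value>.*)$")

# Hypothetical shortlist: name fragments of settings that plausibly change how
# the k-mer index is built. Which settings SINA really stores is an assumption.
INDEX_HINTS = ("kmer", "no-fast")


def parse_params(log_text: str) -> dict[str, str]:
    """Map parameter name -> value, one 'name = value' pair per line."""
    params: dict[str, str] = {}
    for line in log_text.splitlines():
        m = PARAM_RE.match(line.strip())
        if m:
            params[m["key"]] = m["value"].strip()
    return params


def diff_params(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Settings whose value differs between two runs (None means absent)."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


def index_related(diff: dict[str, tuple]) -> dict[str, tuple]:
    """Keep only the differences whose names match INDEX_HINTS."""
    return {k: v for k, v in diff.items() if any(h in k for h in INDEX_HINTS)}
```

If index_related reports only a *-no-fast switch while the k-mer length and database are unchanged, that points at the switch itself rather than at a corrupted or stale file.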