epruesse / SINA

SINA - Reference based multiple sequence alignment
https://sina.readthedocs.io
GNU General Public License v3.0

Detect ARB database type (fail on pt server internal type) #73

Closed gdauria closed 4 years ago

gdauria commented 5 years ago

Dear Sir/Madam, I have an issue using sina-1.6.0. I am not sure this is the right place to report it, but I started an arb_pt_server and, when trying to assign a full-length 16S gene sequence, I always get the same error:

21:01:28 [align] Internal error - incomplete data for alignment

./sina -i test.fasta -o test.sina --ptport=:/home/user/tmp/pt_ssu_1 --db /home/user/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb --search --search-db /home/user/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb --search-port=:/home/user/tmp/pt_ssu_1 
21:01:26 [SINA] This is SINA 1.6.0.
21:01:27 [SINA] Aligner ready. Processing sequences
21:01:28 [align] Internal error - incomplete data for alignment
21:01:28 [SINA] Took 1.407s to align 1 sequences (0.710342 sequences/s)
21:01:28 [SINA] SINA finished.
Processing: 100% |--------------------------------------| 1/1 [00:00:01 / 00:00:00]
21:01:28 [ARB I/O] Closing ARB database '"/home/user/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb"' ...

I started the arb_pt_server from the sina-1.6.0 folder using the following command:

bin/arb_pt_server -D/home/user/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb -T:/home/user/tmp/pt_ssu_1

Everything seems fine; the last lines of its output are:

Building PT-Server for alignment 'ali_16s'...
Database contains 597607 species
Progress: Checking data
...................................................................... [100.0%] used: 0s
[done]
- mapping ptindex ('/home/gdauria/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb.pt', 1.85 Gb) from disk
[startup took 24m43s]
ok, server is running.

I do not have any clue about what is wrong.

Thank you for your help

Giuseppe

epruesse commented 5 years ago

Hi @gdauria!

Thanks, this does look like a regression. I will have a look into it. Running with an externally launched PT server wasn't on my list for testing and may have a bug.

Is there a reason you want/need to do it that way? Version 1.6.0 comes with an internal replacement for the PT server, so you can just run this:

./sina -i test.fasta -o test.sina --db /home/user/ramDisk/SSURef_NR99_123_SILVA_12_07_15_opt.arb --search 

Or, you can just allow SINA to launch and terminate the PT server internally by not specifying the --pt-port and --search-port options.

I'd recommend the internal PT server replacement. It's much faster, has equivalent accuracy and you don't have to bother with launching things separately. The PT server option is still in there mostly so that the old mechanics remain available and you can compare results between the old and new approaches. The PT server also has a few more options, but you aren't using any of those (and in my experience they don't help anyway).

gdauria commented 5 years ago

Thank you, Elmar, for your answer. The original idea was to update my previous installation, which used an older sina version. This latest version of sina is really fast, and I do not need an arb_pt_server running in the background. Thank you again for showing me how it works, and thank you for the great work you did. Best,

Giuseppe

epruesse commented 4 years ago

Ok - I figured out what the cause of this was.

The arb_pt_server will (in ARB >6) compress the ARB database it is given, so by running arb_pt_server -Dmydb.arb -T:some.sock, mydb.arb got turned into something that isn't a normal ARB database. And then SINA failed to work with that.

=> Detect if an ARB database is actually legit.
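
For anyone who still wants to run an external PT server in the meantime, one possible workaround (a sketch only, not something confirmed in this thread) is to hand arb_pt_server its own copy of the database, so that the file passed to SINA's --db is never rewritten. The helper name pt_copy and the .pt_copy.arb suffix are made up for illustration; mydb.arb and some.sock are the placeholders from the comment above.

```shell
# Hypothetical helper (not part of SINA or ARB): copy a database next to the
# original and print the path of the copy on stdout.
pt_copy() {
    src=$1
    # Strip a trailing ".arb" (if present) and append a distinct suffix.
    dst="${src%.arb}.pt_copy.arb"
    cp -- "$src" "$dst" && printf '%s\n' "$dst"
}

# Usage sketch (assumption: the PT server only rewrites the file it is given):
#   pt_db=$(pt_copy mydb.arb)
#   arb_pt_server -D"$pt_db" -T:some.sock
# SINA's --db would then keep pointing at the untouched mydb.arb.
```

This costs extra disk space for the copy, and it does not address any other problem the externally launched PT server path may have.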