bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Support for BLAST databases #439

Open bbuchfink opened 3 years ago

bbuchfink commented 3 years ago

Hey @tillea @mr-c, pinging you since I'm about to release a new feature for Diamond that reads BLAST databases directly. I'm doing this by linking against the shared libraries from NCBI, all of which are contained in the ncbi-blast+ Debian package. However, the header files needed for compilation are not contained in any Debian package.

My current procedure is to download the source tarball from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ and run configure and make install to get the headers. Needless to say, this is cumbersome, especially since you also need to go through the BLAST build process to get usable headers.

So, it would be great if these headers could be included in a Debian package. I'd appreciate anything you can do.

mr-c commented 3 years ago

Hey @bbuchfink, thanks for letting us know. Can you file a bug against the source package ncbi-blast+? https://bugs.debian.org/cgi-bin/pkgreport.cgi?archive=0;dist=unstable;ordering=normal;repeatmerged=0;src=ncbi-blast%2B

bbuchfink commented 3 years ago

done!

sjaenick commented 3 years ago

Can you comment on performance when using BLAST databases instead of .dmnd?

bbuchfink commented 3 years ago

Loading in the database sequences may still take 10%-20% longer when using a BLAST db, but the overall impact on performance should be minimal.

FredericBGA commented 3 years ago

Hello, I failed to compile Diamond. My OS is quite old, so I cannot use the prebuilt binaries ('GLIBC_2.17' not found...). I managed to install 2.0.8 using Conda, but I get: Error: This executable was not compiled with support for BLAST databases. Would it be possible to have a Conda version with BLAST database support? Thank you.

bbuchfink commented 3 years ago

It is planned but could still take some time. Can you tell me what error you are getting when compiling from source? It should be possible to fix this.

FredericBGA commented 3 years ago

Here is a gist file with my compilation issues: Diamond 2.0.8 compilation issues. (CentOS release 6.6 (Final))

I hope you will be able to help me. Otherwise I will wait for the conda version. Thank you.

bbuchfink commented 3 years ago

I don't really understand the cause of this error; there seems to be some problem with the compiler setup on your system. One thing you could try is to compile with a custom GCC as described here: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#compiling-with-custom-gcc

FredericBGA commented 3 years ago

Thank you for your help. I installed GCC 10.2.0 without seeing any errors, but the compilation still fails with the same type of errors. So either I have an issue with the OS (perhaps the version of as, now that gcc has been upgraded?) or I am missing something obvious. https://gist.github.com/FredericBGA/c696199937b6121959924ac040008d00 I will wait for the Conda version.

jjkoehorst commented 3 years ago

I turned a SEED FASTA file into a BLAST database. The run looks OK at first, but after a while it hits an error:

Loading query sequences...  [15.627s]
Masking queries...  [6.019s]
Building query seed set...  [0s]
Building query histograms...  [1.424s]
Allocating buffers...  [0s]
Loading reference sequences...  [0s]
Error: NCBI C++ Exception:
    T0 "/root/ncbi-blast-2.11.0+-src/c++/src/objtools/blast/seqdb_reader/seqdbimpl.cpp", line 843: Error: (CSeqDBException::eArgErr) BLASTDB::ncbi::CSeqDBImpl::GetSeqIDs() - OID not found

Any clue what could be happening here? Is it the input fasta file that needs to conform to some format?

bbuchfink commented 3 years ago

Did you just run makeblastdb on a fasta file or something else? It may not work yet for aliased databases.

jjkoehorst commented 3 years ago

The following command was used:

ncbi-blast-2.11.0+/bin/makeblastdb -dbtype prot -in seed_subsystems_db.fa -title SEED_subsystems

bbuchfink commented 3 years ago

It works for me when using a BLAST db created by makeblastdb. This error likely means the database has a sparse OID range for some reason, like alias databases do. I should be able to provide a fix for this shortly.

Lix1993 commented 3 years ago

hello @jjkoehorst

Could you tell me where I can download seed_subsystems_db.fa, since ftp.theseed.org is not accessible?

bbuchfink commented 3 years ago

@jjkoehorst I think this should be fixed in the 2.0.9 release now.

bartns commented 3 years ago

I can confirm that with the update it runs without an error. Thanks a lot for fixing this so fast!

@Lix1993 Unfortunately, ftp.theseed.org has indeed been unavailable for quite some time. I have contacted SEED but got no response. A copy (although not a very recent one) can be found here: https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/2c8s521xj9907hn/subsys_db.fa

(It comes from the samsa2 pipeline https://github.com/transcript/samsa2)

Lix1993 commented 3 years ago

thanks

nwheeler443 commented 3 years ago

Hi, I'm having trouble with this feature. I tried the workflow in the Wiki:

# downloading and using a BLAST database
update_blastdb.pl --decompress --blastdb_version 5 swissprot
./diamond blastp -d swissprot -q queries.fasta -o matches.tsv

I tried this with a bioconda install of version 2.0.9 as well as an installation straight from the GitHub source, and in both cases got "Error: This executable was not compiled with support for BLAST databases."

Is the conclusion from the above that you have to make the blastdb yourself? Or have I missed something?

bbuchfink commented 3 years ago

The conda version does not yet support this, you can download the prebuilt binary: http://github.com/bbuchfink/diamond/releases/download/v2.0.9/diamond-linux64.tar.gz

When compiling from source, some additional steps need to be taken to enable blast db support (see the installation page).
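One rough way to tell whether a given binary was built with BLAST database support is to look for the error message quoted in this thread. The following is a minimal sketch, not an official check: the function name diamond_has_blastdb_support and the placeholder database prefix are made up for illustration, and it assumes that a build without support prints that message for any prepdb call, as the reports here suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DIAMOND).
# Returns 0 if the `diamond` on PATH appears to support BLAST databases,
# 1 if it prints the "not compiled with support" error, 2 if no diamond is found.
# Assumption: a build without support prints that error for any prepdb call,
# even when the database prefix does not exist.
diamond_has_blastdb_support() {
    local output
    command -v diamond >/dev/null 2>&1 || return 2
    # The prefix is a placeholder; the call is expected to fail either way,
    # only the wording of the error matters.
    output=$(diamond prepdb -d /nonexistent/blastdb_prefix 2>&1) || true
    if printf '%s\n' "$output" | grep -q "not compiled with support for BLAST databases"; then
        return 1
    fi
    return 0
}
```

After sourcing this, `diamond_has_blastdb_support || echo "no BLAST db support"` gives a quick answer before starting a long run.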

nwheeler443 commented 3 years ago

Ahh great, that seems to be working! Thanks!

FredericBGA commented 3 years ago

Hi, a quick reply to say that I managed to compile diamond. My issue was related to a lack of binutils tools.

bbuchfink commented 3 years ago

Since v2.0.10, using a BLAST database requires a diamond prepdb call. In return, loading the database has become much faster, and using a BLAST database should now be substantially faster than using a .dmnd file.
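Combined with the commands that appear elsewhere in this thread, the sequence since v2.0.10 looks roughly like the sketch below. The wrapper name search_blastdb and its arguments are hypothetical; only the diamond invocations themselves are taken from the thread, and anything beyond them should be checked against the Wiki.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (name and arguments are illustrative only).
# Usage: search_blastdb <db_prefix> <query_fasta> <output_tsv>
search_blastdb() {
    local db="$1" query="$2" out="$3"
    # Prepare the BLAST database for use with DIAMOND (required since v2.0.10).
    diamond prepdb -d "$db" || return 1
    # Search the queries against the prepared BLAST database.
    diamond blastp -d "$db" -q "$query" -o "$out"
}

# Example, assuming the database was fetched beforehand with
#   update_blastdb.pl --decompress --blastdb_version 5 swissprot
# search_blastdb swissprot queries.fasta matches.tsv
```

Stopping when prepdb fails keeps the search from running against a database that was not prepared.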

JamesWZM commented 2 years ago

I get "No alias or index file found for protein database" when I run prepdb on the pre-formatted nr blastp database. What is wrong?

lmolokin commented 2 years ago

I'm also getting the no alias/index error when calling diamond prepdb -d nr on my newly downloaded nr db:

Error: NCBI C++ Exception: T0 "/root/ncbi-blast-2.11.0+-src/c++/src/objtools/blast/seqdb_reader/seqdbalias.cpp", line 320: Error: (CSeqDBException::eFileErr) BLASTDB::ncbi::CSeqDBAliasNode::x_ResolveNames() - No alias or index file found for protein database [nr.fa] in search path

bbuchfink commented 2 years ago

You are probably not specifying the correct path. Pass the directory where you downloaded the files followed by /nr, without any file extension.
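As a rough illustration of this advice, the sketch below checks that BLAST protein database files exist under a given prefix before prepdb is called. The function name is hypothetical, and the two extensions it looks for (.pal for an alias file, .pin for an index file) reflect the usual layout of protein BLAST databases rather than anything specific to DIAMOND.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DIAMOND or BLAST+.
# Returns 0 if <prefix>.pal (alias file) or <prefix>.pin (index file) exists,
# i.e. if <prefix> looks like a usable protein database path without extension.
blastdb_prefix_exists() {
    local prefix="$1"
    [ -f "${prefix}.pal" ] || [ -f "${prefix}.pin" ]
}

# Example with a hypothetical download directory:
# blastdb_prefix_exists /data/blastdb/nr && diamond prepdb -d /data/blastdb/nr
```

Passing the directory alone, or a file name that still carries an extension, makes the check fail, which mirrors the "No alias or index file found" error above.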

bbuchfink commented 5 months ago

You need to download a version that was compiled with BLAST db support, e.g. here: https://github.com/bbuchfink/diamond/releases

On Thu, 8 Feb 2024 at 16:48, vdnadung wrote:

Hello @bbuchfink,

I also get the error message "Error: This executable was not compiled with support for BLAST databases." Here is what I did (using an HPC):

  • download nr.*.tar.gz files, save in nr folder
  • tar -xf .nr.*.tar.gz files
  • module load DIAMOND/2.1.8-GCC-12.2.0
  • diamond prepdb -d nr/nr
  • Error: This executable was not compiled with support for BLAST databases.

Could you please help? Thank you in advance.

Best wishes, Dung

vdnadung commented 5 months ago

Hi @bbuchfink,

I used DIAMOND/2.1.8 so I don't know why it happened.

Best wishes Dung