Closed cwarden45 closed 4 years ago
FYI, the --decompress download with the right name looks better so far:
Connected to NCBI
Downloading nt (27 volumes) ...
Downloading nt.00.tar.gz... [OK]
Downloading nt.01.tar.gz... [OK]
Downloading nt.02.tar.gz... [OK]
Downloading nt.03.tar.gz... [OK]
Downloading nt.04.tar.gz... [OK]
Downloading nt.05.tar.gz... [OK]
Downloading nt.06.tar.gz... [OK]
Downloading nt.07.tar.gz... [OK]
Downloading nt.08.tar.gz... [OK]
Downloading nt.09.tar.gz... [OK]
Downloading nt.10.tar.gz... [OK]
Downloading nt.11.tar.gz... [OK]
Downloading nt.12.tar.gz... [OK]
Downloading nt.13.tar.gz... [OK]
Downloading nt.14.tar.gz... [OK]
Downloading nt.15.tar.gz... [OK]
Downloading nt.16.tar.gz... [OK]
Downloading nt.17.tar.gz... [OK]
Downloading nt.18.tar.gz... [OK]
Downloading nt.19.tar.gz... [OK]
Downloading nt.20.tar.gz... [OK]
Downloading nt.21.tar.gz... [OK]
Downloading nt.22.tar.gz... [OK]
Downloading nt.23.tar.gz... [OK]
Downloading nt.24.tar.gz... [OK]
Downloading nt.25.tar.gz... [OK]
Downloading nt.26.tar.gz... [OK]
Decompressing nt.00.tar.gz ... [OK]
Decompressing nt.01.tar.gz ... [OK]
Decompressing nt.02.tar.gz ... [OK]
Decompressing nt.03.tar.gz ... [OK]
Decompressing nt.04.tar.gz ... [OK]
Decompressing nt.05.tar.gz ... [OK]
Decompressing nt.06.tar.gz ... [OK]
Decompressing nt.07.tar.gz ... [OK]
Decompressing nt.08.tar.gz ... [OK]
Decompressing nt.09.tar.gz ... [OK]
Decompressing nt.10.tar.gz ... [OK]
Decompressing nt.11.tar.gz ... [OK]
Decompressing nt.12.tar.gz ... [OK]
Decompressing nt.13.tar.gz ... [OK]
Decompressing nt.14.tar.gz ... [OK]
Decompressing nt.15.tar.gz ... [OK]
Decompressing nt.16.tar.gz ... [OK]
Decompressing nt.17.tar.gz ... [OK]
Decompressing nt.18.tar.gz ... [OK]
Decompressing nt.19.tar.gz ... [OK]
Decompressing nt.20.tar.gz ... [OK]
Decompressing nt.21.tar.gz ... [OK]
Decompressing nt.22.tar.gz ... [OK]
Decompressing nt.23.tar.gz ... [OK]
Decompressing nt.24.tar.gz ... [OK]
Decompressing nt.25.tar.gz ... [OK]
Decompressing nt.26.tar.gz ... [OK]
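As a side note (not from the original thread): after a download like the one logged above, it can be worth confirming that every volume actually left behind its three core files (.nhr/.nin/.nsq). A minimal sketch, demonstrated here against a mock directory; replace DB_DIR with the real database directory.

```shell
# Sanity check for a 27-volume nt download (nt.00 .. nt.26).
# DB_DIR is a placeholder; the mock files below exist only for demonstration.
DB_DIR=mock_blastdb
mkdir -p "$DB_DIR"
for i in $(seq -w 0 26); do            # create mock volumes for the demo
  for ext in nhr nin nsq; do touch "$DB_DIR/nt.$i.$ext"; done
done
missing=0
for i in $(seq -w 0 26); do            # count any volume files that are absent
  for ext in nhr nin nsq; do
    [ -f "$DB_DIR/nt.$i.$ext" ] || missing=$((missing + 1))
  done
done
echo "missing files: $missing"
```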
If this strategy fixes the problem, then I will close the ticket.
If not, then I will provide additional information for troubleshooting.
I apologize for not catching this sooner.
Thank you again!
I am still encountering an issue, but I will post most of the details on the main thread. Essentially, I am getting this error message indicating that the nt database is not recognized:
Indexed BLAST database error: NCBI C++ Exception:
T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/algo/blast/api/blast_dbindex.cpp", line 793: Error: (CDbIndex_Exception::bad index creation option) BLAST::ncbi::blast::CIndexedDb_New::CIndexedDb_New() - no database volume has an index
NCBI C++ Exception:
T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/algo/blast/api/blast_dbindex.cpp", line 1006: Error: (CDbIndex_Exception::bad index creation option) BLAST::ncbi::blast::CIndexedDb_Old::CIndexedDb_Old() - no index file specified or index 'nt*' not found.
If I am correct that the BLAST configuration is causing the problem, then I will summarize the solution here.
However, in the meantime, I will close this ticket.
As a troubleshooting update, I tested a pre-existing version of BLAST via export PATH=/opt/ncbi-blast-2.4.0+/bin:$PATH.
However, that generates a different error message:
##########################################
## Sequence identification : BLAST ##
##########################################
pool8-oral-pathogen_S8_L001
pool6-oral-pathogen_S6_L001
pool1-skin-pathogen_S1_L001
pool4-skin-pathogen_S4_L001
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
pool5-skin-pathogen_S5_L001
pool7-oral-pathogen_S7_L001
pool2-skin-pathogen_S2_L001
pool3-skin-pathogen_S3_L001
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
BLAST Database error: Error: Not a valid version 4 database.
Done
Since the error complains about the database version, the database itself must be getting found successfully.
However, I still don't have a solution to get PVAmpliconFinder.sh working quite yet.
I saw this response in another discussion group, so I tested downloading the latest version of BLAST+.
I then modified the PATH to use that version with export PATH=/opt/ncbi-blast-2.10.0+/bin:$PATH.
However, that gets me back to the earlier error message:
Indexed BLAST database error: NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_260005_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1575413971/c++/compilers/unix/../../src/algo/blast/api/blast_dbindex.cpp", line 793: Error: BLAST::ncbi::blast::CIndexedDb_New::CIndexedDb_New() - no database volume has an index
NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_260005_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1575413971/c++/compilers/unix/../../src/algo/blast/api/blast_dbindex.cpp", line 1006: Error: BLAST::ncbi::blast::CIndexedDb_Old::CIndexedDb_Old() - no index file specified or index 'nt*' not found.
As a partial response, I have downloaded the FASTA files and I am testing re-indexing them from scratch.
However, if I try to index the database within the Docker image that I created for the PVAmpliconFinder, then I get the following error message:
No volumes were created.
Error: mdb_env_open: Invalid argument
Based upon this discussion, I think there may be some space limitation.
So, I am going to try indexing the file outside of the Docker image and then use the same version of BLAST+ for PVAmpliconFinder. If that works, I will post to confirm.
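A sketch of the workaround described above (not executed here): build the index on the host, then bind-mount the finished database read-only into the container. The image name and host path are hypothetical, assembled only to illustrate the idea.

```shell
# Hypothetical invocation: index built on the host at /host/blastdb,
# mounted read-only into the container at /data/blastdb.
RUN_CMD="docker run --rm -v /host/blastdb:/data/blastdb:ro pvampliconfinder:latest"
echo "$RUN_CMD"
```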
To help others with troubleshooting, I found a solution to the most immediate error message above (by indexing the reference on another computer).
While the process was actually split between two computers, these are the commands that were used:
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz
gunzip nt.gz
mv nt nt.fa
export PATH=/opt/ncbi-blast-2.10.0+/bin:$PATH
makeblastdb -in nt.fa -dbtype nucl -out nt
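After a rebuild like the one above, one optional sanity check (assuming BLAST+ is on the PATH) is `blastdbcmd -db nt -info`, which prints a database summary only if the volumes and alias file are intact. The fallback branch below is just to keep the snippet runnable on machines without BLAST+ installed.

```shell
# Optional check: summarize the nt database if blastdbcmd is available.
if command -v blastdbcmd >/dev/null 2>&1; then
  INFO=$(blastdbcmd -db nt -info 2>&1 || echo "blastdbcmd could not read the nt database")
else
  INFO="blastdbcmd is not installed on this machine"
fi
echo "$INFO"
```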
However, I am still getting an error message from PVAmpliconFinder:
Indexed BLAST database error: NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_330141_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1589299866/c++/compilers/unix/../../src/algo/blast/api/blast_dbindex.cpp", line 793: Error: (CDbIndex_Exception::bad index creation option) BLAST::ncbi::blast::CIndexedDb_New::CIndexedDb_New() - no database volume has an index
NCBI C++ Exception:
T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_330141_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1589299866/c++/compilers/unix/../../src/algo/blast/api/blast_dbindex.cpp", line 1006: Error: (CDbIndex_Exception::bad index creation option) BLAST::ncbi::blast::CIndexedDb_Old::CIndexedDb_Old() - no index file specified or index 'nt*' not found.
As far as I can tell, the database files should all be there (even though there are 78 instead of 27 volumes):
nt.00.nhr
nt.00.nin
nt.00.nsq
nt.01.nhr
nt.01.nin
nt.01.nsq
nt.02.nhr
nt.02.nin
nt.02.nsq
nt.03.nhr
nt.03.nin
nt.03.nsq
nt.04.nhr
nt.04.nin
nt.04.nsq
nt.05.nhr
nt.05.nin
nt.05.nsq
nt.06.nhr
nt.06.nin
nt.06.nsq
nt.07.nhr
nt.07.nin
nt.07.nsq
nt.08.nhr
nt.08.nin
nt.08.nsq
nt.09.nhr
nt.09.nin
nt.09.nsq
nt.10.nhr
nt.10.nin
nt.10.nsq
nt.11.nhr
nt.11.nin
nt.11.nsq
nt.12.nhr
nt.12.nin
nt.12.nsq
nt.13.nhr
nt.13.nin
nt.13.nsq
nt.14.nhr
nt.14.nin
nt.14.nsq
nt.15.nhr
nt.15.nin
nt.15.nsq
nt.16.nhr
nt.16.nin
nt.16.nsq
nt.17.nhr
nt.17.nin
nt.17.nsq
nt.18.nhr
nt.18.nin
nt.18.nsq
nt.19.nhr
nt.19.nin
nt.19.nsq
nt.20.nhr
nt.20.nin
nt.20.nsq
nt.21.nhr
nt.21.nin
nt.21.nsq
nt.22.nhr
nt.22.nin
nt.22.nsq
nt.23.nhr
nt.23.nin
nt.23.nsq
nt.24.nhr
nt.24.nin
nt.24.nsq
nt.25.nhr
nt.25.nin
nt.25.nsq
nt.26.nhr
nt.26.nin
nt.26.nsq
nt.27.nhr
nt.27.nin
nt.27.nsq
nt.28.nhr
nt.28.nin
nt.28.nsq
nt.29.nhr
nt.29.nin
nt.29.nsq
nt.30.nhr
nt.30.nin
nt.30.nsq
nt.31.nhr
nt.31.nin
nt.31.nsq
nt.32.nhr
nt.32.nin
nt.32.nsq
nt.33.nhr
nt.33.nin
nt.33.nsq
nt.34.nhr
nt.34.nin
nt.34.nsq
nt.35.nhr
nt.35.nin
nt.35.nsq
nt.36.nhr
nt.36.nin
nt.36.nsq
nt.37.nhr
nt.37.nin
nt.37.nsq
nt.38.nhr
nt.38.nin
nt.38.nsq
nt.39.nhr
nt.39.nin
nt.39.nsq
nt.40.nhr
nt.40.nin
nt.40.nsq
nt.41.nhr
nt.41.nin
nt.41.nsq
nt.42.nhr
nt.42.nin
nt.42.nsq
nt.43.nhr
nt.43.nin
nt.43.nsq
nt.44.nhr
nt.44.nin
nt.44.nsq
nt.45.nhr
nt.45.nin
nt.45.nsq
nt.46.nhr
nt.46.nin
nt.46.nsq
nt.47.nhr
nt.47.nin
nt.47.nsq
nt.48.nhr
nt.48.nin
nt.48.nsq
nt.49.nhr
nt.49.nin
nt.49.nsq
nt.50.nhr
nt.50.nin
nt.50.nsq
nt.51.nhr
nt.51.nin
nt.51.nsq
nt.52.nhr
nt.52.nin
nt.52.nsq
nt.53.nhr
nt.53.nin
nt.53.nsq
nt.54.nhr
nt.54.nin
nt.54.nsq
nt.55.nhr
nt.55.nin
nt.55.nsq
nt.56.nhr
nt.56.nin
nt.56.nsq
nt.57.nhr
nt.57.nin
nt.57.nsq
nt.58.nhr
nt.58.nin
nt.58.nsq
nt.59.nhr
nt.59.nin
nt.59.nsq
nt.60.nhr
nt.60.nin
nt.60.nsq
nt.61.nhr
nt.61.nin
nt.61.nsq
nt.62.nhr
nt.62.nin
nt.62.nsq
nt.63.nhr
nt.63.nin
nt.63.nsq
nt.64.nhr
nt.64.nin
nt.64.nsq
nt.65.nhr
nt.65.nin
nt.65.nsq
nt.66.nhr
nt.66.nin
nt.66.nsq
nt.67.nhr
nt.67.nin
nt.67.nsq
nt.68.nhr
nt.68.nin
nt.68.nsq
nt.69.nhr
nt.69.nin
nt.69.nsq
nt.70.nhr
nt.70.nin
nt.70.nsq
nt.71.nhr
nt.71.nin
nt.71.nsq
nt.72.nhr
nt.72.nin
nt.72.nsq
nt.73.nhr
nt.73.nin
nt.73.nsq
nt.74.nhr
nt.74.nin
nt.74.nsq
nt.75.nhr
nt.75.nin
nt.75.nsq
nt.76.nhr
nt.76.nin
nt.76.nsq
nt.77.nhr
nt.77.nin
nt.77.nsq
nt.fa
nt.nal
nt.ndb
nt.not
nt.ntf
nt.nto
I noticed that export BLASTDB=/path/to/databases doesn't work if I try to run this step on another computer. So, I am copying over the ~/.ncbirc file (with a modified path) to the server where I am running the analysis.
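For reference, a minimal ~/.ncbirc with the relevant stanza looks like the fragment below; the path is a placeholder and must be edited to match each machine the file is copied to.

```ini
[BLAST]
BLASTDB=/path/to/databases
```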
Also, I noticed that PVAmpliconFinder can pick up in the middle of an analysis, but you need to delete the folder for whatever step did not complete correctly (such as the blast_result folder, if you are having problems at the BLAST analysis step).
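A toy demonstration of that resume pattern (not part of PVAmpliconFinder itself): delete only the folder for the failed step so the completed steps are kept. The folder names mirror the ones mentioned in this thread; demo_out is a mock directory created just for this example.

```shell
# Mock a partially finished run, then clear only the failed BLAST step.
OUT=demo_out
mkdir -p "$OUT/blast_result" "$OUT/vsearch"   # pretend vsearch completed, BLAST failed
rm -rf "$OUT/blast_result"                    # remove only the failed step's output
ls "$OUT"                                     # the completed step survives
```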
Since the BLAST indexing message was not clearly identifying the cause, I am testing running the BLAST step with 256M of RAM and 1 core (versus 16MB of RAM with 1-2 cores). However, I am still getting the same error message.
I looked into the PVAmpliconFinder code and I tested removing -use_index true from the blastn command.
The blastdb files are still present and the path is still being successfully added via the ~/.ncbirc file, but that results in the following error message:
BLAST Database error: No alias or index file found for nucleotide database [nt] in search path [/path/to/PVAmpliconFinder/test_out/vsearch::/path/to/PVAmpliconFinder/PVAmpliconFinder/databases:]
I think that relates to this discussion, but the only idea it gave me to try was removing the ".fa" extension from the original sequence file (which I didn't think was used; I avoided having .fa in all the other names by using the -out parameter for makeblastdb).
As I would have guessed, I still get the same error if I do that.
There was a likely contributing factor to the last troubleshooting attempt.
I had previously added an extra line to the ~/.ncbirc
file, but I had forgotten to update the path between different computers.
So, this is the configuration file (which didn't fix the problem in itself):
[BLAST]
BLASTDB=/path/to/PVAmpliconFinder/PVAmpliconFinder/databases
DATA_LOADERS=blastdb
I am currently not getting an error message, but blastn is taking >20 minutes for 1 sample.
I am also not sure how removing -use_index true affects other steps, but I will provide an update (with either another error message or the successful result).
I think part of the issue may be that I need to create an additional index using makembindex (as discussed here).
For example, I am testing the effect of adding this command:
makembindex -input nt -iformat blastdb
My VM froze when I tried to let blastn run without -use_index true.
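A sketch of the two-step recipe being tested above (not executed here): build the megablast index over the existing blastdb, then let blastn use it via -use_index true. The query and output file names are placeholders, not names from PVAmpliconFinder.

```shell
# Hypothetical commands, assembled for clarity rather than run.
INDEX_CMD="makembindex -input nt -iformat blastdb"
SEARCH_CMD="blastn -query sample.fa -db nt -use_index true -out sample_hits.txt"
printf '%s\n%s\n' "$INDEX_CMD" "$SEARCH_CMD"
```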
I am still working on troubleshooting, but I thought it might be good to mention a few updates:
1) I was able to successfully create a megablast-indexed reference (with makembindex), without any error messages. So, that is good.
2) I was previously able to download the regular BLAST index with update_blastdb.pl. However, I am currently having difficulties with that (an error occurs during the download of the 2nd volume). I am currently not certain whether I need to do that (to use the smaller reference set with the taxonomy download).
3) If I run PVAmpliconFinder on the megablast indexed reference, then I get a different error message.
Also, this takes a long time to reach the point of generating that error message (~2 days for 1 sample).
I currently don't see that error message in the log file (after shutting down the VM and Docker image), but I know the time associated with the empty BLAST result for the 1st demo file.
So, I am going to see if using a cluster with more computational resources helps (as well as keep re-trying the update_blastdb.pl style of reference download, followed by the additional indexing step, once that is successful). Even if I still have a problem, I will try to provide more specifics about the error message.
Some more updates:
2) I can re-download the smaller set of files with the taxonomy database. I am not sure if it was the main cause, but I think moving the files between folders may have affected some of the permissions.
3) If I use a different computer for the BLAST step, I can now get non-empty BLAST files. However, I think I need to use that alternative reference set with the taxonomy information:
Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
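A sketch of how that warning can be addressed (not executed here): fetch the taxonomy archive the warning itself names, and unpack it next to the nt volumes. The target directory is a placeholder.

```shell
# Hypothetical commands following the warning's own instructions;
# taxdb.tar.gz is the archive named in the warning text.
FETCH_CMD="wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz"
UNPACK_CMD="tar -xzf taxdb.tar.gz -C /path/to/databases"
printf '%s\n%s\n' "$FETCH_CMD" "$UNPACK_CMD"
```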
I think the decompressed version of the taxdb file is the same as what was downloaded with update_blastdb.pl in the other folder (with fewer nt volumes). However, I am going to test whether that other set of re-downloaded files works. If so, I will provide some output and close the main ticket.
I think I have some additional questions for the "Advanced Analysis," but I think the above solution worked for the BLAST part (using the update_blastdb.pl files, adding an extra megablast index, and running the BLAST step on a computer with more computational resources, but where I don't have root privileges for installation).
I hope this can be helpful for others.
Hi Alexis,
This relates to issue #3, but I thought I should separate this out so that it would be easier for others to find the answer to this specific question.
I have created an ~/.ncbirc file with the following information. Just to be extra safe, I also ran
export BLASTDB=/path/to/PVAmpliconFinder/databases
before running PVAmpliconFinder. Based upon changes in the error messages, I believe the folder (PVAmpliconFinder/databases) is being successfully specified, but the issue is with finding the BLAST index files.
I am currently trying to download the nt files as follows:
I think the taxdb files were downloaded OK.
The combined output is as follows:
I previously tried downloading the files without --decompress and then extracting the .tar.gz files. However, PVAmpliconFinder still didn't find the BLAST nt reference files.
I also see that there was a typo (nr instead of nt) with the extra parameter, so I am going to try that again. If that doesn't work, I can go back to downloading the compressed files (where I don't think I had a typo, but I encountered some sort of issue). For example, I am seeing the expected number of volumes in the modified download (27, from nt.00.tar.gz to nt.26.tar.gz).
Did you do something different when you set up your BLAST nt database?
Thank you very much.
Sincerely, Charles