alexpiper / taxreturn

An R package for creating taxonomic reference databases for metabarcoding studies
GNU General Public License v3.0

blast()/blast_assign_species() #26

Open gbchauhan opened 2 years ago

gbchauhan commented 2 years ago

Hi Alex, I was searching for a package that would allow me to run a blast search for fasta sequences against the NCBI database and came across the blast() function from your taxreturn package. Eventually I want to assign taxonomy to the sequences. I would prefer to do the blast search without downloading the NCBI nr database locally. From the usage description of the blast() function: "If db is set to "remote", this will conduct a search against NCBI nucleotide database." Does this mean that I don't need to have a local reference database for the blast search? I tried the blast() function and got the following error:

blast(query = , db = "remote")
Error: Executable for blastn not found! Please make sure that the software is correctly installed and, if necessary, path variables are set.

Could you please help me with this error?

I also tried the command below, but I still get the same error when I then use the blast() function:

blast_install(url = "https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.13.0/ncbi-blast-2.13.0+-x64-win64.tar.gz")
Downloaded ncbi-blast-2.13.0+ in 17 secs

I also tried the following, but get the same error.

blast_assign_species(query = , db = "remote", type = "blastn", identity = 97, coverage = 95, evalue = 1e+06, max_target_seqs = 5, max_hsp = 5, ranks = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), delim = ";")
Error: Executable for blastn not found! Please make sure that the software is correctly installed and, if necessary, path variables are set.

I am using this in RStudio on Windows 10. Am I downloading the wrong executable? Do I need to unzip it, add something to the PATH, or do anything else?

Thanks, Gaurav

alexpiper commented 2 years ago

Hi Gaurav, sorry for the late response.

The blast() function in this package is basically just a wrapper around the command-line blast+ tool. The db = "remote" option will run web queries against the NCBI servers, similar to adding the -remote flag when running blast+ from the command line. So you don't need a local copy of the database, but you do still need the blast+ executables installed, because they are what send the query. Note that this is really only appropriate for small numbers of sequences; it can be slow and may time out when searching hundreds or thousands of sequences.
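To make the comparison with the command line concrete, here is a minimal base-R sketch of the kind of blastn call a remote search corresponds to. It is not how taxreturn's blast() builds its call internally: the helper name build_remote_blastn_args(), the file names and the choice of the nt database are illustrative assumptions. It also shows why the error appears even in remote mode: the local blastn program is still what gets run.

```r
# Hypothetical helper (not part of taxreturn): assemble blastn arguments
# for a search that runs on NCBI's servers via the -remote flag.
build_remote_blastn_args <- function(query_fasta, out_file, db = "nt") {
  c(
    "-query", query_fasta,  # local FASTA file with the sequences to search
    "-db", db,              # NCBI-hosted database name (assumed: nt)
    "-remote",              # send the search to NCBI instead of a local database
    "-outfmt", "6",         # tabular output
    "-out", out_file        # where to write the hits
  )
}

args <- build_remote_blastn_args("my_seqs.fasta", "hits.tsv")

# Even a remote search needs the local blastn executable, so check PATH first
if (nzchar(Sys.which("blastn"))) {
  system2("blastn", args = args)
} else {
  message("blastn not found on PATH")
}
```

The file names here contain no spaces; paths that do would need quoting (for example with shQuote()) before being passed to system2().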

Looking at the error there, it might be best to install the blast+ executables manually, as adding them to the PATH automatically from R seems to be failing.

Cheers, Alex

gbchauhan commented 2 years ago

Hi Alex,

Thank you for your response. I have installed blast+ manually. Could you please tell me how to add it to R after the installation? Sorry, I am new to computation.

Thanks, Gaurav

alexpiper commented 2 years ago

Sorry again for the very slow response; I have been on holiday.

You will need to add the blast+ bin directory (the folder that contains blastn.exe on Windows) to your PATH. How to do this depends on the operating system. The instructions for Windows are under the Environment Variables section on this page. For macOS or Linux you should be able to find the answer on Google.
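If editing the system-wide PATH is not an option, a session-only workaround in base R is to prepend the blast+ bin folder to PATH before calling blast(). This is a sketch: the install location below is an assumption (it depends on the blast+ version and where the installer put it), and the change is lost when R restarts.

```r
# Assumed install location of the blast+ executables; adjust to your machine
blast_bin <- "C:\\Program Files\\NCBI\\blast-2.13.0+\\bin"

# Prepend the folder to PATH for the current R session only.
# .Platform$path.sep is ";" on Windows and ":" on macOS/Linux.
Sys.setenv(PATH = paste(blast_bin, Sys.getenv("PATH"), sep = .Platform$path.sep))

# Sys.which() returns "" if blastn still cannot be found
Sys.which("blastn")
```

Putting the same Sys.setenv() line in an .Rprofile startup file would apply it to every new R session, though setting PATH at the operating-system level is the more robust fix.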

Then run the following code in R:

library(taxreturn)
.findExecutable("blastn")

If the PATH variable has been set correctly, R will return something like this, indicating that blastn is accessible from R:

"C:\\PROGRA~1\\NCBI\\BLAST-~1.0_\\bin\\blastn.exe"

If it instead returns a blank "", the PATH has not been set correctly. Note that R and RStudio only pick up a changed PATH after they have been restarted.
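A similar check is possible with base R's Sys.which(), which likewise returns "" for any program it cannot find on PATH. The sketch below looks up blastn together with makeblastdb (another blast+ program, included here only as an example) so it is easy to see which ones are missing.

```r
# Look up several blast+ programs on PATH at once
blast_programs <- c("blastn", "makeblastdb")
found <- Sys.which(blast_programs)

# Sys.which() returns a named character vector; "" means not found
missing <- names(found)[!nzchar(found)]
if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
} else {
  message("All blast+ programs found")
}
```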