Open jonwhit opened 1 year ago
Hi Jonathan - sorry for the delayed answer. Can you please try checking out the rule workflow/rules/taxonomy.smk
from the GitHub repo? Up to now, dadasnake only checked for an un-chunked DB (.../nt.nin). Since you already have a BLAST DB (including .../nt.xx.nin and .../nt.nal), the new rule should now find .../nt.nal and not attempt to make a new one. Let me know if it works - ahb
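To illustrate the check described above, here is a minimal shell sketch. It is not the actual code in workflow/rules/taxonomy.smk; the function name `blastdb_exists` and the path are placeholders made up for this example. It treats a database as present if either the un-chunked index (.nin) or the alias file of a chunked database (.nal) exists at the given prefix.

```shell
# Hypothetical helper, not the real dadasnake rule.
# Succeeds if a nucleotide BLAST database already exists at the prefix "$1":
#   <prefix>.nin  index file of an un-chunked database
#   <prefix>.nal  alias file that lists the volumes of a chunked database
blastdb_exists() {
    [ -f "$1.nin" ] || [ -f "$1.nal" ]
}

# Placeholder path; pass the real prefix (directory plus "nt") as first argument.
db="${1:-/path/to/blastdbs/nt}"

if blastdb_exists "$db"; then
    echo "found an existing database at $db, makeblastdb can be skipped"
else
    echo "no database found at $db, makeblastdb would be needed"
fi
```

With a check like this, a pre-built chunked nt database would pass because of its nt.nal file, so makeblastdb would not be called on a non-existent input file.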
Oh, and a smaller COI reference database might be an alternative, see e.g. https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13756
Hi Anna and coauthors, thanks in advance for any advice. I really like the pipeline and could use some help getting it to work with BLAST and NCBI's nt database. I am having trouble finding the correct config settings for using the NCBI nt database and taxdb as reference databases for COI.
What are the appropriate config parameters to use NCBI's nt database and taxonomy (taxdb) as the reference for a marker like COI? Could you provide an example config.yaml file that uses the BLAST nt database as the reference DB?
I am able to run the pipeline, but am getting errors at the blastn_cluster step. Specifically, the name of the BLAST database is 'nt', but because the NCBI nt database is so big, there is no single file named 'nt', only many files named nt.XXX. The error is reported in logs/blastn_cluster.log. The issue appears to be with the makeblastdb step in blastn_cluster: the database is already built and sits in a local directory. I have the NCBI nt and taxdump databases installed locally, following the installation instructions from BASTA as linked in the dadasnake installation instructions.
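For anyone checking a similar setup, a quick way to see what is actually in the database directory is sketched below. This is only an illustration: `count_volumes` and the `DB_DIR` variable are names invented for this example, and the path is a placeholder. It counts the per-volume index files (nt.00.nin, nt.01.nin, ...) and reports whether the alias file nt.nal is present.

```shell
# Placeholder directory; set DB_DIR to the folder that holds the nt.* files.
db_dir="${DB_DIR:-/path/to/blastdbs}"

# Hypothetical helper: print how many volume index files (nt.*.nin) are in "$1".
count_volumes() {
    n=0
    for f in "$1"/nt.*.nin; do
        # If nothing matches, the pattern stays literal and fails this test.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "volume index files: $(count_volumes "$db_dir")"

if [ -f "$db_dir/nt.nal" ]; then
    echo "alias file nt.nal: present"
else
    echo "alias file nt.nal: missing"
fi
```

If BLAST+ is on the PATH, running `blastdbcmd -db /path/to/blastdbs/nt -info` (with the real path) should additionally print a summary of the database when BLAST can resolve the 'nt' prefix.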
Here are the errors I'm getting.
BLAST options error: File /home/jwhitney/dadasnake/DBs/blastdbs/nt does not exist.
log: logs/blastn_cluster.log (check log file(s) for error message)
conda-env: /home/jwhitney/programs/dadasnake/conda/66132e6a149ec730ec4c2d24861f8d4c
shell:
    if [ -s clusteredTables/consensus.fasta ]; then
        if [ ! -f "/home/jwhitney/dadasnake/DBs/blastdbs/nt.nin" ]
        then
            makeblastdb -dbtype nucl -in /home/jwhitney/dadasnake/DBs/blastdbs/nt -out /home/jwhitney/dadasnake/DBs/blastdbs/nt &> logs/blastn_cluster.log
        fi
        blastn -db /home/jwhitney/dadasnake/DBs/blastdbs/nt -query clusteredTables/consensus.fasta -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle" -out clusteredTables/blast_results.tsv -max_target_seqs 10 &>> logs/blastn_cluster.log
    else
        touch clusteredTables/blast_results.tsv
    fi
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
And here are the relevant parts of the config.yaml
# SETTINGS FOR TAXONOMIC ANNOTATION
taxonomy:
  dada:
    do: TRUE
# classification is only done, if do_taxonomy is true
taxonomy:
  mothur:
    do: FALSE
    db_path: "/home/jwhitney/.basta/taxonomy"
    tax_db: ""
blast:
  do: true
# blast is only done, if do_taxonomy is true
  run_on:
Thanks in advance for any advice.