DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0
187 stars 44 forks source link

How to run blast+ #44

Closed mw55309 closed 7 years ago

mw55309 commented 7 years ago

Hi Dom

Sorry, I know this isn't a problem with blobtools per se....

My blast+ output looks like this:

tig00000001 N/A 54.3 tig00000001 N/A 53.9 tig00000001 N/A 53.1 tig00000001 N/A 53.5 tig00000001 N/A 53.9 tig00000001 N/A 47.8 tig00000001 N/A 27.7 tig00000001 N/A 53.1 tig00000001 N/A 52.4 tig00000001 N/A 53.1 tig00000001 N/A 52.4 tig00000001 N/A 53.1 tig00000001 N/A 51.6 tig00000001 N/A 53.1 tig00000001 N/A 52.8 tig00000001 N/A 52.8 tig00000001 N/A 53.5 tig00000001 N/A 53.1 tig00000001 N/A 52.8 tig00000001 N/A 52.8 tig00000001 N/A 52.4 tig00000001 N/A 52.4

Obviously N/A us not very useful.

Command ran was:

blastx -db nr_blastplus -query test.fa -outfmt '6 qseqid staxids bitscore' -num_threads 8

Building the database was:

zcat nr.gz | makeblastdb -in - -parse_seqids -dbtype prot -title nr_blastplus -out nr_blastplus

Have you seen this type of behaviour before?

Cheers Mick

DRL commented 7 years ago

Hi Mick, the reason for the N/A's is that either it can't find the TAXIDs of the sequences (because $BLASTDB is not pointing to NCBI's TaxDB) or there are no TAXIDs assigned to the sequences in the DB.

Is this NCBI's nr or a custom subset?

I just had to do something similar where I wanted to construct a new BlastDB with ONLY subjects hit by a previous search against NCBI nt:

# extract subject_ids from blast result (column 5 is sseqid in my case)
cut -f5 megablast.out | sort | uniq > subject_ids.txt
# Get tax_id_map 
blastdbcmd -db nt -entry_batch subject_ids.txt -outfmt '%i %T' -out subject_sequences.tax_id_map
# Get sequences
blastdbcmd -db nt -entry_batch subject_ids.txt -out subject_sequences.fna
# make DB
makeblastdb -in subject_sequences.fna -dbtype nucl -taxid_map subject_sequences.tax_id_map
 -parse_seqids

Hope that helps.

cheers,

dom