bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

What parameter should I set to filter out all bacterial blastp results? #767

Open terancehhwong opened 9 months ago

terancehhwong commented 9 months ago

Hi there,

I wonder which parameter I should set (e.g. negative_taxids?) when performing blastp against protein databases if I would like to filter out all prokaryotic results, as I am only interested in eukaryotic data. If that is the right option, would I need to supply a very long list of prokaryotic taxids, because NCBI has a lot of bacterial sequences? Thanks in advance.

Best regards, Terance

bbuchfink commented 9 months ago

There's the --taxonlist and --taxon-exclude options (see wiki).
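For illustration, here is a minimal sketch of how the two options might be used, written as a small Python helper that only assembles the command line. The helper name `diamond_blastp_cmd` and the file names are hypothetical. The taxon IDs are NCBI taxonomy IDs (2 = Bacteria, 2157 = Archaea, 2759 = Eukaryota); as far as the wiki describes, IDs of any rank can be given, so a long list of individual species should not be needed. It also assumes the database was built with taxonomy information (`diamond makedb` with `--taxonmap`, `--taxonnodes` and `--taxonnames`).

```python
import shlex

# NCBI taxonomy IDs for the top-level groups discussed in this thread.
BACTERIA, ARCHAEA, EUKARYOTA = 2, 2157, 2759


def diamond_blastp_cmd(db, query, out, include=(), exclude=()):
    """Build a `diamond blastp` argument list with a taxonomic filter.

    `include` maps to --taxonlist, `exclude` to --taxon-exclude.
    This sketch accepts only one of the two at a time to keep things simple;
    that restriction is a choice made here, not a documented DIAMOND rule.
    """
    if include and exclude:
        raise ValueError("pass either include or exclude, not both")
    cmd = ["diamond", "blastp", "--db", db, "--query", query, "--out", out]
    if include:
        cmd += ["--taxonlist", ",".join(str(t) for t in include)]
    if exclude:
        cmd += ["--taxon-exclude", ",".join(str(t) for t in exclude)]
    return cmd


# Hypothetical file names; prints the command instead of running it.
print(shlex.join(diamond_blastp_cmd(
    "nr", "transcriptome.pep.faa", "hits.tsv", exclude=[BACTERIA, ARCHAEA])))
```

Passing `include=[EUKARYOTA]` instead would restrict the search to eukaryotic database sequences. The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.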

terancehhwong commented 9 months ago

Regarding this parameter, I wonder whether it just skips the prokaryotic sequences during the search (meaning that the transcript ID and sequence are still in the transcriptome), or whether it fully removes the prokaryotic sequences from the transcriptome so that they can no longer be found there, which is what I mean by "filter out". I ask because I would like to fully eliminate all prokaryotic sequences from my transcriptome, so that the blastp output as well as the original transcriptome contain only eukaryotic sequences, as my experimental animal and target objectives are eukaryotes.

bbuchfink commented 8 months ago

It will not search the respective database sequences, but it will not remove anything from your query file, if that's what you mean.
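Since the query file is left untouched, removing sequences from the transcriptome has to be a separate step. One possible approach, not something suggested by the maintainer, is to run a search restricted to prokaryotic database sequences, collect the query IDs that got hits, and write a copy of the FASTA file without them. The sketch below assumes the default BLAST tabular output, where the first column is the query ID (the FASTA header up to the first whitespace). The function names are hypothetical.

```python
def read_hit_ids(tsv_lines):
    """Collect query IDs (first tab-separated column) from tabular output."""
    return {line.split("\t", 1)[0].strip() for line in tsv_lines if line.strip()}


def filter_fasta(fasta_lines, drop_ids):
    """Yield FASTA lines, skipping every record whose ID is in drop_ids."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = seq_id not in drop_ids
        if keep:
            yield line


# Tiny in-memory example with made-up records.
fasta = [">t1 some description\n", "MKVL\n", ">t2\n", "MAAG\n"]
hits = ["t1\tWP_000001\t95.0\n"]
print("".join(filter_fasta(fasta, read_hit_ids(hits))), end="")
```

With real data, both functions can be fed open file handles instead of lists. Note that a hit to a prokaryotic protein does not by itself prove a transcript is a contaminant, since many proteins are conserved across domains, so some identity or e-value cutoff is probably worth applying before dropping sequences.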