gobics / uproc

Tools for ultra-fast protein sequence classification.
http://uproc.gobics.de/
GNU Lesser General Public License v3.0

Low complexity protein sequences give hit #18

Open mdehollander opened 8 years ago

mdehollander commented 8 years ago

Hi,

When I run uproc-prot on a FASTA file containing only N's, I get hits to PFAM domains. This does not seem biologically relevant to me. Or should I filter the hits by score?

This is the input:

>test1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNK
>test2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

This is the command I used: uproc-prot --preds -o uproc_test.out /data/db/uproc/pfam28/ /data/db/uproc/model test.fasta

And this is the output:

1,test1,42,PF03507,4.581
1,test1,42,PF00516,2.712
2,test2,42,PF03507,4.481
2,test2,42,PF00516,2.711
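
One possible workaround, independent of uproc itself, would be to drop low-complexity sequences from the FASTA file before classification. The sketch below is only an illustration of that idea and is not part of uproc: it computes the Shannon entropy of each sequence's residue composition and keeps only sequences at or above a cutoff. The helper names (`shannon_entropy`, `read_fasta`, `filter_low_complexity`) and the `min_entropy=2.0` cutoff are assumptions chosen for this example, so the cutoff would need tuning on real data.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the residue composition of seq."""
    total = len(seq)
    if total == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_low_complexity(records, min_entropy=2.0):
    """Keep records whose entropy is at least min_entropy.

    min_entropy is a hypothetical cutoff picked for illustration only.
    """
    return [(h, s) for h, s in records if shannon_entropy(s) >= min_entropy]
```

With this cutoff, both test sequences above would be removed before they reach uproc-prot, because a sequence made of one or two residue types has an entropy close to zero. The remaining records can then be written back out as FASTA and passed to the same command.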