Hi,
the SEG filter is run on the fly for each amino acid fragment that is searched against the selected reference database (refseq, nr, progenomes), but it is not applied to the database itself by makeDB.sh.
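To illustrate what filtering a fragment "on the fly" involves, here is a simplified stand-in for SEG. It is not the SEG algorithm and not kaiju's implementation. It slides a window over a query fragment and masks every window whose Shannon entropy (in bits) falls below a threshold. The function names and the values `window=12` and `threshold=2.2` are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues in low-entropy windows with 'X'.

    Simplified illustration only (hypothetical parameters); real SEG uses
    a more elaborate two-threshold procedure.
    """
    masked = [False] * len(seq)
    # Windows that do not fit in the sequence are skipped, so sequences
    # shorter than `window` are returned unchanged.
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if m else aa for aa, m in zip(seq, masked))


# A poly-Q stretch is masked; a compositionally diverse fragment is not.
print(mask_low_complexity("Q" * 12))        # XXXXXXXXXXXX
print(mask_low_complexity("ACDEFGHIKLMN"))  # ACDEFGHIKLMN
```

Applying such a function to each query fragment just before the search keeps the reference database untouched, which matches the behaviour described above.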
In principle, you can run it independently on the downloaded nr database, but then you need to remove the filtered sequences rather than just mask them. The reason is that the BWT and FM index used in kaiju use only the 20 uppercase letters of the standard amino acid alphabet.
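A minimal sketch of the "remove, don't mask" step, assuming the SEG output marks low-complexity residues as lowercase letters or `X` (the exact convention depends on the SEG tool used). Any character outside the 20 uppercase standard amino acids acts as a cut point, so only unmasked stretches survive. The function names and the `_1`, `_2` header suffixes are hypothetical; adapt the headers to whatever format your database build expects.

```python
import re
from typing import Iterable, Iterator

# The 20 standard amino acids (uppercase), the only letters kept.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
_UNMASKED_RUN = re.compile(f"[{STANDARD_AA}]+")


def unmasked_fragments(seq: str, min_len: int = 1) -> list[str]:
    """Split a masked protein sequence into its unmasked fragments.

    Masked residues (lowercase, 'X', or any other non-standard letter)
    are dropped instead of being kept in the sequence. Fragments shorter
    than `min_len` are discarded.
    """
    return [m.group() for m in _UNMASKED_RUN.finditer(seq) if len(m.group()) >= min_len]


def split_records(
    records: Iterable[tuple[str, str]], min_len: int = 1
) -> Iterator[tuple[str, str]]:
    """Yield (new_id, fragment) pairs; the '_N' suffix is a hypothetical scheme."""
    for rec_id, seq in records:
        for i, frag in enumerate(unmasked_fragments(seq, min_len), start=1):
            yield f"{rec_id}_{i}", frag


print(unmasked_fragments("MKTAYIAXXXXXXKQRQISFVK"))  # ['MKTAYIA', 'KQRQISFVK']
print(list(split_records([("p1", "MKTaaaaaGHW")])))  # [('p1_1', 'MKT'), ('p1_2', 'GHW')]
```

Cutting at masked positions, rather than replacing them, means no character outside the 20-letter alphabet ever reaches the index.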
But I think filtering both the DB and the reads would make little difference.
Peter
Okay, great, thanks for the explanation!
Todd
No problem :)
Hi,
Thanks for Kaiju, it's very nice!
Is the SEG filtering for low-complexity sequences performed on the `nr` database when `makeDB.sh` is run? I understand that I can add this functionality to my input sequences when running `kaiju -x`, but I'm wondering whether the `nr` database has also been filtered. If not, would you suggest running SEG independently on the `nr` sequences before building the `kaiju` database?
Thanks!