Closed zhangnan0107 closed 3 years ago
Apologies, the user guide is wrong then. jackhmmer doesn't accept standard input or .gz files as the database. It needs to rewind the database file back to the beginning, to be able to iterate the search. Standard input can't be rewound. And the simple way we treat .gz files is to treat them as standard input streams, by opening a pipe from an external decompression program (specifically, gzip -dc
). It'd be possible to just close and reopen the .gz file, but our code doesn't do that currently; we just treat both stdin and .gz as "nonrewindable", and programs that require rewindable files will report an error like the one you got.
In commit 105b959f8, I've just made two changes to the documentation. The jackhmmer
man page now includes:
The
.I seqdb
needs to be a 'normal' sequence file. It cannot be read from a stdin stream, because
.B jackhmmer
needs to do multiple passes over the database. It cannot be a
compressed (gzipped) file either, because we treat gzipped files
essentially as stdin streams, calling an external decompression
program.
The userguide now includes, in the .gz compressed files
section in formats.tex
:
A significant exception is \mono{jackhmmer}, which needs to do
multiple iterations over the same input sequence database. Because it
needs to rewind and read the same target database more than once,
\mono{jackhmmer} won't accept a standard input stream as the database
argument (unlike most other programs); and because we treat compressed
files as standard input streams, by opening a pipe from an external
decompression program, \mono{jackhmmer} won't accept a compressed seq
database file either. If a program can't accept stdin or a .gz file,
it'll tell you, by failing with an informative error message.
If you see another place in the documentation that should be improved on this, let me know. Thanks!
I see, thanks a lot for the information and the update on docs. If you don't mind, can I ask will the compressed format be considered in jackhmmer
in future releases?
The user guide is quite informative and easy-to-read, I found two places in jackhmmer
's Options part, which I guess might just be writing errors? But this is not a big problem.
Thanks!
Thanks, yes - those are both typos. First should say --domE
, second should say --incdomE
. I just checked in a fix.
I'll try to add support for compressed format reading in jackhmmer
but honestly, it won't be a high priority. The performance hit for uncompressing the data is large. You want to avoid running searches on compressed files if you can. The capability would really just be there for an occasional convenience, not something you'd want in routine use.
Yes, you are right, that is reasonable. I'll try with uncompressed files or use phmmer
, hmmbuild
and hmmsearch
instead. Thanks very much for the reply!
Hi all,
I got an error when running jackhmmer by using a compressed file as
<seqdb>
, which should be fine according to the user guide. However, it just didn't work and threw an error. I also tried a small uncompressed file with seqs extracted from the original .gz file and that worked. So it seems to be related to the format. Would you please help me with this problem? By the way,gunzip
is installed in my system.Command line:
jackhmmer -o result/test.output -A result/test.alignment --tblout result/test.tblout --domtblout result/test.domtblout --notextw test.fa uniref100.fasta.gz
Error:
test.fa:
head of uniref100.fasta.gz: