EddyRivasLab / easel

Sequence analysis library used by Eddy/Rivas lab code

esl-sfetch error #26

Closed Jigyasa3 closed 5 years ago

Jigyasa3 commented 6 years ago

Hello

I am using esl-sfetch to extract the domain sequences listed in a domtblout file from an HMMER search.

I found this code in a blog post:

hmmsearch --domtblout myhits.dtbl tutorial/fn3.hmm ~/data/uniprot_sprot.fasta
grep -v "^#" myhits.dtbl | awk '{print $1"/"$20"-"$21, $20, $21, $1}' | esl-sfetch -Cf uniprot_sprot.fasta - > myhits.fa

I have a domtblout file, but my database file is in .hmm.txt format and my query file is in .fasta format. To obtain the domtblout file, I used hmmscan instead of hmmsearch.

If I run esl-sfetch --index query.fasta or esl-sfetch -index database.hmm, both give an error.

Is it possible to extract FASTA sequences from hmmscan input/output file formats?

cryptogenomicon commented 6 years ago

esl-sfetch should work fine on a FASTA sequence file. What error do you see with esl-sfetch --index query.fasta?
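
For the hmmscan case in the question, the awk step from the blog post probably needs one change. In hmmsearch output, column 1 of the domtblout file (target name) is the sequence name. In hmmscan output, column 1 is the profile, and the sequence name is in column 4 (query name). The envelope coordinates in columns 20 and 21 refer to the sequence in both cases. A minimal sketch, assuming the standard domtblout column layout; the function name domtbl_to_coords is made up for illustration:

```shell
# Turn domtblout lines into "newname from to source" records for esl-sfetch -Cf.
# $1 = domtblout file
# $2 = column holding the sequence name
#      (1 = target name, as in hmmsearch output; 4 = query name, as in hmmscan output)
# domtbl_to_coords is a hypothetical helper name, not part of HMMER or Easel.
domtbl_to_coords() {
    grep -v '^#' "$1" |
        awk -v col="$2" '{ print $col "/" $20 "-" $21, $20, $21, $col }'
}
```

With hmmscan output this would be used as domtbl_to_coords myhits.dtbl 4 | esl-sfetch -Cf query.fasta - > myhits.fa, typically after query.fasta has been indexed with esl-sfetch --index.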

WillCao68 commented 5 years ago

Hello @cryptogenomicon, I ran into a similar issue. When I run something like esl-sfetch --index query.fa, it returns:

fails to write keys to ssi file query.fa.ssi
primary keys not unique: "blabla" occurs more than once

Then I checked the folder: there are files named "query.fa.ssi.1" and "query.fa.ssi.2", but when I try to sfetch something, it tells me: failed to open SSI index.

There is not much documentation on what to do now. My query.fa is my own file and it's really big. All I want to do is get full-length sequences after a phmmer search on the command line.

cryptogenomicon commented 5 years ago

It's telling you that creating the index file failed, because the names in your sequence file aren't unique. You have two or more sequences named blabla. To fetch sequences by name reliably, each sequence must have a unique name; otherwise, when you try to fetch blabla, you might not get the one you expected.
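
To see which names are repeated before trying to index again, something like the following could help. It is a rough sketch that assumes the sequence name is the first whitespace-delimited word directly after ">" on each header line; the function name fasta_duplicate_names is made up for illustration.

```shell
# Print every sequence name that appears more than once in a FASTA file.
# Assumes the name is the first word immediately following ">".
# fasta_duplicate_names is a hypothetical helper name, not part of Easel.
fasta_duplicate_names() {
    grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d
}
```

Running fasta_duplicate_names query.fa prints one repeated name per line and prints nothing if all names are unique.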

Subsequent esl-sfetch calls are then failing because you never successfully created the .ssi index file; it gave you an error message when you used --index. .ssi.1 and .ssi.2 are not the index file, they are temporary files left over from the failed --index attempt. (When the sequence file is large, esl-sfetch --index uses tmpfiles for on-disk sorting, rather than trying to sort the index in memory.)
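
One possible way forward is to write a copy of the file in which repeated names are made unique, then index that copy. The sketch below leaves the first occurrence of each name alone and appends a numeric suffix to later ones. The "_N" suffix scheme and the function name fasta_uniquify_names are arbitrary choices for illustration, and it is worth checking the result for duplicates again in case a suffixed name collides with an existing one.

```shell
# Write a copy of a FASTA file in which repeated names get a numeric suffix.
# The first "blabla" is unchanged; later ones become "blabla_2", "blabla_3", ...
# Assumes the name is the first word immediately following ">".
# fasta_uniquify_names is a hypothetical helper name, not part of Easel.
fasta_uniquify_names() {
    awk '
        /^>/ {
            name = substr($1, 2)          # name without the leading ">"
            n = ++seen[name]              # how many times this name has appeared
            if (n > 1)                    # rebuild the header with a suffixed name
                $0 = ">" name "_" n substr($0, length($1) + 1)
        }
        { print }
    ' "$1"
}
```

For example, fasta_uniquify_names query.fa > query.uniq.fa, then delete the leftover query.fa.ssi.1 and query.fa.ssi.2 files and run esl-sfetch --index query.uniq.fa. Whether renaming is appropriate depends on why the names are duplicated; if the repeated entries are identical sequences, removing the extra copies may be the better fix.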