Extracting sequences from nhmmer `--tblout`

Euphrasiologist commented 3 months ago

Hi, thanks for this great tool!

I have a question about extracting sequences from the input FASTA file. Here is some example output:

```
SUPER_8              -          TR                   -                1     316 32213024 32213361 32213024 32213362 40353055    +     1.6e-33  123.2   3.1  -
SCAFFOLD_76          -          TR                   -               68     131   151479   151422   151499   151402   254938    -        0.38   16.8   0.0  -
```

For the first line, the extraction is easy because the strand is "+": I just pull positions 32213024 to 32213361 out of the input FASTA. Great. For the second line, the strand is "-". This might be a silly question, but do I need to reverse complement the input FASTA before I search for positions 151422 to 151479?
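The rows above are nhmmer's tabular (`--tblout`) output. Below is a minimal Python sketch for reading one row. It assumes the standard nhmmer column order: target name, accession, query name, accession, hmmfrom, hmm to, alifrom, ali to, envfrom, env to, sq len, strand, E-value, score, bias, and description. `TbloutHit` and `parse_tblout_line` are hypothetical names used only for illustration. The sketch makes one point visible: on the "-" strand, nhmmer already reports `alifrom` greater than `ali to`.

```python
from dataclasses import dataclass


@dataclass
class TbloutHit:
    """One nhmmer --tblout row (hypothetical helper; 1-based, inclusive coords)."""
    target: str
    query: str
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    seq_len: int
    strand: str
    evalue: float
    score: float


def parse_tblout_line(line: str) -> TbloutHit:
    # Columns are whitespace-separated; the last "description" column may
    # contain spaces, so split at most 15 times. Callers should skip
    # comment lines starting with "#" before calling this.
    f = line.split(maxsplit=15)
    return TbloutHit(
        target=f[0], query=f[2],
        hmm_from=int(f[4]), hmm_to=int(f[5]),
        ali_from=int(f[6]), ali_to=int(f[7]),
        env_from=int(f[8]), env_to=int(f[9]),
        seq_len=int(f[10]), strand=f[11],
        evalue=float(f[12]), score=float(f[13]),
    )


rows = [
    "SUPER_8 - TR - 1 316 32213024 32213361 32213024 32213362 40353055 + 1.6e-33 123.2 3.1 -",
    "SCAFFOLD_76 - TR - 68 131 151479 151422 151499 151402 254938 - 0.38 16.8 0.0 -",
]
for row in rows:
    hit = parse_tblout_line(row)
    # On the "-" strand, ali_from > ali_to.
    print(hit.target, hit.strand, hit.ali_from, hit.ali_to)
# prints:
# SUPER_8 + 32213024 32213361
# SCAFFOLD_76 - 151479 151422
```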

Cheers, M

cryptogenomicon commented 3 months ago

You don't need to reverse complement the FASTA file yourself. But how you fetch a "-" strand hit depends on the tool you use to pull subsequences out of the FASTA file. Different tools use different coordinate conventions, especially for the reverse complement strand.
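To illustrate why you don't need to reverse complement the whole file, here is a minimal Python sketch that reverse complements only the fetched piece. It operates on an in-memory string and uses 1-based, inclusive coordinates. When start > end, it takes the forward-strand region end..start and reverse complements it. `revcomp` and `fetch_subseq` are hypothetical helpers, not esl-sfetch's implementation.

```python
# Complement table for IUPAC nucleotide codes, upper and lower case.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fetch_subseq(seq: str, start: int, end: int) -> str:
    """Return the region start..end of seq (1-based, inclusive).

    If start > end, the forward-strand region end..start is taken and
    reverse complemented, so the whole sequence never needs flipping.
    """
    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(seq):
        raise ValueError(f"coordinates {start}..{end} out of range 1..{len(seq)}")
    sub = seq[lo - 1:hi]
    return sub if start <= end else revcomp(sub)


genome = "AACCGGTTAC"
print(fetch_subseq(genome, 2, 5))  # ACCG (forward strand)
print(fetch_subseq(genome, 5, 2))  # CGGT (reverse complement of 2..5)
```

As an example of a different convention, BED-based tools such as `bedtools getfasta` expect 0-based, half-open intervals with start < end and take the strand from the BED record when run with `-s`. In that convention, the second hit would be written as `151421 151479` on the "-" strand.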

If you're using our `esl-sfetch` tool and give the coordinates as 151479..151422 (i.e., with start > end, exactly as nhmmer reports them in the alifrom/ali to columns), it fetches the reverse complemented subsequence from the FASTA file.
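For many hits at once, `esl-sfetch` also has a multiple-subsequence mode (`-C` together with `-f`) that reads a key file. As we understand its documentation, each line of that file has four whitespace-separated fields: `<newname> <from> <to> <source seqname>`. The minimal Python sketch below assumes that format (check `esl-sfetch -h` for your Easel version) and the standard tblout column order, with alifrom/ali to in columns 7 and 8. `sfetch_key_lines` is a hypothetical helper name. Because the coordinates are copied unchanged, "-" strand hits keep start > end and, per the convention above, should come back reverse complemented.

```python
from typing import Iterable, Iterator


def sfetch_key_lines(tblout_lines: Iterable[str]) -> Iterator[str]:
    """Yield '<newname> <from> <to> <source>' lines for esl-sfetch -C -f.

    Hypothetical helper. Coordinates are copied as-is from the alifrom and
    ali to columns, so "-" strand hits keep from > to.
    """
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and tblout comment/header lines
        f = line.split()
        target, ali_from, ali_to = f[0], int(f[6]), int(f[7])
        # Name each subsequence target/from-to so hits stay distinguishable.
        yield f"{target}/{ali_from}-{ali_to} {ali_from} {ali_to} {target}"


rows = [
    "# target name accession query name ...",
    "SUPER_8 - TR - 1 316 32213024 32213361 32213024 32213362 40353055 + 1.6e-33 123.2 3.1 -",
    "SCAFFOLD_76 - TR - 68 131 151479 151422 151499 151402 254938 - 0.38 16.8 0.0 -",
]
for key in sfetch_key_lines(rows):
    print(key)
# prints:
# SUPER_8/32213024-32213361 32213024 32213361 SUPER_8
# SCAFFOLD_76/151479-151422 151479 151422 SCAFFOLD_76
```

Suppose these lines are written to a key file (`hits.keys` and `genome.fa` below are hypothetical names). You would then index the FASTA once with `esl-sfetch --index genome.fa` and run `esl-sfetch -C -f genome.fa hits.keys > hits.fa`. For a single hit, `esl-sfetch -c 151479..151422 genome.fa SCAFFOLD_76` does the same thing on the command line.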

Euphrasiologist commented 3 months ago

Okay, thank you so much, very helpful! I didn't know about the Easel tools (I see you wrote more about this here: http://cryptogenomicon.org/extracting-hmmer-results-to-sequence-files-easel-miniapplications.html).

EddyRivasLab / hmmer, issue #325