EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

Error: ���� when running in docker #239

Open multimeric opened 3 years ago

multimeric commented 3 years ago

I am running hmmbuild on a Stockholm alignment. My input alignment file has an error, and when I run hmmbuild test.hmm alignment.so, I get:

Error: Oops. Wait. I need name annotation on each alignment in a multi MSA file; failed on #3

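This is a helpful error message. However, when I run the exact same command in docker, as part of a nextflow pipeline, I get the error in the title, which is not helpful. I can replicate this afterwards by:

  • Having the problematic Stockholm alignment in the working directory
  • Starting a docker container using docker run -i -v "$PWD":"$PWD" -w "$PWD" --entrypoint /bin/bash -t quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1
  • Running hmmbuild c.hmm combined.so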

This results in the following output

# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             combined.so
# output HMM file:                  c.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------

Error: ����

Initially I thought this might be related to environment variables being set differently in Docker, but I tried copying the exact same variables from the container using env, and I still wasn't able to replicate this error on my local machine.

npcarter commented 3 years ago

Hello. Sean has asked me to take a look at this issue. Would it be possible for you to send me the combined.so file that is causing the problem? It'll be a lot easier to debug with that.

-Nick

multimeric commented 3 years ago

It happens with any file that has multiple MSAs without names. For example:

# STOCKHOLM 1.0

sample_1    ACDFG
sample_2    ACDFG
//

# STOCKHOLM 1.0

sample_3    ACDFG
sample_4    ACDFG
//

Notably, it seems you do actually have to be running bash inside the container for this to happen (as explained above). If you instead invoke hmmbuild directly as the entrypoint, with docker run -i -v "$PWD":"$PWD" -w "$PWD" --entrypoint hmmbuild -t quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1 f.hmm test.so, then the correct error message is displayed.

npcarter commented 3 years ago

Thanks for that information. I have replicated the problem in a docker image of my own, and will start looking into the issue.

-Nick

npcarter commented 3 years ago

Hello,

I've tracked down the cause of this, and have a workaround for you. The problem occurs because the programs HMMER provides are usually run from the command line, in which case we want to print error messages to stderr, but are sometimes run as a daemon, in which case we want to log any errors to an error file. In our error code, we call getppid() to detect this, counting on the fact that getppid() will return 1 (the PID of init) if a HMMER program is run as a daemon and something else (typically the PID of the shell) if it is run from the command line normally.

Under Docker and a bash shell, however, things don't work that way by default: bash is the first process in the container and so has PID 1, which means getppid() returns 1 in any program it launches. That causes our code to try to log the error to an error file instead of displaying it.

The workaround I've found is to run docker with the --pid=host flag. This causes processes run under Docker to report the same PIDs as if they were run directly on the host, which makes our error-reporting code work correctly.
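
For readers who want to see the mechanism, here is a minimal C sketch of a parent-PID heuristic. The function names are hypothetical and this is not HMMER's actual error code. It relies only on general Unix behavior: a program started from a shell normally has that shell as its parent, an orphaned background process is re-parented to PID 1, and a shell started as a container's entrypoint is itself PID 1 inside that container.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not HMMER's actual code): true when the parent is PID 1. */
bool parent_is_pid1(pid_t parent_pid)
{
    return parent_pid == 1;
}

/* Hypothetical policy: treat "parent is PID 1" as "running as a daemon". */
const char *error_destination(pid_t parent_pid)
{
    return parent_is_pid1(parent_pid) ? "log file" : "stderr";
}

/* Print the real parent PID of this process and what the heuristic decides. */
void describe_current_process(void)
{
    pid_t ppid = getppid();
    printf("getppid() = %ld, errors would go to: %s\n",
           (long)ppid, error_destination(ppid));
}
```

Calling describe_current_process() from a program launched in a shell on the host, and again from one launched by bash inside a container started without --pid=host, should show the two cases diverging.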

I hope this helps. Looking forward, we'll probably try to figure out a better fix for this problem, as there's a lot of interest in running HMMER from within Docker, but it looks like that would require a new way of telling whether we're running as a daemon or not.

-Nick

multimeric commented 3 years ago

Great! Thanks for the workaround, I'll add that flag to my docker containers in the short term to help the error messages! As you say, this only happens when running inside bash, in a container. This seems like an unusual use-case, but is actually how many pipeline managers (such as nextflow) run tools, so it's a relevant use case to consider.

Yes it seems that it's tricky to work out if the process is a daemon using the OS utilities, but HMMer must internally know if it's running as a daemon in order to decide when to terminate?

npcarter commented 3 years ago

> Yes it seems that it's tricky to work out if the process is a daemon using the OS utilities, but HMMer must internally know if it's running as a daemon in order to decide when to terminate?

The challenge comes from the fact that the vast majority of HMMER's code is shared between the applications that run in daemon and command-line mode, so while the application that runs as a daemon (the server that's hosted by EBI) knows it's running as a daemon, it's much harder for the low-level routines to tell which program is calling them. So, we either need to figure out a better way to decide at the time an error occurs how it should be reported, or create a mechanism to propagate the information about whether a program was run from the command-line or as a daemon to all of the places where an error might occur. That has the potential to require a lot of code changes and testing.
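
One way to picture the second option (propagating the run mode to every place an error might occur) is a process-wide setting that each program's main() sets once at startup, so shared low-level routines only call a single reporting function. The C sketch below is an illustration under that assumption only. The names report_mode_t, set_report_mode, and report_error are made up for this example and are not HMMER or Easel APIs.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical names throughout; none of these come from HMMER or Easel. */
typedef enum { REPORT_TO_STDERR, REPORT_TO_LOG } report_mode_t;

static report_mode_t g_report_mode = REPORT_TO_STDERR; /* default for command-line tools */
static FILE *g_log_stream = NULL;                      /* only the daemon would set this */

/* Intended to be called once, early in main(), by the daemon program. */
void set_report_mode(report_mode_t mode, FILE *log_stream)
{
    g_report_mode = mode;
    g_log_stream  = log_stream;
}

/* Shared routines call this without knowing which program they are part of.
 * Returns the number of characters written for the formatted message. */
int report_error(const char *format, ...)
{
    /* Fall back to stderr if log mode was requested without a usable stream. */
    FILE *out = (g_report_mode == REPORT_TO_LOG && g_log_stream != NULL)
                    ? g_log_stream : stderr;
    va_list ap;
    int n;

    fputs("Error: ", out);
    va_start(ap, format);
    n = vfprintf(out, format, ap);
    va_end(ap);
    fputc('\n', out);
    return n;
}
```

Because the default is stderr, a command-line program that never calls set_report_mode() would always print its errors, whatever its parent PID happens to be.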

-Nick

multimeric commented 3 years ago

Ah, I see. Could you perhaps use an environment-variable-based switch: set HMMER_DAEMON_MODE=1 when you fork from the daemon, and if this variable is unset, always print to stderr?
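
A minimal C sketch of that suggestion, assuming the variable is named HMMER_DAEMON_MODE as proposed in this comment and that only the exact value 1 enables daemon-style logging. Nothing in this thread indicates that HMMER reads such a variable, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule: only the exact string "1" turns daemon mode on. */
bool is_daemon_flag_set(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* HMMER_DAEMON_MODE is the name proposed in this thread, not an existing
 * HMMER setting. Unset, empty, or any other value means "print to stderr". */
bool daemon_mode_requested(void)
{
    return is_daemon_flag_set(getenv("HMMER_DAEMON_MODE"));
}
```

With this shape, the daemon would export the variable before forking workers, and every other invocation, including one launched by bash inside a container, would fall through to stderr.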

cryptogenomicon commented 1 year ago

Nick, could you re-test this issue with our current develop branch? I think I may have just fixed a bug that could have plausibly been causing this issue.

npcarter commented 1 year ago

Just checked this on my Mac, and behavior appears unchanged. I get the mangled error output when I just run the docker image, and the correct output when I run docker with --pid=host.