@cllopes sorry that you ran into these issues with the new image. We always try to test as much as possible, but we cannot cover every possible setup, especially in Docker environments.
The only thing we added to this image is tini, as you can see.
I just found a note in the readme of tini - but that would indicate that your Graylog is dying anyway.
Could you attach a complete log to this ticket, from start until the container dies, so that we can look into it?
Thank you.
I think I am running into the same problem, and it is caused by the health_check script.
current container:
graylog@logging-graylog:~$ bash -x /health_check.sh
+ source /etc/profile
+++ id -u
++ '[' 1100 -eq 0 ']'
++ PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
++ export PATH
++ '[' '' ']'
++ '[' -d /etc/profile.d ']'
++ for i in /etc/profile.d/*.sh
++ '[' -r /etc/profile.d/graylog.sh ']'
++ . /etc/profile.d/graylog.sh
+++ export JAVA_HOME=/usr/local/openjdk-8
+++ JAVA_HOME=/usr/local/openjdk-8
+++ export BUILD_DATE=
+++ BUILD_DATE=
+++ export GRAYLOG_VERSION=3.1.2
+++ GRAYLOG_VERSION=3.1.2
+++ export 'GRAYLOG_SERVER_JAVA_OPTS=-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:NewRatio=1 -XX:MaxMetaspaceSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow'
+++ GRAYLOG_SERVER_JAVA_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:NewRatio=1 -XX:MaxMetaspaceSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow'
+++ export GRAYLOG_HOME=/usr/share/graylog
+++ GRAYLOG_HOME=/usr/share/graylog
+++ export GRAYLOG_USER=graylog
+++ GRAYLOG_USER=graylog
+++ export GRAYLOG_GROUP=graylog
+++ GRAYLOG_GROUP=graylog
+++ export GRAYLOG_UID=1100
+++ GRAYLOG_UID=1100
+++ export GRAYLOG_GID=1100
+++ GRAYLOG_GID=1100
+++ export PATH=/usr/share/graylog/bin:/usr/local/openjdk-8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+++ PATH=/usr/share/graylog/bin:/usr/local/openjdk-8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ unset i
+ proto=http
+ http_bind_address=127.0.0.1:9000
+ [[ -f /usr/share/graylog/data/config/graylog.conf ]]
++ grep '^http_publish_uri' /usr/share/graylog/data/config/graylog.conf
++ awk -F = '{print $2}'
++ awk '{$1=$1};1'
+ http_publish_uri=
++ grep '^http_bind_address' /usr/share/graylog/data/config/graylog.conf
++ awk -F = '{print $2}'
++ awk '{$1=$1};1'
+ http_bind_address=0.0.0.0:9000
++ grep '^http_enable_tls' /usr/share/graylog/data/config/graylog.conf
++ awk -F = '{print $2}'
++ awk '{$1=$1};1'
+ http_enable_tls=
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z 0.0.0.0:9000 ]]
+ check_url=http://0.0.0.0:9000
+ [[ ! -z '' ]]
+ echo 'not possible to get Graylog listen URI - abort'
not possible to get Graylog listen URI - abort
+ exit 1
working container:
graylog@logging-graylog:~$ bash -x /health_check.sh
+ source /etc/profile
+++ id -u
++ '[' 1100 -eq 0 ']'
++ PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
++ export PATH
++ '[' '' ']'
++ '[' -d /etc/profile.d ']'
++ for i in /etc/profile.d/*.sh
++ '[' -r /etc/profile.d/graylog.sh ']'
++ . /etc/profile.d/graylog.sh
+++ export JAVA_HOME=/usr/local/openjdk-8
+++ JAVA_HOME=/usr/local/openjdk-8
+++ export BUILD_DATE=
+++ BUILD_DATE=
+++ export GRAYLOG_VERSION=3.1.2
+++ GRAYLOG_VERSION=3.1.2
+++ export 'GRAYLOG_SERVER_JAVA_OPTS=-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:NewRatio=1 -XX:MaxMetaspaceSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow'
+++ GRAYLOG_SERVER_JAVA_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:NewRatio=1 -XX:MaxMetaspaceSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow'
+++ export GRAYLOG_HOME=/usr/share/graylog
+++ GRAYLOG_HOME=/usr/share/graylog
+++ export GRAYLOG_USER=graylog
+++ GRAYLOG_USER=graylog
+++ export GRAYLOG_GROUP=graylog
+++ GRAYLOG_GROUP=graylog
+++ export GRAYLOG_UID=1100
+++ GRAYLOG_UID=1100
+++ export GRAYLOG_GID=1100
+++ GRAYLOG_GID=1100
+++ export PATH=/usr/share/graylog/bin:/usr/local/openjdk-8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+++ PATH=/usr/share/graylog/bin:/usr/local/openjdk-8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ unset i
+ proto=http
+ http_bind_address=127.0.0.1:9000
+ [[ -f /usr/share/graylog/data/config/graylog.conf ]]
++ awk -F = '{print $2}'
++ awk '{$1=$1};1'
++ grep '^http_publish_uri' /usr/share/graylog/data/config/graylog.conf
+ http_publish_uri=
++ grep '^http_bind_address' /usr/share/graylog/data/config/graylog.conf
++ awk '{$1=$1};1'
++ awk -F = '{print $2}'
+ http_bind_address=0.0.0.0:9000
++ grep '^http_enable_tls' /usr/share/graylog/data/config/graylog.conf
++ awk '{$1=$1};1'
++ awk -F = '{print $2}'
+ http_enable_tls=
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ [[ ! -z 0.0.0.0:9000 ]]
+ check_url=http://0.0.0.0:9000
+ curl --silent --fail http://0.0.0.0:9000/api
+ exit 0
Thanks @jalogisch for your quick response and help on this issue!
Attached should be the complete logs for a task that starts up and then exits with 143.
I also ran @knopwob's health command on the container between when the container logs say graylog is running and when the tasks fail and see the same error:
+ echo 'not possible to get Graylog listen URI - abort'
not possible to get Graylog listen URI - abort
+ exit 1
OK, I identified the problem to be in this part
We will need to change the logic. Sorry that we did not test that enough to catch the problem.
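Judging from the two traces above, the failing run sets check_url from http_bind_address and then still aborts because http_publish_uri is empty. A minimal sketch of one way the logic could be changed is below. It is not the shipped health_check.sh: the function name resolve_check_url and its argument order are hypothetical, and it assumes http_publish_uri is given without a scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the shipped health_check.sh.
# Arguments: proto, http_bind_address, http_publish_uri (each may be empty).
resolve_check_url() {
  local proto="$1" bind_address="$2" publish_uri="$3"
  local check_url=""

  # Use the bind address as the fallback, as the traced script does.
  if [[ -n "${bind_address}" ]]; then
    check_url="${proto}://${bind_address}"
  fi

  # A publish URI overrides the fallback, but its absence is not an error.
  # Assumption: publish_uri carries no scheme here.
  if [[ -n "${publish_uri}" ]]; then
    check_url="${proto}://${publish_uri}"
  fi

  # Abort only when no setting produced a URL at all.
  if [[ -z "${check_url}" ]]; then
    echo "not possible to get Graylog listen URI - abort" >&2
    return 1
  fi

  printf '%s\n' "${check_url}"
}

# Values from the failing trace: bind address set, publish URI empty.
resolve_check_url "http" "0.0.0.0:9000" ""
```

With the values from the failing trace this prints http://0.0.0.0:9000 instead of aborting, which matches what the working container ends up passing to curl.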
I have experienced the following bug:
From config/graylog.conf:
#### HTTP publish URI
# Default: http://$http_bind_address/
#http_publish_uri = http://192.168.1.1:9000/
From health_check.sh:
if [[ ! -z "${http_publish_uri}" ]]
then
    check_url="${proto}"://"${http_publish_uri}"
else
    echo "not possible to get Graylog listen URI - abort"
    exit 1
fi
Result:
echo $check_url
http://http://192.168.1.1:9000/
curl "${check_url}"/api
curl: (6) Could not resolve host: http
I used the image graylog/graylog:3.1, which suddenly stopped working for no apparent reason. The health check started reporting problems. In my configuration I had only http_bind_address and http_external_uri set. I began analyzing the health_check.sh file and found the above problem. I had to go back to version graylog/graylog:3.1.2-1 for the application to work again.
@gander did you use the image that was created with the tag https://github.com/Graylog2/graylog-docker/releases/tag/3.1.2-3 or the image https://github.com/Graylog2/graylog-docker/releases/tag/3.1.2-2 ?
The first one should have the fix.
I see the same issue as gander above (https://github.com/Graylog2/graylog-docker/issues/98#issuecomment-545310814) in both https://github.com/Graylog2/graylog-docker/releases/tag/3.1.2-2 and https://github.com/Graylog2/graylog-docker/releases/tag/3.1.2-3 :
root@graylog-master:/usr/share/graylog# grep "^http_publish_uri" "${GRAYLOG_HOME}"/data/config/graylog.conf
# Default: $http_publish_uri
http_publish_uri = http://graylog-master:9000
Instrumented run of /health_check.sh:
root@graylog-master:/usr/share/graylog# /health_check.sh
+ PS4='+(${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
+(/health_check.sh:14): proto=http
+(/health_check.sh:15): http_bind_address=127.0.0.1:9000
+(/health_check.sh:18): [[ -f /usr/share/graylog/data/config/graylog.conf ]]
++(/health_check.sh:21): grep '^http_publish_uri' /usr/share/graylog/data/config/graylog.conf
++(/health_check.sh:21): awk -F = '{print $2}'
++(/health_check.sh:21): awk '{$1=$1};1'
+(/health_check.sh:21): http_publish_uri=http://graylog-master:9000
++(/health_check.sh:22): grep '^http_bind_address' /usr/share/graylog/data/config/graylog.conf
++(/health_check.sh:22): awk -F = '{print $2}'
++(/health_check.sh:22): awk '{$1=$1};1'
+(/health_check.sh:22): http_bind_address=0.0.0.0:9000
++(/health_check.sh:23): grep '^http_enable_tls' /usr/share/graylog/data/config/graylog.conf
++(/health_check.sh:23): awk -F = '{print $2}'
++(/health_check.sh:23): awk '{$1=$1};1'
+(/health_check.sh:23): http_enable_tls=
+(/health_check.sh:29): [[ ! -z '' ]]
+(/health_check.sh:40): [[ ! -z '' ]]
+(/health_check.sh:44): [[ ! -z '' ]]
+(/health_check.sh:50): [[ ! -z '' ]]
+(/health_check.sh:55): [[ ! -z 0.0.0.0:9000 ]]
+(/health_check.sh:57): check_url=http://0.0.0.0:9000
+(/health_check.sh:65): [[ ! -z http://graylog-master:9000 ]]
+(/health_check.sh:67): check_url=http://http://graylog-master:9000
+(/health_check.sh:70): [[ -z http://http://graylog-master:9000 ]]
+(/health_check.sh:77): curl --silent --fail http://http://graylog-master:9000/api
+(/health_check.sh:81): exit 1
From a quick look, it seems https://github.com/Graylog2/graylog-docker/blob/3.1/health_check.sh#L29-L39 tries to remove the protocol part if GRAYLOG_HTTP_PUBLISH_URI is given (which it is not in my case). Later, http_publish_uri is assumed not to contain the protocol part. But if the value came from the config file, the protocol will still be there and the resulting check_url will have the double protocol problem.
Maybe you should reopen the issue, @jalogisch?
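To illustrate the point, here is a minimal sketch of normalizing http_publish_uri no matter whether it came from the environment or from graylog.conf. The helper name strip_scheme is hypothetical and this is not the script's actual code. It also assumes proto is still derived elsewhere (for example from http_enable_tls), so a scheme in the publish URI is simply discarded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop a leading "http://" or "https://" and one trailing
# slash, so the value can be safely combined with "${proto}://" and "/api".
strip_scheme() {
  local uri="$1"
  uri="${uri#http://}"
  uri="${uri#https://}"
  uri="${uri%/}"
  printf '%s\n' "${uri}"
}

proto="http"
# Value as read from graylog.conf in the instrumented run above.
http_publish_uri="http://graylog-master:9000"

check_url="${proto}://$(strip_scheme "${http_publish_uri}")"
echo "${check_url}"
```

This prints http://graylog-master:9000 rather than http://http://graylog-master:9000. A value without a scheme, such as 0.0.0.0:9000, passes through unchanged.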
Thank you for the honest feedback, @padelt. Could you please open a new issue for the bug you found in the health-check script?
That bug has existed since the script was rewritten and was not introduced by the latest changes. It is not wrong to report it here, but it is not connected to this issue.
Prior to the latest update to the graylog/graylog:3.1 image (sha256-1e38a891067041461201e910cf2d2e85a89416fdeb938475bc5d6fc12f1385db) we were able to deploy Graylog along with MongoDB (mongo:3) and Elasticsearch (docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.2) as part of a Docker swarm stack without issue. After the image push on 10/21/2019 we are encountering an issue where the service never converges.
The graylog task appears to start correctly:
But after a couple of minutes it fails with a 143; a new task starts but eventually fails again. The service continues in this loop.
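For context on the exit code: 143 is 128 + 15, which is how a shell reports a process that was terminated by SIGTERM. That would fit a task being stopped from the outside (for example by the orchestrator after failing health checks, which is an assumption here) rather than Graylog crashing on its own. The sketch below only demonstrates the exit-code convention with a throwaway sleep process; it does not involve Docker or Graylog.

```shell
#!/usr/bin/env bash
# Start a long-running child, terminate it with SIGTERM, and read its status.
sleep 30 &
child_pid=$!

kill -TERM "${child_pid}"

# wait returns 128 + signal number for a child killed by a signal.
status=0
wait "${child_pid}" || status=$?

echo "exit status: ${status}"
echo "signal name: $(kill -l "${status}")"
```

The reported status is 143, and kill -l maps that status back to the signal name.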
We were able to reproduce this issue as well as a working scenario with the previous image digest using the following docker-compose files.
Both were started using:
docker stack deploy -c docker-compose.yml graylog_test
Not Working (using current image)
Working (using previous digest)
docker-compose is able to successfully start the graylog server with both files so the issue seems specific to docker swarm stacks.
Any ideas?
Happy to provide more debug details.