colinmollenhour / mariadb-galera-swarm

MariaDB Galera Cluster container based on the official mariadb image, which can auto-bootstrap and recover cluster state.
https://hub.docker.com/r/colinmollenhour/mariadb-galera-swarm
Apache License 2.0

After building a new image nodes keep crashing #106

Closed emerichunter closed 2 years ago

emerichunter commented 3 years ago

Hi @colinmollenhour,

I have already used your repo in production and for a client before. Today I built an image using your Makefile, because I wanted to update the content to a more recent version.

I had to solve some errors, but the image builds OK. Here is the Dockerfile I modified (added mariadb-backup). This file works fine (the containers in the swarm are healthy):

FROM mariadb:10.2

# Download blocked from http://www.quicklz.com/qpress-11-linux-x64.tar
COPY bin/qpress-11-linux-x64.tar /tmp/qpress.tar

RUN set -x \
    && apt-get update \
    && apt-get install -y --no-install-recommends --no-install-suggests \
      curl \
      netcat \
      pigz \
      percona-toolkit \
      percona-xtrabackup \
      pv \
      lsb-release \
    && curl https://repo.percona.com/apt/percona-release_latest.generic_all.deb --output percona-release_latest.generic_all.deb  \
    && dpkg -i percona-release_latest.generic_all.deb \
    && curl https://downloads.percona.com/downloads/pmm/1.17.4/binary/debian/bionic/x86_64/pmm-client_1.17.4-1.bionic_amd64.deb  --output pmm-client_1.17.4-1.bionic_amd64.deb \
    && dpkg -i pmm-client_1.17.4-1.bionic_amd64.deb \
    && tar -C /usr/local/bin -xf /tmp/qpress.tar qpress \
    && chmod +x /usr/local/bin/qpress \
    && apt-get clean all && rm -rf /tmp/* /var/lib/apt/lists/*

COPY conf.d/*                /etc/mysql/conf.d/
COPY mycnf/my.cnf            /etc/mysql/my.cnf
COPY *.sh                    /usr/local/bin/
COPY bin/galera-healthcheck  /usr/local/bin/galera-healthcheck
COPY primary-component.sql   /

RUN set -ex ;\
    # Fix permissions
    chown -R mysql:mysql /etc/mysql ;\
    chmod -R go-w /etc/mysql ;\
    # Disable code that deletes progress file after SST
    sed -i 's#-p \$progress#-p \$progress-XXX#' /usr/bin/wsrep_sst_mariabackup ;\
    sed -i 's#-p \$progress#-p \$progress-XXX#' /usr/bin/wsrep_sst_xtrabackup ;

EXPOSE 3306 4444 4567 4567/udp 4568 8080 8081

HEALTHCHECK CMD /usr/local/bin/healthcheck.sh

ENV SST_METHOD=mariabackup

ENTRYPOINT ["start.sh"]

The 10.3 image (which uses 20.04 focal instead of 18.04 bionic) also looks good when building with this file:

FROM mariadb:10.3

# Download blocked from http://www.quicklz.com/qpress-11-linux-x64.tar
COPY bin/qpress-11-linux-x64.tar /tmp/qpress.tar

RUN set -x \
    && apt-get update \
    && apt-get install -y --no-install-recommends --no-install-suggests \
      curl \
      netcat \
      pigz \
      percona-toolkit \
      mariadb-backup \
      pv \
      lsb-release \
      ca-certificates \
    && curl https://repo.percona.com/apt/percona-release_latest.generic_all.deb --output percona-release_latest.generic_all.deb  \
    && dpkg -i percona-release_latest.generic_all.deb \
    && curl https://downloads.percona.com/downloads/pmm/1.17.4/binary/debian/bionic/x86_64/pmm-client_1.17.4-1.bionic_amd64.deb --output pmm-client_1.17.4-1.bionic_amd64.deb  \
    && dpkg -i pmm-client_1.17.4-1.bionic_amd64.deb \
    && tar -C /usr/local/bin -xf /tmp/qpress.tar qpress \
    && chmod +x /usr/local/bin/qpress \
    && apt-get clean all && rm -rf /tmp/* /var/lib/apt/lists/*

COPY conf.d/*                /etc/mysql/conf.d/
COPY mycnf/my.cnf            /etc/mysql/my.cnf
COPY *.sh                    /usr/local/bin/
COPY bin/galera-healthcheck  /usr/local/bin/galera-healthcheck
COPY primary-component.sql   /

RUN set -ex ;\
    # Fix permissions
    chown -R mysql:mysql /etc/mysql ;\
    chmod -R go-w /etc/mysql ;\
    # Disable code that deletes progress file after SST
    sed -i 's#-p \$progress#-p \$progress-XXX#' /usr/bin/wsrep_sst_mariabackup ;

EXPOSE 3306 4444 4567 4567/udp 4568 8080 8081

HEALTHCHECK CMD /usr/local/bin/healthcheck.sh

ENV SST_METHOD=mariabackup

ENTRYPOINT ["start.sh"]

However, when I deploy the stack, turn off the seed, and scale to 3 nodes, the nodes keep crashing and being respawned:

time="2021-07-12T10:26:55.771369100+02:00" level=info msg="NetworkDB stats DESKTOP-9RB1276(cab8f4fe809e) - netID:ase91jt88x6l7xrcwltm2lnte leaving:false netPeers:1 entries:2 Queue qLen:0 netMsg/s:0"
time="2021-07-12T10:26:55.771645500+02:00" level=info msg="NetworkDB stats DESKTOP-9RB1276(cab8f4fe809e) - netID:jyro214m0o82tjg0zmp7ly46v leaving:false netPeers:1 entries:23 Queue qLen:0 netMsg/s:0"
time="2021-07-12T10:27:43.830498300+02:00" level=info msg="ignoring event" container=890fae1c73f80299b8c8de1a7d5a435ba385b98e6d01d942292013cefafa1818 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2021-07-12T10:27:43.832897700+02:00" level=info msg="shim disconnected" id=890fae1c73f80299b8c8de1a7d5a435ba385b98e6d01d942292013cefafa1818
time="2021-07-12T10:27:43.833261100+02:00" level=error msg="copy shim log" error="read /proc/self/fd/16: file already closed"
time="2021-07-12T10:27:43.838691600+02:00" level=warning msg="rmServiceBinding 48749d62372c8340253d22209c89863beb1eeaefcdbd4b42fd34b9c4d04a8f97 possible transient state ok:false entries:0 set:false "
time="2021-07-12T10:27:50.002461500+02:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/f0c91cfde6c49952d75c53b2bf1cac60b35d8e05ec3c21d8890e70dea80228e7 pid=19134
time="2021-07-12T10:28:03.470156800+02:00" level=info msg="ignoring event" container=aef28517c12dcd4d9a753f27b1a3039195c703816306c54b03e3365957d43efa module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2021-07-12T10:28:03.471073400+02:00" level=info msg="shim disconnected" id=aef28517c12dcd4d9a753f27b1a3039195c703816306c54b03e3365957d43efa
time="2021-07-12T10:28:03.471412400+02:00" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
time="2021-07-12T10:28:03.486098500+02:00" level=warning msg="rmServiceBinding c976f03a1e46a22396eeb60845adc9a161c8046eee85155562d7bd9c013603a1 possible transient state ok:false entries:0 set:false "
time="2021-07-12T10:28:10.035381300+02:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/69adfa9a6ceaf532a8f747bf0f8b7c0aadd6869b06e717c528878828029c7333 pid=19488
time="2021-07-12T10:28:14.928939700+02:00" level=info msg="ignoring event" container=24cc8da34090598a7afbca1838d2d679de6028e640f75d491f870f84d5eb11ce module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2021-07-12T10:28:14.941253600+02:00" level=info msg="shim disconnected" id=24cc8da34090598a7afbca1838d2d679de6028e640f75d491f870f84d5eb11ce
time="2021-07-12T10:28:14.941468900+02:00" level=error msg="copy shim log" error="read /proc/self/fd/19: file already closed"
time="2021-07-12T10:28:14.947387600+02:00" level=warning msg="rmServiceBinding 2726957a70a89cd28dd7c85cefa4585fccc9f067c85a778a99ec4a441328a5b9 possible transient state ok:false entries:0 set:false "
time="2021-07-12T10:28:21.119924800+02:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/e7ec826bbbbc31ed1628332a2c2356213feca930f04f9f69c94b954779e53344 pid=19751

I collected the log file, but I don't know why the nodes are crashing. Do you have any idea?
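The daemon log above records containers going away ("shim disconnected") but not the reason each one exited. As a starting point for digging further, here is a minimal sketch that pulls the affected container IDs out of such a log. It assumes the daemon output was saved to a file; `dockerd.log` and the function name `crashed_containers` are hypothetical, not part of this repo.

```shell
#!/usr/bin/env bash
# Print the unique container IDs that dockerd reported as "shim disconnected".
# Reads daemon log lines on stdin; lines without that message are ignored.
crashed_containers() {
  grep 'msg="shim disconnected"' \
    | sed -E 's/.* id=([0-9a-f]+).*/\1/' \
    | sort -u
}

# Example (dockerd.log is a hypothetical file holding the daemon output):
#   crashed_containers < dockerd.log
```

Each ID can then be passed to `docker logs <id>` while the stopped container still exists, and `docker service ps --no-trunc <service>` shows the error recorded for each failed task (`<service>` being whatever the stack named the node service). The container's own output is more likely than the daemon log to show why mysqld stopped.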

The workaround is to pin the base image to a bionic tag (10.2-bionic, 10.3-bionic...), but I'm afraid this won't last long.
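In practice the workaround is a one-line change to the `FROM` line. A small sketch, assuming the Dockerfile starts from a plain `mariadb:<major>.<minor>` tag as in the files above; `pin_bionic` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the given Dockerfile with its base image pinned to the bionic variant,
# e.g. "FROM mariadb:10.3" becomes "FROM mariadb:10.3-bionic".
# Lines that do not match the pattern are printed unchanged.
pin_bionic() {
  sed -E 's#^FROM mariadb:([0-9]+\.[0-9]+)[[:space:]]*$#FROM mariadb:\1-bionic#' "$1"
}

# Example: pin_bionic Dockerfile > Dockerfile.bionic
```

Editing the `FROM` line by hand has the same effect; the helper only makes the change repeatable across several version directories.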

colinmollenhour commented 2 years ago

Hey, sorry for the slow reply, I didn't see this one. I couldn't reproduce the issue.

emerichunter commented 2 years ago

That's OK, thanks for the reply anyway. Emeric

On Fri, Jul 1, 2022 at 1:56, Colin @.***> wrote:

Closed #106 as completed.