canonical / mysql-operator

Machine charm for MySQL following the operator framework
https://charmhub.io/mysql
Apache License 2.0
7 stars 10 forks source link

`charmed-mysql.mysqld` fails to restart after machine is rebooted #380

Closed NucciTheBoss closed 7 months ago

NucciTheBoss commented 8 months ago

Steps to reproduce

  1. juju deploy --channel=8.0/edge mysql
  2. juju ssh mysql/0
  3. sudo systemctl reboot
  4. Wait for Juju controller to reestablish contact with Juju agent on machine...
  5. See error message "Unable to query cluster primary"

Expected behavior

I expect the mysql unit to successfully come back to active status after the underlying machine is rebooted by the human operator using systemctl reboot or juju-reboot.

Actual behavior

image

Versions

Operating system: Ubuntu 23.10

Juju CLI: 3.1.7-genericlinux-amd64

Juju agent: 3.1.6

Charm revision: 206

LXD: 5.19

Log output

Juju debug log:

unit-mysql-0: 13:44:17 INFO unit.mysql/0.juju-log Running legacy hooks/start.
unit-mysql-0: 13:44:18 WARNING unit.mysql/0.juju-log Failed to get node count
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmp9vclgr_r']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1508, in get_cluster_node_count
    output = self._run_mysqlsh_script("\n".join(size_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:18 INFO unit.mysql/0.juju-log Unit workload member-state is offline with member-role unknown
unit-mysql-0: 13:44:19 ERROR unit.mysql/0.juju-log Failed to reboot cluster
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmpa9pesy5y']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 2014, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:19 ERROR unit.mysql/0.juju-log Failed to reboot cluster from complete outage.
unit-mysql-0: 13:44:19 WARNING unit.mysql/0.juju-log Failed to get cluster primary addresses
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmp8ynaovik']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1767, in get_cluster_primary_address
    output = self._run_mysqlsh_script("\n".join(get_cluster_primary_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:19 WARNING unit.mysql/0.juju-log Failed to get node count
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmpvouk82e0']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1508, in get_cluster_node_count
    output = self._run_mysqlsh_script("\n".join(size_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:19 INFO unit.mysql/0.juju-log Unit workload member-state is offline with member-role unknown
unit-mysql-0: 13:44:19 ERROR unit.mysql/0.juju-log Failed to reboot cluster
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmpqgul2uwe']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 2014, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:19 ERROR unit.mysql/0.juju-log Failed to reboot cluster from complete outage.
unit-mysql-0: 13:44:19 WARNING unit.mysql/0.juju-log Failed to get cluster primary addresses
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmpape5343q']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1767, in get_cluster_primary_address
    output = self._run_mysqlsh_script("\n".join(get_cluster_primary_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:19 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-mysql-0: 13:44:19 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm
unit-mysql-0: 13:44:20 INFO unit.mysql/0.juju-log Running legacy hooks/start.
unit-mysql-0: 13:44:20 WARNING unit.mysql/0.juju-log Failed to get node count
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmpph7v208l']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1508, in get_cluster_node_count
    output = self._run_mysqlsh_script("\n".join(size_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:20 INFO unit.mysql/0.juju-log Unit workload member-state is offline with member-role unknown
unit-mysql-0: 13:44:22 ERROR unit.mysql/0.juju-log Failed to reboot cluster
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmp2t88e5rz']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 2014, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"
unit-mysql-0: 13:44:22 ERROR unit.mysql/0.juju-log Failed to reboot cluster from complete outage.
unit-mysql-0: 13:44:22 WARNING unit.mysql/0.juju-log Failed to get cluster primary addresses
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 647, in _run_mysqlsh_script
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['charmed-mysql.mysqlsh', '--no-wizard', '--python', '-f', '/var/snap/charmed-mysql/common/tmppnzfu473']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1767, in get_cluster_primary_address
    output = self._run_mysqlsh_script("\n".join(get_cluster_primary_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_vm_helpers.py", line 651, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: b"Error opening log file '/var/log/mysqlsh/mysqlsh.log' for writing: Permission denied\n"

Additional context

This reboot issue seems to originate from the run-mysqlsh.sh wrapper script in the charmed-mysql snap. The raised permission denied can be one of two reasons:

  1. The snap_daemon user cannot write to the /var/log/mysqlsh/mysqlsh.log file due to the incorrect permissions being set on the file.
  2. /var/log/mysqlsh/mysqlsh.log is outside the charmed-mysql snap's confinement. I see there's a bind mount specified in the snapcraft.yaml file that points to /var/log/mysqlsh but you might need to use a bind-file instead.
    1. Since you are using a wrapper script to start mysqlsh, you could just potentially set the log file location to _$SNAPCOMMON/var/log/mysqlsh/mysqlsh.log directly in the wrapper script.
github-actions[bot] commented 8 months ago

https://warthogs.atlassian.net/browse/DPE-3329

paulomach commented 7 months ago

@NucciTheBoss fix released on 8.0/edge (charm revision 208)