CDLUC3 / ezid

CDLUC3 ezid
MIT License
11 stars 4 forks source link

Investigate Database error 2/26 #577

Closed jsjiang closed 2 months ago

jsjiang commented 4 months ago

Low memory capacity caused RDS reboot. Some background jobs lost DB connections and triggered Nagios alerts. Background jobs were self recovered after RDS reboot. Find out the root cause

jsjiang commented 4 months ago

RDS error message (From Martin):

"The database is using the doublewrite buffer. RDS Optimized Writes is incompatible with the storage configuration."

jsjiang commented 4 months ago

EZID background job log:


Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6ctgvf3 in binder queue
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6gxsm77 in binder queue
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6rs6g6x in binder queue
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6kbnget in binder queue
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6hx3wy5 in binder queue
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: ----------
Feb 26 08:56:30 uc3-ezidui02x2-prd ezid-proc-errorlog: INFO ezidapp.management.commands.proc-cleanup-async-queues 140627713910592: Running job for identifier: ark:/87278/s6ds7myh in binder queue
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: ERROR ezidapp.management.commands.proc-binder 140502339688256: ####################################################################################################
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: Traceback (most recent call last):
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: return self.cursor.execute(sql, params)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: return self.cursor.execute(query, args)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: res = self._query(query)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: db.query(q)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: _mysql.connection.query(self, query)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: The above exception was the direct cause of the following exception:
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: Traceback (most recent call last):
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/ezid/manage.py", line 21, in <module>
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: main()
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/ezid/manage.py", line 17, in main
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: django.core.management.execute_from_command_line(sys.argv)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/core/management/__init__.py", line 419, in execute_from_command_line
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: utility.execute()
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/core/management/__init__.py", line 413, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: self.fetch_command(subcommand).run_from_argv(self.argv)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/core/management/base.py", line 354, in run_from_argv
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: self.execute(*args, **cmd_options)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/core/management/base.py", line 398, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: Traceback (most recent call last):
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: return self.cursor.execute(sql, params)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: return self.cursor.execute(query, args)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: res = self._query(query)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: db.query(q)
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
Feb 26 08:56:35 uc3-ezidui02x2-prd ezid-proc-errorlog: _mysql.connection.query(self, query)
```ckground 
jsjiang commented 4 months ago

Received 6 email notifications:

Exception raised in download.run:
OperationalError: (2006, 'MySQL server has gone away')

Traceback (most recent call last):
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/apps/ezid/ezid/ezidapp/management/commands/proc-download.py", line 73, in run
    if len(rs) == 0:
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/query.py", line 262, in __len__
    self._fetch_all()
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/sql/compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (2006, 'MySQL server has gone away')
Exception raised in connections.query:
OperationalError: (2013, 'Lost connection to MySQL server during query')

Traceback (most recent call last):
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/ezid/ezid/impl/search_util.py", line 487, in executeSearchCountOnly
    c = qs.count()
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/query.py", line 412, in count
    return self.query.get_count(using=self.db)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/sql/query.py", line 519, in get_count
    number = obj.get_aggregation(using, ['__count'])['__count']
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/sql/query.py", line 504, in get_aggregation
    result = compiler.execute_sql(SINGLE)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/models/sql/compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (2013, 'Lost connection to MySQL server during query')
Internal Server Error: /manage

OperationalError at /manage
(2013, 'Lost connection to MySQL server during query')

Request Method: GET
Request URL: http://ezid.cdlib.org/manage?ps=10&order_by=c_update_time&sort=asc&owner_selected=user_iastate_lib&c_title=t&c_creator=t&c_identifier=t&c_owner=t&c_create_time=t&c_update_time=t&c_id_status=t&keywords=%22https%3A%2F%2Fezid.cdlib.org%2Fid%2Fark%3A%2F87292%2Fw90863913%22&identifier=&title=&creator=&publisher=&pubyear_from=&pubyear_to=&object_type=&target=https%3A%2F%2Fezid.cdlib.org%2Fid%2Fark%3A%2F87292%2Fw90863913&id_type=&create_time_from=&create_time_to=&update_time_from=&update_time_to=&id_status=
Django Version: 3.2.7
Python Executable: /ezid/.pyenv/versions/ezid-py3/bin/python
Python Version: 3.8.5
Python Path: ['/ezid/ezid', '/ezid/ezid/impl', '/ezid/.pyenv/versions/3.8.5/lib/python38.zip', '/ezid/.pyenv/versions/3.8.5/lib/python3.8', '/ezid/.pyenv/versions/3.8.5/lib/python3.8/lib-dynload', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages', '/apps/ezid/ezid', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/ecs_logging-1.1.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/configparser-5.0.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/PyMySQL-0.9.3-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/toml-0.10.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/simplegist-1.0.1-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/python_dateutil-2.8.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/pytest-6.2.5-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/hjson-3.0.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/freezegun-1.1.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/filelock-3.0.12-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/feedparser-6.0.8-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/fasteners-0.16.3-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/bsddb3-6.2.9-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/boto3-1.18.33-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/myloginpath-0.0.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysql-0.0.3-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/aiomysql-0.0.21-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Django-3.2.7-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/urllib3-1.26.9-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/idna-3.3-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/charset_normalizer-2.0.12-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/certifi-2021.10.8-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/six-1.16.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/py-1.11.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/pluggy-1.0.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/packaging-21.3-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/iniconfig-1.1.1-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/attrs-21.4.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/sgmllib3k-1.0.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/s3transfer-0.5.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/jmespath-0.10.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/botocore-1.21.65-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/MarkupSafe-2.1.1-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/cryptography-37.0.2-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/mysqlclient-2.1.0-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/sqlparse-0.4.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/pytz-2022.1-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/asgiref-3.5.2-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/pyparsing-3.0.9-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/cffi-1.15.0-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/pycparser-2.21-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/lxml-4.9.1-py3.8-linux-x86_64.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/setuptools-65.5.1-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/requests-2.31.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Pygments-2.15.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/xmltodict-0.13.0-py3.8.egg', '/ezid/.pyenv/versions/ezid-py3/lib/python3.8/site-packages/Jinja2-3.1.3-py3.8.egg', '/ezid/ezid']
Server time: Mon, 26 Feb 2024 08:56:35 -0800
Installed Applications:
jsjiang commented 4 months ago

Response from AWS support:

Dear Customer,
Greetings of the day. I hope you are doing well.
Thank you for connecting AWS Premium Support. I am Khushboo and it’s my pleasure to help you with your case.
From the case notes, I understand that your production RDS server “rds-uc3-ezid5-prd “ was restarted on 2024-02-26 16:57. You have checked the logs but could not find a reason. You would like to know the reason for this restart.


Please feel free to correct me if I have misunderstood your concern.

=================================================


Firstly, I would like to extend my sincere apologies for a delayed response on this case ticket. I see that this case has been unattended for quite a while, and I regret any inconvenience that this may have caused you. This case is now assigned to me , please allow me to share my investigation and findings with you.


To begin, I have checked the RDS instance from the backend and can see that instance was restarted on 26th Feb. Below were the events at that time.


2024-02-26 16:57:03 UTC DB instance restarted db_restart 

2024-02-26 16:56:55 UTC [-]The database is using the doublewrite buffer. RDS Optimized Writes is incompatible with the storage configuration. For more information, see the RDS Optimized Writes for MYSQL documentation.


Before moving to instance restart reason, I would like to mention that event is ” doublewrite buffer” is not related to restart.


Please allow me to inform you that - This message “”The database is using the doublewrite buffer is just a notification that the underlying filesystem of the current instance is not compatible for Optimized Writes, thus it reverts back to the usage of doublewrite buffer. In addition, you can only modify an existing database instance to turn on RDS Optimized Writes if the database instance was created after the feature was released. The underlying file system format and organization that RDS Optimized Writes needs is incompatible with the file system format of databases created before the feature was released.
[+] Improving write performance with Amazon RDS Optimized Writes for MySQL - Limitations for RDS Optimized Writes - https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-optimized-writes.html#rds-optimized-writes-limitations 


Your parameter group “ids-ezid-mysql80” has the rds.optimized_writes set to Auto. Below are the values for the same.
AUTO – Turn on RDS Optimized Writes if the database supports it. Turn off RDS Optimized Writes if the database doesn't support it. This setting is the default.
OFF – Turn off RDS Optimized Writes even if the database supports it.

To avoid the above notification, you can turn it OFF.

Moving further to the below event:


2024-02-26 16:57:03.465 - "DB instance restarted"


I have checked underlying hardware, network and storage for this instance but there was no issue from the infrastructure. Further I have checked resource utilisation for this instance and found below:


DB connections: There were around 23 DB connections at 16:55 UTC
https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#metricsV2?graph=~(metrics~(~(~'AWS*2fRDS~'CPUUtilization~'DBInstanceIdentifier~'rds-uc3-ezid5-prd)~(~'.~'DatabaseConnections~'.~'.)~(~'.~'FreeableMemory~'.~'.)~(~'.~'ReadIOPS~'.~'.)~(~'.~'WriteIOPS~'.~'.))~view~'timeSeries~stacked~false~start~'2024-02-26T00*3a00*3a00.000Z~end~'2024-02-26T23*3a59*3a59.000Z~region~'us-west-2~stat~'Maximum~period~60)&query=~'*7bAWS*2fRDS*2cDBInstanceIdentifier*7d*20rds-uc3-ezid5-prd
Read IOPS: Read IOPS spiked at the same time from 10 to 1.4K.
https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#metricsV2?graph=~(metrics~(~(~'AWS*2fRDS~'CPUUtilization~'DBInstanceIdentifier~'rds-uc3-ezid5-prd)~(~'.~'DatabaseConnections~'.~'.)~(~'.~'FreeableMemory~'.~'.)~(~'.~'ReadIOPS~'.~'.)~(~'.~'WriteIOPS~'.~'.))~view~'timeSeries~stacked~false~start~'2024-02-26T00*3a00*3a00.000Z~end~'2024-02-26T23*3a59*3a59.000Z~region~'us-west-2~stat~'Maximum~period~60)&query=~'*7bAWS*2fRDS*2cDBInstanceIdentifier*7d*20rds-uc3-ezid5-prd
Write IOPS: No unusal spike
https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#metricsV2?graph=~(metrics~(~(~'AWS*2fRDS~'CPUUtilization~'DBInstanceIdentifier~'rds-uc3-ezid5-prd)~(~'.~'DatabaseConnections~'.~'.)~(~'.~'FreeableMemory~'.~'.)~(~'.~'ReadIOPS~'.~'.)~(~'.~'WriteIOPS~'.~'.))~view~'timeSeries~stacked~false~start~'2024-02-26T00*3a00*3a00.000Z~end~'2024-02-26T23*3a59*3a59.000Z~region~'us-west-2~stat~'Maximum~period~60)&query=~'*7bAWS*2fRDS*2cDBInstanceIdentifier*7d*20rds-uc3-ezid5-prd


Freeable memory: Dropped to around 120 MB at 16:55 UTC
https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#metricsV2?graph=~(metrics~(~(~'AWS*2fRDS~'CPUUtilization~'DBInstanceIdentifier~'rds-uc3-ezid5-prd)~(~'.~'DatabaseConnections~'.~'.)~(~'.~'FreeableMemory~'.~'.)~(~'.~'ReadIOPS~'.~'.)~(~'.~'WriteIOPS~'.~'.))~view~'timeSeries~stacked~false~start~'2024-02-26T00*3a00*3a00.000Z~end~'2024-02-26T23*3a59*3a59.000Z~region~'us-west-2~stat~'Maximum~period~60)&query=~'*7bAWS*2fRDS*2cDBInstanceIdentifier*7d*20rds-uc3-ezid5-prd

==> From the above metrics, we can see that memory on the instance was very low (around 100 MB) due to which instance was restarted.


I tried checking Enhance monitoring and Performance insights as well but I can see its not enabled on your instance
To summarise, the instance was restarted during to low free-able memory on the instance. 


I sincerely hope the above information was helpful. If I have misunderstood your any concern or have missed out something, then please do let me know. Also kindly feel free to let me know if you have any additional questions or concerns and I will be happy to help you. My usual working hours are 8 AM- 5PM IST.
Soliciting your cooperation.
We at AWS are always there for supporting our valuable customers.
Thank you, stay safe and have a great day!!!
We value your feedback. Please share your experience by rating this and other correspondences in the AWS Support Center. You can rate a correspondence by selecting the stars in the top right corner of the correspondence.
Best regards,
Khushboo G.
Amazon Web Services
C
jsjiang commented 4 months ago

Regarding read IO: Read IOPS: Read IOPS spiked at the same time from 10 to 1.4K.

Read IO at 2.5K is normal and can reach up to 5K. So the spike from 10 to 1.4K should not be an issue.

Low memory is the root cause. Check application log to get more insights.

jsjiang commented 4 months ago

Application log (2/26, 8:5xam) showed 3 types of accesses:

Following are searches related entries

ezid@uc3-ezidui02x2-prd:10:28:55:~/logs$  grep "2024-02-26 08:5" request.log | grep -i search
2024-02-26 08:50:24,381     INFO 22e0e558d4c711ee9bf606a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords https://ezid.cdlib.org/id/ark:/87292/w90863913
2024-02-26 08:50:36,092     INFO 29dbd43ad4c711ee9bf606a406a5cd13 BEGIN search/results - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c -updateTime 0 10 owner iastate_lib keywords https://ezid.cdlib.org/id/ark:/87292/w90863913
2024-02-26 08:50:49,413     INFO 31cc8662d4c711eeb24706a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords "https://ezid.cdlib.org/id/ark:/87292/w90863913"
2024-02-26 08:51:45,757     INFO 5361fa46d4c711eebc1206a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords "https://ezid.cdlib.org/id/ark:/87292/w90863913" target https://ezid.cdlib.org/id/ark:/87292/w90863913
2024-02-26 08:52:21,681     INFO 68cb761ed4c711ee91d206a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords "https://ezid.cdlib.org/id/ark:/87292/w90863913" target https://ezid.cdlib.org/id/ark:/87292/w90863913
2024-02-26 08:55:05,895     INFO caac98b8d4c711ee9bf606a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords "https://ezid.cdlib.org/id/ark:/87292/w90863913" target https://ezid.cdlib.org/id/ark:/87292/w90863913
2024-02-26 08:55:45,673     INFO e262232ed4c711eeb24706a406a5cd13 BEGIN search/count - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c owner iastate_lib keywords %20https://ezid.cdlib.org/id/ark:/87292/w90863913%20
2024-02-26 08:56:23,586     INFO f8fb3594d4c711eeb24706a406a5cd13 BEGIN search/results - iastate_lib ark:/99166/p9m03z34s iastate_lib ark:/99166/p9qn5zh6c -updateTime 0 10 owner iastate_lib keywords "https://ezid.cdlib.org/id/ark:/87292/w90863913"

The iastate_lib user performed two keyword searches. One is on "https://ezid.cdlib.org/id/ark:/87292/w90863913" without the double quotes, one with the double quotes. I tried the same searches on ezid stage. Keyword search without the double quotes succeeded. However, keyword search with the double quotes caused RDS reboot. The available memory dropped to below 100MB which caused reboot.

jsjiang commented 2 months ago

Root cause: keyword search Solution: refactor serach functionalities