charles-001 / dolphie

Your single pane of glass for real-time analytics into MySQL/MariaDB & ProxySQL
https://pypi.org/project/dolphie/
GNU General Public License v3.0
589 stars 36 forks source link

is sys.innodb_lock_waits safe to run in production on a busy system ? #45

Closed aadant closed 7 months ago

aadant commented 7 months ago

we noticed that the dolphie query was taking an increasing long time. Maybe we should time it or run it on demand only ?

information_schema tables are unsafe to run on busy system unlike performance_schema ones.

# Time: 2024-02-05T15:25:35.470941Z
# User@Host:me[db] @  [9.99.99.999]  Id: 3963342006
# Schema:   Last_errno: 0  Killed: 0
# Query_time: 232.047229  Lock_time: 0.000005  Rows_sent: 15  Rows_examined: 15  Rows_affected: 0  Bytes_sent: 4207
SET timestamp=1707146503;
/* dolphie */ 
        SELECT
            wait_age,
            locked_type,
            waiting_pid,
            waiting_trx_age,
            waiting_trx_rows_modified,
            waiting_trx_rows_locked,
            waiting_lock_mode,
            IFNULL(waiting_query, "")  AS waiting_query,
            blocking_pid,
            blocking_trx_age,
            blocking_trx_rows_modified,
            blocking_trx_rows_locked,
            blocking_lock_mode,
            IFNULL(blocking_query, "") AS blocking_query
        FROM
            sys.innodb_lock_waits;
charles-001 commented 7 months ago

Hi @aadant - thanks for reaching out! I've never seen this query take a long time for me on my busiest servers so that's interesting you are. Any chance you can look at the sys view to determine which table is the culprit? It's possible that information_system.INNOBO_TRX is where the performance hit is.

aadant commented 7 months ago

correct, this view is shipped with MySQL but not all sys views are safe to use in production, repetitively, especially the information_schema ones (cc @lefred )

charles-001 commented 7 months ago

Gotcha. Is there a better way to get this information?

On Mon, Feb 5, 2024, 1:04 PM aadant @.***> wrote:

correct, this view is shipped with MySQL but not all sys views are safe to use in production, especially the information_schema ones (cc @lefred https://github.com/lefred )

— Reply to this email directly, view it on GitHub https://github.com/charles-001/dolphie/issues/45#issuecomment-1927651053, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADFBRUPKUUY2LWX6UEL62D3YSENJ5AVCNFSM6AAAAABC2P4SIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRXGY2TCMBVGM . You are receiving this because you commented.Message ID: @.***>

aadant commented 7 months ago

https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.3/en/performance-schema-data-locks-table.html performance_schema is typically safer to run

transaction information can be found here but it does not contain the same information

https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.3/en/performance-schema-events-transactions-current-table.html

MySQL [(none)]> SELECT *        FROM performance_schema.events_transactions_current LIMIT 1\G
*************************** 1. row ***************************
                      THREAD_ID: 1
                       EVENT_ID: 22
                   END_EVENT_ID: 22
                     EVENT_NAME: transaction
                          STATE: COMMITTED
                         TRX_ID: 422211877866008
                           GTID: AUTOMATIC
                  XID_FORMAT_ID: NULL
                      XID_GTRID: NULL
                      XID_BQUAL: NULL
                       XA_STATE: NULL
                         SOURCE: handler.cc:1357
                    TIMER_START: 24555263398000
                      TIMER_END: 24555270944000
                     TIMER_WAIT: 7546000
                    ACCESS_MODE: READ WRITE
                ISOLATION_LEVEL: REPEATABLE READ
                     AUTOCOMMIT: YES
           NUMBER_OF_SAVEPOINTS: 0
NUMBER_OF_ROLLBACK_TO_SAVEPOINT: 0
    NUMBER_OF_RELEASE_SAVEPOINT: 0
          OBJECT_INSTANCE_BEGIN: NULL
               NESTING_EVENT_ID: NULL
             NESTING_EVENT_TYPE: NULL
1 row in set (0.00 sec)
charles-001 commented 7 months ago

I believe we can join to threads table to get the sql text, right? data_locks is an 8.0 table so that panel would no longer be available on 5.7 unless I kept the original implementation (which I can do)

charles-001 commented 7 months ago

Looking more into the data_locks table, it seems it isn't very straight-forward to implement. Do you have an idea of what query to run to retrieve valuable information to display @aadant? Looking at how @lefred does it in his locks MySQL shell plugin, it seems there's a bit of work involved and it isn't fast to poll.

aadant commented 7 months ago

Indeed @lefred’s code also references innodb_trx which is also mentioned in the manual.

i suggest you just time out the query using a maximum query time (10s ?)

https://dev.mysql.com/blog-archive/server-side-select-statement-timeouts

charles-001 commented 7 months ago

That's definitely a hacky fix if you're OK w/ that. It's just that dolphie would constantly be trying to run the query and get killed every time. If we can get a performant query that provides similar results, that'd be great.

charles-001 commented 7 months ago

If we can't get a performant query, maybe a more permanent fix is to add a parameter to disable locks monitoring for environments like yours.

charles-001 commented 7 months ago

It's worth noting that the processlist query uses information_schema.innodb_trx as well which you're not having any issue with.

charles-001 commented 7 months ago

I've decided to turn this query off by default unless the locks panel is open. There will be a new parameter named --historical-locks that will allow users to let the query run when the locks panel isn't open so the data can be saved to its graph. I don't see a reason why it should be always running by default.

charles-001 commented 7 months ago

This has been pushed to v4.1.0

aadant commented 7 months ago

Thanks @charles-001 ! Information_schema queries that run all the time should be replaced with non locking harmless performance_schema queries.

aadant commented 7 months ago

Or not run at all

charles-001 commented 7 months ago

@aadant - I agree, but I'm not sure if there's PFS table(s) that replaces information_schema.innodb_trx entirely. Or maybe there is I just don't know which tables in PFS to join to. If you can come up w/ a query that enables us to take away information_schema.innodb_trx, I'd be open to it.

aadant commented 7 months ago

actually MySQL related bugs like https://bugs.mysql.com/bug.php?id=109539 and https://bugs.mysql.com/bug.php?id=112035

are bad for people using sys.innodb_lock_waits. (cc @lefred ) Fortunately Dolphie is not running them anymore.