Open 7yl4r opened 5 years ago
[root@imars-airflow01 ~]# innotop -u groooooot -p **************
Uptime MaxSQL ReplLag QPS Cxns Run Miss Lock Tbls Repl SQL
33d 1s 612.84 43 22 3518.61 0 175 SELECT dag_run
MariaDB [(none)]> SHOW ENGINE INNODB STATUS;
| InnoDB | |
=====================================
2018-11-20 18:52:48 0x7f0080480700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2624879 srv_active, 0 srv_shutdown, 78414 srv_idle
srv_master_thread log flush and writes: 2703261
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 632835654
--Thread 139640395204352 has waited at btr0sea.cc line 1160 for 0.00 seconds the semaphore:
S-lock on RW-latch at 0x55765bdc9500 created in file btr0sea.cc line 193
a writer (thread id 139640128022272) has reserved it in mode exclusive
number of readers 0, waiters flag 0, lock_word: 0
Last time read locked in file btr0sea.cc line 1160
Last time write locked in file btr0sea.cc line 1258
OS WAIT ARRAY INFO: signal count 458431058
RW-shared spins 0, rounds 742367799, OS waits 377507965
RW-excl spins 0, rounds 5252224502, OS waits 170048114
RW-sx spins 212908, rounds 6053066, OS waits 146449
Spin rounds per wait: 742367799.00 RW-shared, 5252224502.00 RW-excl, 28.43 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-11-20 18:48:45 0x7f007bdf9700
*** (1) TRANSACTION:
TRANSACTION 421115762964928, ACTIVE 0 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 69359996, OS thread handle 139640052553472, query id 1755671629 localhost airflow_user Sending data
UPDATE task_instance, (SELECT task_instance.try_number AS try_number, task_instance.task_id AS task_id, task_instance.dag_id AS dag_id, task_instance.execution_date AS execution_date, task_instance.start_date AS start_date, task_instance.end_date AS end_date, task_instance.duration AS duration, task_instance.state AS state, task_instance.max_tries AS max_tries, task_instance.hostname AS hostname, task_instance.unixname AS unixname, task_instance.job_id AS job_id, task_instance.pool AS pool, task_instance.queue AS queue, task_instance.priority_weight AS priority_weight, task_instance.operator AS operator, task_instance.queued_dttm AS queued_dttm, task_instance.pid AS pid, task_instance.executor_config AS executor_config
FROM task_instance LEFT OUTER JOIN dag_run ON task_instance.dag_id = dag_run.dag_id AND task_instance.execution_date = dag_run.execution_date
WHERE task_instance.dag_id IN ('in
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 723 page no 181873 n bits 144 index PRIMARY of table `airflow`.`task_instance` trx id 421115762964928 lock mode S locks rec but not gap waiting
Record lock, heap no 73 PHYSICAL RECORD: n_fields 21; compact format; info bits 0
0: len 16; hex 646f776e6c6f61645f6772616e756c65; asc download_granule;;
1: len 30; hex 696e676573745f7333615f6f6c5f315f6566725f646875735f636f766572; asc ingest_s3a_ol_1_efr_dhus_cover; (total 43 bytes);
2: len 7; hex 5bdcf310000000; asc [ ;;
3: len 6; hex 00005ef10ede; asc ^ ;;
4: len 7; hex 35000001532125; asc 5 S!%;;
5: len 7; hex 5bf4570d011cee; asc [ W ;;
6: SQL NULL;
7: SQL NULL;
8: len 7; hex 72756e6e696e67; asc running;;
9: len 4; hex 80000001; asc ;;
10: len 30; hex 696d6172732d616972666c6f7731352e6d6172696e652e7573662e656475; asc imars-airflow15.marine.usf.edu;;
11: len 7; hex 616972666c6f77; asc airflow;;
12: len 4; hex 80169bb8; asc ;;
13: SQL NULL;
14: len 7; hex 64656661756c74; asc default;;
15: len 4; hex 80000003; asc ;;
16: len 20; hex 446f776e6c6f616446696c654f70657261746f72; asc DownloadFileOperator;;
17: len 7; hex 5bf457060df3f9; asc [ W ;;
18: len 4; hex 80001923; asc #;;
19: len 4; hex 80000000; asc ;;
20: len 14; hex 80049503000000000000007d942e; asc } .;;
*** (2) TRANSACTION:
TRANSACTION 1592856286, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 69393678, OS thread handle 139640054978304, query id 1755671669 imars-airflow15.marine.usf.edu 192.168.1.224 airflow_user updating
UPDATE task_instance SET start_date='2018-11-20 18:48:45.072942', state='running', try_number=1, hostname='imars-airflow15.marine.usf.edu', job_id=1481656, operator='DownloadFileOperator', pid=6435 WHERE task_instance.task_id = 'download_granule' AND task_instance.dag_id = 'ingest_s3a_ol_1_efr_dhus_coverage_check_gom' AND task_instance.execution_date = '2018-11-03 01:00:00'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 723 page no 181873 n bits 144 index PRIMARY of table `airflow`.`task_instance` trx id 1592856286 lock_mode X locks rec but not gap
Record lock, heap no 73 PHYSICAL RECORD: n_fields 21; compact format; info bits 0
0: len 16; hex 646f776e6c6f61645f6772616e756c65; asc download_granule;;
1: len 30; hex 696e676573745f7333615f6f6c5f315f6566725f646875735f636f766572; asc ingest_s3a_ol_1_efr_dhus_cover; (total 43 bytes);
2: len 7; hex 5bdcf310000000; asc [ ;;
3: len 6; hex 00005ef10ede; asc ^ ;;
4: len 7; hex 35000001532125; asc 5 S!%;;
5: len 7; hex 5bf4570d011cee; asc [ W ;;
6: SQL NULL;
7: SQL NULL;
8: len 7; hex 72756e6e696e67; asc running;;
9: len 4; hex 80000001; asc ;;
10: len 30; hex 696d6172732d616972666c6f7731352e6d6172696e652e7573662e656475; asc imars-airflow15.marine.usf.edu;;
11: len 7; hex 616972666c6f77; asc airflow;;
12: len 4; hex 80169bb8; asc ;;
13: SQL NULL;
14: len 7; hex 64656661756c74; asc default;;
15: len 4; hex 80000003; asc ;;
16: len 20; hex 446f776e6c6f616446696c654f70657261746f72; asc DownloadFileOperator;;
17: len 7; hex 5bf457060df3f9; asc [ W ;;
18: len 4; hex 80001923; asc #;;
19: len 4; hex 80000000; asc ;;
20: len 14; hex 80049503000000000000007d942e; asc } .;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 723 page no 171327 n bits 296 index ti_state of table `airflow`.`task_instance` trx id 1592856286 lock_mode X locks rec but not gap waiting
Record lock, heap no 229 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 6; hex 717565756564; asc queued;;
1: len 16; hex 646f776e6c6f61645f6772616e756c65; asc download_granule;;
2: len 30; hex 696e676573745f7333615f6f6c5f315f6566725f646875735f636f766572; asc ingest_s3a_ol_1_efr_dhus_cover; (total 43 bytes);
3: len 7; hex 5bdcf310000000; asc [ ;;
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 1592874985
Purge done for trx's n:o < 1592874545 undo n:o < 0 state: running but idle
History list length 93
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421115762987176, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762975640, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762973168, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762968224, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762988000, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762978936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762964928, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762978112, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762986352, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115763000360, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762979760, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762980584, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762992120, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762964104, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762976464, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762969048, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762973992, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762965752, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762977288, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762982232, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762983056, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762969872, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762967400, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762974816, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762970696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762966576, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421115762963280, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 9, 0] , aio writes: [0, 0, 0, 0] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
11104708346 OS file reads, 176504068 OS file writes, 45070174 OS fsyncs
6396.65 reads/s, 16384 avg bytes/read, 55.24 writes/s, 20.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 53, seg size 55, 5374120 merges
merged operations:
insert 5010933, delete mark 1025547, delete 42131
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 3 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 324 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
18096.48 hash searches/s, 54020.49 non-hash searches/s
---
LOG
---
Log sequence number 132878530538
Log flushed up to 132878530538
Pages flushed up to 132876960289
Last checkpoint at 132876960289
0 pending log flushes, 0 pending chkp writes
21406240 log i/o's done, 17.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137560064
Dictionary memory allocated 177092
Buffer pool size 8192
Free buffers 0
Database pages 7854
Old database pages 2908
Modified db pages 319
Percent of dirty pages(LRU & free pages): 4.061
Max dirty pages percent: 75.000
Pending reads 10
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 194827150, not young 69524679810
110.22 youngs/s, 140865.28 non-youngs/s
Pages read 11104744344, created 865535, written 147366735
6277.93 reads/s, 0.75 creates/s, 37.24 writes/s
Buffer pool hit rate 977 / 1000, young-making rate 0 / 1000 not 506 / 1000
Pages read ahead 1040.99/s, evicted without access 19.00/s, Random read ahead 0.00/s
LRU len: 7854, unzip_LRU len: 0
I/O sum[386069]:cur[5374], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
4 read views open inside InnoDB
Process ID=1573, Main thread ID=139640414414592, state: sleeping
Number of rows inserted 3259253, updated 34094776, deleted 1186883, read 97988097369
1.50 inserts/s, 23.49 updates/s, 0.00 deletes/s, 70184.20 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
|
mmmmmkay... :man_shrugging:
Metrics on this server hit the db only 1/10min, so that is not it. It may be that the tables are getting big, the RAM allocation is small, or other settings.
mysqld on the airflow-master is using > 50% of disk i/o on the host system (according to
iotop
). This is causing lockups in the VMs and putting a lot of stress on the physical disk. There may be some airflow setting to improve this, or perhaps my metrics are hitting it too hard.If this can't be improved the mysql db may need to be moved to a separate server or dedicated disk.