bbcarchdev / libcluster

Clustering support library (originally part of anansi)
Apache License 2.0

Deadlock between different instances of Twine #16

Closed cgueret closed 8 years ago

cgueret commented 8 years ago

When configured this way:

    ;; Cluster configuration
    cluster-name=twine
    cluster-verbose=yes
    environment=testing
    registry=pgsql://postgres:postgres@postgres/spindle

Three instances of twined can end up waiting for each other. Here are sample logs from one of those instances; this block is repeated continuously and nothing else is logged (no data is processed):

    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:01' AND "updated" >= '2016-07-11 09:20:56'
    writerd: [Debug] libcluster: SQL: re-balancing cluster twine/testing:
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:01' ORDER BY "id" ASC
    writerd: [Debug] * 1a687d3f2c9f451599fb910c52fbebb1 [0]
    writerd: [Debug]   b583f3eb1e8f409283fc27183ef91500 [1]
    writerd: [Debug]   dc516eca7793465d8fcf2e708e45d067 [2]
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:06' AND "updated" >= '2016-07-11 09:21:01'
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:11' AND "updated" >= '2016-07-11 09:21:06'
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:16' AND "updated" >= '2016-07-11 09:21:11'
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:21' AND "updated" >= '2016-07-11 09:21:16'
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: SELECT "id", "threads" FROM "cluster_node" WHERE "key" = 'twine' AND "env" = 'testing' AND "partition" IS NULL AND "expires" >= '2016-07-11 09:21:26' AND "updated" >= '2016-07-11 09:21:21'
    writerd: [Debug] libcluster: SQL: waiting for changes to twine/testing
    writerd: [Debug] libcluster: SQL query: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
    writerd: [Debug] libcluster: SQL query: DELETE FROM "cluster_node" WHERE "id" = '1a687d3f2c9f451599fb910c52fbebb1' AND "key" = 'twine' AND "env" = 'testing'
    writerd: [Debug] libcluster: SQL query: INSERT INTO "cluster_node" ("id", "key", "partition", "env", "threads", "updated", "expires") VALUES ('1a687d3f2c9f451599fb910c52fbebb1', 'twine', NULL, 'testing', 1, '2016-07-11 09:21:29', '2016-07-11 09:23:29')
    writerd: [Debug] libcluster: SQL query: COMMIT
    writerd: [Debug] libcluster: SQL: updated registry with 1a687d3f2c9f451599fb910c52fbebb1=1

nevali commented 8 years ago

These log entries are libcluster working normally; you'll need to gather more information (e.g., twine's queue queries) to see why it might appear to be deadlocked.
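One rough way to check that the excerpt shows a steady routine rather than a stall is to compare the timestamps embedded in the logged queries. The following is a minimal Python sketch, assuming the log format quoted in this issue. The helper names (`poll_intervals`, `lifetime_seconds`) are hypothetical and not part of libcluster, and `SAMPLE` holds only the timestamp conditions copied from the polling queries above.

```python
import re
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
TS = r"'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'"

# Matches only the polling queries, which filter on both "expires" and
# "updated"; the re-balancing query (ORDER BY "id") has no "updated" clause.
POLL_RE = re.compile(r'"expires" >= ' + TS + r' AND "updated" >= ' + TS)


def poll_intervals(log_text):
    """Seconds between the "expires" cut-offs of consecutive polling queries."""
    times = [datetime.strptime(m.group(1), FMT) for m in POLL_RE.finditer(log_text)]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def lifetime_seconds(updated, expires):
    """Seconds between a registry row's "updated" and "expires" values."""
    delta = datetime.strptime(expires, FMT) - datetime.strptime(updated, FMT)
    return delta.total_seconds()


# Timestamp conditions copied from the six polling queries in the excerpt.
SAMPLE = """
"expires" >= '2016-07-11 09:21:01' AND "updated" >= '2016-07-11 09:20:56'
"expires" >= '2016-07-11 09:21:06' AND "updated" >= '2016-07-11 09:21:01'
"expires" >= '2016-07-11 09:21:11' AND "updated" >= '2016-07-11 09:21:06'
"expires" >= '2016-07-11 09:21:16' AND "updated" >= '2016-07-11 09:21:11'
"expires" >= '2016-07-11 09:21:21' AND "updated" >= '2016-07-11 09:21:16'
"expires" >= '2016-07-11 09:21:26' AND "updated" >= '2016-07-11 09:21:21'
"""

if __name__ == "__main__":
    print(poll_intervals(SAMPLE))
    # Values taken from the INSERT statement in the excerpt.
    print(lifetime_seconds("2016-07-11 09:21:29", "2016-07-11 09:23:29"))
```

On the excerpt, every gap between polls is 5 seconds and the re-inserted registry row expires 120 seconds after its update. That fits a node polling and refreshing its registration on schedule, so any stall would have to be found elsewhere, for example in twine's queue queries.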