Expensify / Bedrock

Rock solid distributed database specializing in active/active automatic failover and WAN replication
https://bedrockdb.com
GNU Lesser General Public License v3.0

Probably a misconfig #1637

Open izytechAB opened 9 months ago

izytechAB commented 9 months ago

We have set up a cluster inside a Docker swarm, starting 3 nodes in a Bedrock cluster. We are having issues when doing inserts on the nodes: it works fine on one or two nodes, but on the third it fails, and we have to restart the bedrock server on that node to make it responsive again. There is no clear pattern as to which bedrock node fails. Could it be that we have to bind the server host and node host to a specific IP address, since there are multiple network interfaces?


bedrock -nodeName e434ef143cd8 -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 397 -quorumCheckpoint 1 -readThreads 4 -peerList cb0d1ebf6c06:8889,121ec6fd20dc:8889 -workerThreads 4 -plugins db,cache,jobs -enableMultiWrite false

bedrock -nodeName 121ec6fd20dc -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 480 -quorumCheckpoint 1 -readThreads 4 -peerList e434ef143cd8:8889,cb0d1ebf6c06:8889 -workerThreads 4 -plugins db,cache,jobs -enableMultiWrite false

bedrock -nodeName cb0d1ebf6c06 -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:8888 -nodeHost 0.0.0.0:8889 -priority 364 -quorumCheckpoint 1 -readThreads 4 -peerList e434ef143cd8:8889,121ec6fd20dc:8889 -workerThreads 4 -plugins db,cache,jobs -enableMultiWrite false

On one node
Query: CREATE TABLE t (extid TEXT, name TEXT);

On the second node
Query: INSERT INTO t (extid, name) VALUES ('11120', 'hello');

500 Internal Server Error
escalationTime: 20557263
nodeName: e434ef143cd8
peekTime: 112
totalTime: 110039731
unaccountedTime: 89482179
Content-Length: 0
92-dev   | <14>Jan 11 11:26:40 bedrock: T5F1ov (BedrockServer.cpp:2193) buildCommandFromRequest [socket1089] [info] Waiting for 'Query: INSERT INTO t (extid, name) VALUES ('11120', 'hello');' to complete.
92-dev   | <14>Jan 11 11:26:40 bedrock: xxxxxx (BedrockServer.cpp:2331) handleSocket [socket1089] [info] Running new 'Query: INSERT INTO t (extid, name) VALUES ('11120', 'hello');' command from local client, with 0 commands already queued.
92-dev   | <14>Jan 11 11:26:40 bedrock: T5F1ov (SQLitePool.cpp:63) getIndex [socket1089] [info] Waiting for DB handle
92-dev   | <14>Jan 11 11:26:40 bedrock: lbEFYi (SQLiteClusterMessenger.cpp:55) **waitForReady [socket930] [info] [HTTPESC] Timeout waiting for socket.**

izytechAB commented 9 months ago

I was really hoping for some hints on where to look for a solution. As far as I can tell, though, there do not seem to be any communication problems between the bedrock nodes.

clifinger commented 2 days ago

I have a hint: I remember having the same problem with Loki (and Tempo) from Grafana.

You should give a "fixed IP" to the container.

networks:
  the_network:
    ipv4_address: 10.20.30.42

or

# docker-entrypoint.sh

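# Command substitution is required so bindAddr holds the pipeline's output, not the word "ifconfig"
export bindAddr=$(ifconfig | grep inet.addr.${whatever-i-was} | cut -d: -f2 | awk '{print $1}')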

Then use the variable in the command line that starts bedrock:

-serverHost ${bindAddr}:8888 -nodeHost ${bindAddr}:8889
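
For images that ship iproute2 instead of ifconfig, the entrypoint idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `BIND_IFACE`, `extract_ipv4` and `host_args` are hypothetical names that do not come from Bedrock or Docker, the `ip` tool is assumed to be installed in the container, and the interface attached to the cluster network is assumed to be known in advance. Ports 8888 and 8889 are the ones from the original command lines.

```shell
#!/bin/sh
# docker-entrypoint.sh (sketch; helper names are hypothetical)

# Assumption: the interface on the cluster network is known; eth0 is only a default.
BIND_IFACE="${BIND_IFACE:-eth0}"

# Read `ip -4 -o addr show` output on stdin and print the first IPv4 address
# without its prefix length. A typical input line looks like:
#   2: eth0    inet 10.0.1.5/24 brd 10.0.1.255 scope global eth0
extract_ipv4() {
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Build the two host flags from the original command line for a given address.
host_args() {
    printf '%s' "-serverHost $1:8888 -nodeHost $1:8889"
}

# Intended use at the end of the entrypoint (left commented out in this sketch):
#   bindAddr=$(ip -4 -o addr show dev "$BIND_IFACE" | extract_ipv4)
#   exec bedrock <remaining flags as before> $(host_args "$bindAddr")
```

Keeping the parsing in a small function makes it possible to check it against a captured `ip` line before relying on it inside the swarm.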