ether / etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.
http://docs.etherpad.org/
Apache License 2.0

Etherpad still crashes when querying stats API with timeout #5343

Open 0x46616c6b opened 2 years ago

0x46616c6b commented 2 years ago

Describe the bug

This bug report relates to #5005.

If we try to fetch the getStats API after Etherpad starts, the whole process gets killed (see the additional context). In #5005 we saw Error: Query inactivity timeout. Now it is Error: Request aborted. Nevertheless, the process should not be terminated when this happens.

Our Etherpad instance runs around 20k pads with a MySQL storage backend.

To Reproduce

Steps to reproduce the behavior:

  1. Start Etherpad
  2. Trigger the API Endpoint getStats with a timeout (60 seconds)
    curl -v --connect-timeout 60 --max-time 60 -s "localhost:9001/api/1.2.14/getStats?apikey=xxx"

Expected behavior

Etherpad should not crash if the HTTP Client closes the connection to the API Endpoint.
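
A minimal sketch of the expected behavior, assuming an aborted client connection surfaces as an error with the message "Request aborted" (as in the log below) and possibly the code `ECONNABORTED`. The helper names `isClientAbort` and `classifyRequestError` are hypothetical and not part of Etherpad; the idea is only that a client hanging up gets logged or ignored rather than treated as fatal.

```javascript
'use strict';

// Assumption: a client that closes the connection produces an Error whose
// message is "Request aborted" (seen in the log) and/or whose code is
// ECONNABORTED. Neither check is taken from Etherpad's own code.
const isClientAbort = (err) =>
  err != null && (err.code === 'ECONNABORTED' || err.message === 'Request aborted');

// Hypothetical policy: ignore client aborts, report everything else.
const classifyRequestError = (err) => (isClientAbort(err) ? 'ignore' : 'report');

console.log(classifyRequestError(new Error('Request aborted')));
console.log(classifyRequestError(new Error('Query inactivity timeout')));
```

With a policy like this, only the second error would be escalated, and even then escalation would not have to mean terminating the process.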

Server (please complete the following information):

Additional context

Dec 30 21:26:40 etherpad1 node[19662]: [2021-12-30 21:26:40.836] [ERROR] console - Error: Request aborted
Dec 30 21:26:40 etherpad1 node[19662]:     at onaborted (/home/etherpad/etherpad-lite/src/node_modules/express/lib/response.js:1025:15)
Dec 30 21:26:40 etherpad1 node[19662]:     at Immediate.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/express/lib/response.js:1067:9)
Dec 30 21:26:40 etherpad1 node[19662]:     at processImmediate (internal/timers.js:464:21)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.353] [ERROR] console - Fatal MySQL error: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.set (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:143:16)
Dec 30 21:26:56 etherpad1 node[19662]:     at writeOneOp (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:544:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._write (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:555:13)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:500:22
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.flush (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:502:9)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timeout._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:228:32)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.355] [ERROR] socket.io - Error while handling message from jiMZz_e-ZzKz-t4aAAAC: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.set (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:143:16)
Dec 30 21:26:56 etherpad1 node[19662]:     at writeOneOp (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:544:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._write (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:555:13)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:500:22
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.flush (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:502:9)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timeout._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:228:32)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.813] [ERROR] console - Fatal MySQL error: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.get (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:116:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._getLocked (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:294:38)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.get (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:271:25)
Dec 30 21:26:56 etherpad1 node[19662]:     at runMicrotasks (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTicksAndRejections (internal/process/task_queues.js:95:5)

rhansen commented 2 years ago

There are two issues here that I'd like to address separately:

  1. Etherpad shouldn't crash if a DB query throws. This should be easy to fix.
  2. The query shouldn't time out. This is the same problem as: https://github.com/ether/ep_adminpads2/issues/25#issuecomment-986198640. Unfortunately, fixing this is delicate work because it involves changing how pads are saved to the database. It will likely break some plugins.

rhansen commented 2 years ago

> 1. Etherpad shouldn't crash if a DB query throws. This should be easy to fix.

Sigh, I jinxed myself by saying that. It's not easy to fix. IIUC, what's happening is the MySQL server is so busy trying to enumerate all pads that unrelated queries over other pool connections are timing out.
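
For illustration only, here is a sketch of the general idea of not letting a rejected query take the process down. It is not the fix discussed here and does nothing about the busy MySQL server. `settle` and `fakeQuery` are hypothetical names, and `fakeQuery` is a stand-in rather than the real ueberdb2 API.

```javascript
'use strict';

// Hypothetical wrapper: run an async operation and turn a rejection into a
// plain result object, so the caller can answer with an error instead of
// leaving the rejection unhandled.
const settle = async (operation) => {
  try {
    return {ok: true, value: await operation()};
  } catch (err) {
    return {ok: false, error: err};
  }
};

// Stand-in for a database call that times out (not the real ueberdb2 API).
const fakeQuery = async () => {
  throw new Error('Query inactivity timeout');
};

settle(fakeQuery).then((result) => {
  // prints "handled: Query inactivity timeout"
  console.log(result.ok ? 'ok' : `handled: ${result.error.message}`);
});
```

The difficulty described above is that the timeouts hit unrelated queries on other pool connections, so a wrapper like this would have to exist at every call site that touches the database.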

0x46616c6b commented 2 years ago

Thanks a lot for your constant work on improvements and for your support with our issues.

Will #5347 be included in the next release? We would like to test it and give feedback.

rhansen commented 2 years ago

Yes, it will be in the next release (1.9.0). Unfortunately, I think there's a low probability that PR will fix the issue for you. (It might, so it's still worth trying.)

0x46616c6b commented 2 years ago

Okay, we keep trying and still hope 🤞

leesha19 commented 2 years ago

> Okay, we keep trying and still hope 🤞

Hey, I would like to know one thing: if we contribute here, is Etherpad going to pay us? And if so, what are the criteria, e.g. how many pull requests?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

0x46616c6b commented 2 years ago

We took another, deeper look at this problem. Since we use MariaDB/MySQL with the MyISAM engine for storage, we checked which query ueberDB would trigger in the database and ran it ourselves:

MariaDB [etherpad]> SELECT COUNT(`key`) FROM store WHERE `key` LIKE 'pad:%' AND `key` NOT LIKE 'pad:%:%';
+--------------+
| COUNT(`key`) |
+--------------+
|        18924 |
+--------------+
1 row in set (51.514 sec)
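
To make the filter in that query concrete, here is a small sketch of the same key test in JavaScript: a key counts as a pad if it starts with `pad:` and contains no further `:` after that prefix. The function name `isPadKey` and the sample keys are made up for illustration. Note that SQL `LIKE` may compare case-insensitively depending on the collation, which this sketch does not model.

```javascript
'use strict';

// Mirrors: `key` LIKE 'pad:%' AND `key` NOT LIKE 'pad:%:%'
// i.e. the key starts with "pad:" and has no further ":" after the prefix.
// Unlike LIKE under some collations, this comparison is case-sensitive.
const isPadKey = (key) => key.startsWith('pad:') && !key.slice(4).includes(':');

// Made-up sample keys, only to show which shapes pass the filter.
const sampleKeys = [
  'pad:meeting-notes',
  'pad:meeting-notes:revs:0',
  'pad:meeting-notes:chat:3',
  'sessionstorage:abc123',
];

console.log(sampleKeys.filter(isPadKey)); // [ 'pad:meeting-notes' ]
```

Because both patterns have to be evaluated against the key values, the cost of the count depends on the total number of rows in the table, not only on the number of pads.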

After some tuning of our MariaDB instance, we reduced the query time from ~90 seconds to ~53 seconds, but we were still not satisfied. We then noticed that we have ~18k pads but 100,000,000 rows in MariaDB. This led us back to an old problem, the sessionstorage entries still growing rapidly:

MariaDB [etherpad]> SELECT COUNT(*) FROM store WHERE `key` LIKE 'sessionstorage%';
+----------+
| COUNT(*) |
+----------+
| 38923592 |
+----------+
1 row in set (5 min 44.863 sec)

Around one year ago we deleted all entries starting with sessionstorage from the database, and one year later these entries again make up 40% of the database.
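
As a quick check of the 40% figure, the share can be computed from the two numbers given in this comment. This assumes the rounded total of 100,000,000 rows is close to the real row count; `sharePercent` is a hypothetical helper name.

```javascript
'use strict';

// Figures quoted in this comment; the total is the rounded value given there.
const sessionRows = 38923592;
const totalRows = 100000000;

// Percentage of `part` in `total`, rounded to one decimal place.
const sharePercent = (part, total) => Math.round((part / total) * 1000) / 10;

console.log(`${sharePercent(sessionRows, totalRows)}%`); // 38.9%
```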

We will clean these entries from the database again and wait for the 1.9.0 release, which will hopefully fix the session creation. After the cleanup I will post new query timings for comparison.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

xshadow commented 1 year ago

Not stale.