cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot
https://api.cofacts.tw
MIT License
110 stars 26 forks source link

"JavaScript heap out of memory" for rumors-api on production site #151

Open MrOrz opened 4 years ago

MrOrz commented 4 years ago

On 2/28 5:03 we had a downtime: image

On rumors-api stdout & stderr, it seems that the system has gone through several OOM error and restarts. image

This log is also reoccurred hundreds of times.

{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890163689Z"}
{"log":"\u003c--- Last few GCs ---\u003e\n","stream":"stdout","time":"2020-02-27T10:18:26.890618195Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890630616Z"}
{"log":"[253:0x556bc498a0c0] 14313557 ms: Mark-sweep 2045.5 (2053.3) -\u003e 2045.5 (2053.3) MB, 186.1 / 0.0 ms  (average mu = 0.190, current mu = 0.003) last resort GC in old space requested\n","stream":"stdout","time":"2020-02-27T10:18:26.890635037Z"}
{"log":"[253:0x556bc498a0c0] 14313746 ms: Mark-sweep 2045.5 (2053.3) -\u003e 2045.5 (2053.3) MB, 189.0 / 0.0 ms  (average mu = 0.107, current mu = 0.000) last resort GC in old space requested\n","stream":"stdout","time":"2020-02-27T10:18:26.890639792Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890653955Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890657677Z"}
{"log":"\u003c--- JS stacktrace ---\u003e\n","stream":"stdout","time":"2020-02-27T10:18:26.890661023Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890664707Z"}
{"log":"==== JS stack trace =========================================\n","stream":"stdout","time":"2020-02-27T10:18:26.890668103Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890672453Z"}
{"log":"Security context: 0x3f62d4d008d1 \u003cJSObject\u003e\n","stream":"stdout","time":"2020-02-27T10:18:26.890675905Z"}
{"log":"    0: builtin exit frame: parse(this=0x3f62d4d1ec89 \u003cObject map = 0x375559ac3549\u003e,0x3c0268c3aac9 \u003cVery long string[517101925]\u003e,0x3f62d4d1ec89 \u003cObject map = 0x375559ac3549\u003e)\n","stream":"stdout","time":"2020-02-27T10:18:26.890679637Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890684086Z"}
{"log":"    1: deserialize [0x12de3bfb2d71] [/srv/www/node_modules/elasticsearch/src/lib/serializers/json.js:~42] [pc=0x1c5017a54005](this=0x3caf862675c9 \u003cJson map = 0x1b1b5e342379\u003e,0x3c0268c3aac9 \u003cVery long string[517101925]\u003e)\n","stream":"stdout","time":"2020-02-27T10:18:26.890687493Z"}
{"log":"   ...\n","stream":"stdout","time":"2020-02-27T10:18:26.890691854Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:18:26.890695282Z"}
{"log":"FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory\n","stream":"stderr","time":"2020-02-27T10:18:26.893190984Z"}
{"log":"2020-02-27T10:18:34: PM2 log: App [rumors-api:1] exited with code [0] via signal [SIGABRT]\n","stream":"stdout","time":"2020-02-27T10:18:34.439108515Z"}
{"log":"2020-02-27T10:18:34: PM2 log: App [rumors-api:1] starting in -cluster mode-\n","stream":"stdout","time":"2020-02-27T10:18:34.484851615Z"}
{"log":"10:18:34 PM2           | App [rumors-api:1] exited with code [0] via signal [SIGABRT]\n","stream":"stdout","time":"2020-02-27T10:18:34.542056858Z"}
{"log":"10:18:34 PM2           | App [rumors-api:1] starting in -cluster mode-\n","stream":"stdout","time":"2020-02-27T10:18:34.54626033Z"}
{"log":"2020-02-27T10:18:35: PM2 log: App [rumors-api:1] online\n","stream":"stdout","time":"2020-02-27T10:18:35.440139913Z"}
{"log":"10:18:35 PM2           | App [rumors-api:1] online\n","stream":"stdout","time":"2020-02-27T10:18:35.441326366Z"}
{"log":"10:18:39 1|rumors-api  | Elasticsearch INFO: 2020-02-27T10:18:39Z\n","stream":"stdout","time":"2020-02-27T10:18:39.682718212Z"}
{"log":"10:18:39 1|rumors-api  |   Adding connection to http://db:9200/\n","stream":"stdout","time":"2020-02-27T10:18:39.683382154Z"}
{"log":"10:18:41 1|rumors-api  | Listening port 5000\n","stream":"stdout","time":"2020-02-27T10:18:41.294926111Z"}
{"log":"\n","stream":"stdout","time":"2020-02-27T10:19:07.04858235Z"}

See the full logs here: https://gist.github.com/MrOrz/63019de0f65fd320964770dd64bebbe0

MrOrz commented 4 years ago

Regarding the "Very long string" error emitted by elasticsearch and triggers OOM, I highly suspect that it is because rumors-api is trying to access html field in urls index, and that triggers such error.

However we can never be sure and further investigation is needed (such as analyzing field size of each field in elasticsearch) to develop a step-to-reproduce, and then a solution (like blacklisting certain fields for searchResultLoader and docLoader) can be applied.