Open Source - Visualizations crashing server

thedeviousdev commented 5 years ago

What version were you using?

Open Source - v3.13.0

What steps can we follow to reproduce the behaviour?

Server configuration: OS: Ubuntu 18.10 x64 CPU: 1 vCore RAM: 2048 MB Storage: 55 GB SSD

Install using curl -o- -L http://lrnloc.kr/installv2 > deployll.sh && bash deployll.sh as root user.
Use default settings for everything except MongoDB. Do not install.
Update the .env file to use a mongo atlas shard configuration. (The mongo db has previous xAPI statements and dashboard visualizations saved)
Navigate to the Learning Locker UI.
Open up a dashboard

What is the actual behaviour?

In a 3 viz dashboard, the server will send continuous API requests and never fully finish loading. For example, one of visualizations will eventually show, but after about 2 minutes, the page refreshes and resets the visualizations again. Mongo Atlas is definitely receiving the requests, as it maxes out on Disk I/O every time.

Eventually, after leaving it for about 5-10 minutes, a new error pops up in the Dev Console:

vendor.9f9e84ab.js:1 Unhandled rejection TypeError: Cannot read property 'startedAt' of undefined
at a (https://example.com/client.9b3d451b.js:1:43969)
at https://example.com/client.9b3d451b.js:1:46984
at r (https://example.com/vendor.9f9e84ab.js:1:459788)
at Generator._invoke (https://example.com/vendor.9f9e84ab.js:1:460908)
at Generator.t.<computed> [as next] (https://example.com/vendor.9f9e84ab.js:1:459967)
at l (https://example.com/vendor.9f9e84ab.js:1:395528)From previous event:
at i.p [as _captureStackTrace] (https://example.com/vendor.9f9e84ab.js:1:276848)
at i._resolveFromExecutor (https://example.com/vendor.9f9e84ab.js:1:312719)
at new i (https://example.com/vendor.9f9e84ab.js:1:305002)
at https://example.com/client.9b3d451b.js:1:118273
at https://example.com/vendor.9f9e84ab.js:1:928219
at dispatch (https://example.com/vendor.9f9e84ab.js:1:383582)
at https://example.com/vendor.9f9e84ab.js:1:201970
at https://example.com/vendor.9f9e84ab.js:1:397698
at r (https://example.com/vendor.9f9e84ab.js:1:497009)
at u (https://example.com/vendor.9f9e84ab.js:1:497159)
at o (https://example.com/vendor.9f9e84ab.js:1:497058)
at k (https://example.com/vendor.9f9e84ab.js:1:397656)
at d (https://example.com/vendor.9f9e84ab.js:1:396778)
at l (https://example.com/vendor.9f9e84ab.js:1:395591)
at n (https://example.com/vendor.9f9e84ab.js:1:396268)

Looking at the api_stderr-0.log, I see errors like this:

{ message: [32m'Ready check failed: ERR max number of clients reached'[39m,
  stack: [32m'Error: Ready check failed: ERR max number of clients reached\n
  at /usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis/event_handler.js:45:29\n
  at /usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis.js:412:14\n
  at tryCatcher (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/util.js:16:23)\n
  at Promise.errorAdapter [as _rejectionHandler0] (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/nodeify.js:35:34)\n
  at Promise._settlePromise (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/promise.js:566:21)\n
  at Promise._settlePromise0 (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/promise.js:614:10)\n
  at Promise._settlePromises (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/promise.js:690:18)\n
  at _drainQueueStep (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/async.js:138:12)\n
  at _drainQueue (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/async.js:131:9)\n
  at Async._drainQueues (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/async.js:147:5)\n
  at Immediate.Async.drainQueues (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/bluebird/js/release/async.js:17:14)\n
  at runCallback (timers.js:810:20)\n
  at tryOnImmediate (timers.js:768:5)\n
  at processImmediate [as _immediateCallback] (timers.js:745:5)'[39m }
[ioredis] Unhandled error event: ReplyError: ERR max number of clients reached
    at JavascriptRedisParser.returnError (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis/parser.js:24:25)
    at JavascriptRedisParser.execute (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/redis-parser/lib/parser.js:572:12)
    at Socket.<anonymous> (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis/event_handler.js:107:22)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)
    at readableAddChunk (_stream_readable.js:250:11)
    at Socket.Readable.push (_stream_readable.js:208:10)
    at TCP.onread (net.js:601:20)
[ioredis] Unhandled error event: ReplyError: ERR max number of clients reached
    at JavascriptRedisParser.returnError (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis/parser.js:24:25)
    at JavascriptRedisParser.execute (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/redis-parser/lib/parser.js:572:12)
    at Socket.<anonymous> (/usr/local/learninglocker/releases/ll-20190531-8490de5f362eb1ca7368fab00f75c81fecfd680f_1/webapp/node_modules/ioredis/lib/redis/event_handler.js:107:22)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)
    at readableAddChunk (_stream_readable.js:250:11)
    at Socket.Readable.push (_stream_readable.js:208:10)
    at TCP.onread (net.js:601:20)
2019-05-31 01:39:18:129 - [31merror[39m: 0a90c7b1-454e-4ea6-8205-2a0e2dfa4c8c

I have tried to upgrade the the installation to resolve the issues, but that did not help either. Restarting the services has not helped.

What is the expected behavior?

The visualisations should eventually show within the dashboard.

Is there any additional information that will help us replicate/understand the problem?

caperneoignis commented 5 years ago

how big are these visualizations? Because Redis is running out of client connections for you to connect to. It sounds like your server is running out of memory.

Is Redis running on the same server as learninglocker? or is it separate? When you say you upgraded the instances, do you mean the server learninglocker is running on? Mongo atlas? Both?

Redis running out of connections is normally a sign you need to configure redis from default settings, and possibly move it to it's own service, read: elastic cache, or standalone server.

it sounds like the visualization is running against data that is bigger then a couple of records. The AMI is really meant for demo purposes, and not for running against bigger record sets (page count). I'd load the AMI, update it using the deploy script, then add a Redis server, and mongo server, have learninglocker connect to those. I'd also use EC2.medium if you are running visualizations against more then a couple of records.

Also, I'd update the url for the worker, "https://example.com" makes it kind of hard to read, I'd use the IP of the server, just in case anything is actually trying to verify the URL.

caperneoignis commented 5 years ago

Also, started at of undefined, looks like it's an error from the front end, client side, maybe the timestamp is coming in wrong, null, or unreadable. I'd have to let Ryan or one of them look at that one.

shauncodal commented 5 years ago

Having the same issue after an upgrade.

500 error on

GET https://xapi.ramped4.com/api/statements/aggregateAsync?pipeline=%5B%7B%22%24match%22%3A%7B%22timestamp%22%3A%7B%22%24gte%22%3A%7B%22%24dte%22%3A%222019-06-20T05%3A35%3A00.000Z%22%7D%7D%7D%7D%2C%7B%22%24match%22%3A%7B%22%24and%22%3A%5B%7B%7D%2C%7B%7D%5D%7D%7D%2C%7B%22%24match%22%3A%7B%22person._id%22%3A%7B%22%24exists%22%3Atrue%7D%7D%7D%2C%7B%22%24project%22%3A%7B%22group%22%3A%22%24person._id%22%2C%22model%22%3A%22%24person.display%22%7D%7D%2C%7B%22%24group%22%3A%7B%22_id%22%3A%22%24group%22%2C%22count%22%3A%7B%22%24sum%22%3A1%7D%2C%22group%22%3A%7B%22%24first%22%3A%22%24group%22%7D%2C%22model%22%3A%7B%22%24first%22%3A%22%24model%22%7D%7D%7D%2C%7B%22%24sort%22%3A%7B%22count%22%3A-1%7D%7D%2C%7B%22%24limit%22%3A10000%7D%2C%7B%22%24project%22%3A%7B%22_id%22%3A1%2C%22count%22%3A1%2C%22model%22%3A1%7D%7D%5D&skip=&sinceAt= 500 (Internal Server Error)

DellaStapleton commented 5 years ago

in addition to the above query by my colleague shauncodal, there is now a new issue which is more serious. We can no longer sign into our learning locker account. Takes ages and eventually get an error. Also all xapi calls from our LMS are taking ages and we eventually get a curl error : "cURL Error #:Operation timed out after 30000 milliseconds with 0 bytes received"

frmbelz commented 4 years ago

I had almost identical error in UI on a new Open Source manual install. Visualize graphs would spin loading then erroring out with

Unhandled rejection TypeError: Cannot read property 'startedAt' of undefined...

api_stderr-1.log had [ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6379 at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1191:14)

In my case I did not install and start redis at all. Starting redis solved it. To reproduce stop redis and try loading the visualizations.

Now I have a larger LL with many visualisations which load sometimes and would not load the other times with the same UI startedAt error. Will check redis.

LearningLocker / learninglocker