Closed GeoffreyPlitt closed 7 years ago
More: I think there's a definite scaling bug, where memory is consumed linearly with the number of jobs. I'm getting out-of-memory errors that are definitely tied to the Bull/Matador route.
@ShaneK Please help!
Hm, I haven't actually had any excuse to use Matador in a long time, but it doesn't surprise me that you might encounter issues with the web-facing side if there are too many jobs. Knockout definitely has issues with very large lists (or at least the version of Knockout I used when I wrote Matador). Is the back-end timing out if you hit the API directly? I would hope not, but if it is then I may need to look into making that part more efficient...
@ShaneK Right so my question is, if I'm on the main Matador screen that just shows counts, these should be efficient API calls that only return counts, right? If my Bull system has hundreds of thousands of jobs, I don't expect Matador to load all the jobs. If it's loading all jobs, that would explain the slowness.
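A counts-only endpoint can indeed be cheap: Redis has O(1) size commands for each data type, so no jobs need to be loaded at all. Below is a minimal sketch, assuming the classic Bull key layout (`bull:<queue>:wait` and `:active` as lists, `:completed` and `:failed` as sets, `:delayed` as a sorted set) and a callback-style client like node_redis; the stub client is only there so the example runs without a live Redis server.

```javascript
// Sketch: counts-only queries using O(1) Redis size commands.
// ASSUMPTIONS: classic Bull key layout and a node_redis-style
// callback client. `stub` below replaces a real Redis connection.

function getQueueCounts(client, queue, done) {
  const prefix = 'bull:' + queue + ':';
  // Each job state maps to the O(1) size command for its Redis type.
  const states = [
    ['wait', 'llen'],       // waiting jobs live in a list
    ['active', 'llen'],     // active jobs live in a list
    ['completed', 'scard'], // completed job ids live in a set
    ['failed', 'scard'],    // failed job ids live in a set
    ['delayed', 'zcard']    // delayed jobs live in a sorted set
  ];
  const counts = {};
  let pending = states.length;
  let errored = false;
  states.forEach(function (pair) {
    const state = pair[0], cmd = pair[1];
    client[cmd](prefix + state, function (err, n) {
      if (errored) return;
      if (err) { errored = true; return done(err); }
      counts[state] = n;
      if (--pending === 0) done(null, counts);
    });
  });
}

// Stub client so the sketch runs without a live Redis server.
const stub = {
  data: {
    'bull:video:wait': 4, 'bull:video:active': 2,
    'bull:video:completed': 100000, 'bull:video:failed': 7,
    'bull:video:delayed': 1
  },
  llen: function (key, cb) { process.nextTick(cb, null, this.data[key] || 0); },
  scard: function (key, cb) { process.nextTick(cb, null, this.data[key] || 0); },
  zcard: function (key, cb) { process.nextTick(cb, null, this.data[key] || 0); }
};

getQueueCounts(stub, 'video', function (err, counts) {
  if (err) throw err;
  console.log(counts); // e.g. { wait: 4, active: 2, completed: 100000, ... }
});
```

None of these commands touch job payloads, so the front page would stay fast even with hundreds of thousands of completed jobs.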
https://github.com/ShaneK/Matador/blob/master/models/redis.js
It appears we currently rely mostly on the `KEYS` command, which I've since learned isn't safe in production environments, because it's blocking and will lock up Redis if it's used frequently and/or on large data sets. This definitely needs to be updated to use `SCAN` instead.
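For reference, the non-blocking alternative walks the keyspace in small batches with a cursor instead of returning every key in one blocking call. A minimal sketch, assuming a callback-style client with node_redis's `scan(cursor, 'MATCH', pattern, 'COUNT', n, cb)` signature; the stub client stands in for a real connection:

```javascript
// Sketch: cursor-based SCAN instead of the blocking KEYS command.
// ASSUMPTION: a node_redis-style callback client; `stub` below
// replaces a real Redis connection.

function scanAllKeys(client, pattern, done) {
  const found = [];
  (function step(cursor) {
    client.scan(cursor, 'MATCH', pattern, 'COUNT', 100, function (err, reply) {
      if (err) return done(err);
      const nextCursor = reply[0];
      const keys = reply[1];
      found.push.apply(found, keys);
      // A returned cursor of '0' means the iteration is complete.
      if (nextCursor === '0') return done(null, found);
      step(nextCursor); // each round trip is small, so Redis stays responsive
    });
  })('0');
}

// Stub client: two SCAN "pages" keyed by cursor, no live server needed.
const stub = {
  pages: {
    '0': ['17', ['bull:video:1', 'bull:video:2']],
    '17': ['0', ['bull:video:3']]
  },
  scan: function (cursor, _m, _p, _c, _n, cb) {
    process.nextTick(cb, null, this.pages[cursor]);
  }
};

scanAllKeys(stub, 'bull:*', function (err, keys) {
  if (err) throw err;
  console.log(keys.length); // 3 keys collected across two SCAN pages
});
```

Each `SCAN` call only holds Redis for one small batch, so other clients (like the Bull workers) keep getting served between batches.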
Gotcha.
I'm running a production environment right now, and this is blocking us. What should I do? Will this be fixed soon, or should I switch to a different UI altogether? I love this UI best among them, but I can't use something that breaks when I hit 100k jobs.
Unfortunately, I don't have much time to dedicate to programming during the week, and since my current workplace does not use Bull, I can't justify working on it during work hours. I'll try to get it sorted out this weekend, but until this is fixed I definitely recommend not running Matador on your production environment, sadly :disappointed:
We are also seeing this. NewRelic says it's `SMEMBERS` and `KEYS`.
FWIW, our completed queue has about 12k+ items in it.
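If the hot spot really is `SMEMBERS` on a large completed set, the same cursor approach applies per key: `SSCAN` returns the set in batches instead of all 12k members in one blocking call. A hedged sketch (the key name and the node_redis-style `sscan(key, cursor, 'COUNT', n, cb)` client shape are assumptions; the stub replaces a real connection):

```javascript
// Sketch: paging through one large set with SSCAN instead of SMEMBERS.
// ASSUMPTIONS: key name and node_redis-style callback client; `stub`
// below replaces a real Redis connection.

function sscanAll(client, key, done) {
  const members = [];
  (function step(cursor) {
    client.sscan(key, cursor, 'COUNT', 500, function (err, reply) {
      if (err) return done(err);
      members.push.apply(members, reply[1]); // reply = [nextCursor, batch]
      // SSCAN, like SCAN, signals completion with a '0' cursor.
      reply[0] === '0' ? done(null, members) : step(reply[0]);
    });
  })('0');
}

// Stub client: two SSCAN "pages" keyed by cursor.
const stub = {
  pages: { '0': ['7', ['101', '102']], '7': ['0', ['103']] },
  sscan: function (_key, cursor, _c, _n, cb) {
    process.nextTick(cb, null, this.pages[cursor]);
  }
};

sscanAll(stub, 'bull:video:completed', function (err, members) {
  if (err) throw err;
  console.log(members); // ['101', '102', '103']
});
```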
Hello,
I had this issue 6 months ago, left Bull for a different queueing system, and we're now coming back to Bull. Bull works great, but Matador still locks up for us after pumping a lot of items (100k+) through the queues.
Is anybody available to help troubleshoot this, or should we just go use something else?
We couldn't deal with Matador's instability, so we switched to Toureiro and haven't looked back.
Unfortunately I don't have the time or will to support this project anymore, so it sounds like Toureiro might be the way to go. I'm really sorry guys.
Gotcha, thanks guys!
I love the Matador UI. But it's slow for me.
It worked great when I was prototyping. But now in production, I have 6 queues, each with a worker, running in Redis, handling about 2-3 jobs per second each.
The workers run fine, but once they get going, maybe 15-30 minutes later, the Matador UI gets stuck. The Express route just times out at 30s. Refreshing the browser and rebooting the workers doesn't seem to fix it.
It seems to get stuck after several hundred thousand jobs have completed, or when a lot of jobs are running at once. During this time, the jobs keep working hunky-dory. It's just the Matador UI.
Once stuck, it stays stuck even if I stop all the workers. Only clearing Redis completely seems to unstick it.
Could there be some sort of redis locking happening? Or are there scaling issues with Bull or something?
Maybe Matador is eagerly loading hundreds and hundreds of job details under the hood? I would figure it should just get counts at first load and only lazy-load details once I drill down.