ShaneK / Matador

Front-end web interface for Bull Job Manager
MIT License
98 stars, 51 forks

Locked up? Too many clients? #31

Closed GeoffreyPlitt closed 7 years ago

GeoffreyPlitt commented 8 years ago

I love the Matador UI. But it's slow for me.

It worked great when I was prototyping. But now in production, I have 6 queues, each with a worker, running in Redis, handling about 2-3 jobs per second each.

The workers run fine, but once they get going, maybe 15-30 minutes later, the Matador UI gets stuck. The Express route just times out at 30s. Refreshing the browser and rebooting the workers don't seem to fix it.

It seems to get stuck after several hundred thousand jobs have completed, or when a lot of jobs are running at once. During this time, the jobs keep working hunky-dory. It's just the Matador UI.

Once it's stuck, it stays stuck even if I stop all the workers. Only clearing Redis completely seems to unstick it.

Could there be some sort of redis locking happening? Or are there scaling issues with Bull or something?

Maybe Matador is eagerly loading hundreds and hundreds of job details under the hood? I would figure it should only lazy-load details once I drill down, and just get counts on first load.

GeoffreyPlitt commented 8 years ago

More: I think there's a definite scaling bug, where memory is consumed linearly with the number of jobs. I'm getting out-of-memory errors that are definitely tied to the Bull/Matador route.

GeoffreyPlitt commented 8 years ago

@ShaneK Please help!

ShaneK commented 8 years ago

Hm, I haven't actually had any excuse to use Matador in a long time, but it doesn't surprise me that you might encounter issues with the web-facing side if there are too many jobs. Knockout definitely has issues with very large lists (or at least the version of Knockout I used when I wrote Matador). Is the back-end timing out if you hit the API directly? I would hope not, but if it is then I may need to look into making that part more efficient...

GeoffreyPlitt commented 8 years ago

@ShaneK Right so my question is, if I'm on the main Matador screen that just shows counts, these should be efficient API calls that only return counts, right? If my Bull system has hundreds of thousands of jobs, I don't expect Matador to load all the jobs. If it's loading all jobs, that would explain the slowness.
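For illustration, a counts-only lookup could look roughly like the sketch below. This is not Matador's actual code. It assumes a promise-based Redis client such as ioredis, a Bull-style key layout of `bull:<queue>:<state>`, and that `completed` and `failed` are stored as sets; `getQueueCounts` is a hypothetical helper name. The point is that `LLEN`, `ZCARD`, and `SCARD` each return a size in constant time without transferring any job data.

```javascript
// Sketch only. Assumptions: `client` is a promise-based Redis client
// (e.g. ioredis), and keys follow a Bull-style "bull:<queue>:<state>" layout.
// getQueueCounts is a hypothetical helper, not part of Matador or Bull.
async function getQueueCounts(client, queueName, prefix = 'bull') {
  const key = (state) => `${prefix}:${queueName}:${state}`;

  // Each command returns only a number, so the cost does not grow
  // with the number of jobs in the queue.
  const [wait, active, delayed, completed, failed] = await Promise.all([
    client.llen(key('wait')),       // LLEN: length of a list
    client.llen(key('active')),
    client.zcard(key('delayed')),   // ZCARD: size of a sorted set
    client.scard(key('completed')), // SCARD: size of a set (assumed type)
    client.scard(key('failed')),
  ]);

  return { wait, active, delayed, completed, failed };
}
```

With counts fetched this way, job details would only need to be loaded when a user drills into a specific queue or state.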

ShaneK commented 8 years ago

https://github.com/ShaneK/Matador/blob/master/models/redis.js

It appears we currently rely mostly on the KEYS command, which I've since learned isn't good in production environments: it blocks the Redis server while it walks the whole keyspace, so it will lock up resources if it's used frequently and/or on large sets of data.

This definitely needs to be updated to use SCAN instead.
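As a rough sketch of that change (not a patch against `models/redis.js`), a `KEYS pattern` call can be replaced by a cursor loop over `SCAN`. The example assumes an ioredis-style client whose `scan` resolves to `[nextCursor, keys]` with the cursor as a string; `scanKeys` is a hypothetical helper name.

```javascript
// Sketch only. Assumes `client.scan(cursor, 'MATCH', pattern, 'COUNT', n)`
// resolves to [nextCursor, keys], as ioredis does.
// scanKeys is a hypothetical helper, not existing Matador code.
async function scanKeys(client, pattern, count = 100) {
  const found = new Set(); // SCAN may return the same key more than once
  let cursor = '0';

  do {
    // Each call inspects a bounded slice of the keyspace, so other
    // clients can be served between iterations.
    const [next, keys] = await client.scan(cursor, 'MATCH', pattern, 'COUNT', count);
    keys.forEach((k) => found.add(k));
    cursor = next;
  } while (cursor !== '0'); // a returned cursor of "0" ends the iteration

  return [...found];
}
```

Note that `COUNT` is only a hint for how much work each call does, and a full scan still visits every key. It avoids one long blocking call, but it does not make listing hundreds of thousands of keys cheap.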

GeoffreyPlitt commented 8 years ago

Gotcha.

I'm running a production environment right now, and this is blocking us. What should I do? Will this be fixed soon, or should I switch to a different UI altogether? I love this UI best among them, but I can't use something that breaks when I hit 100k jobs.

ShaneK commented 8 years ago

Unfortunately, I don't have much time to dedicate to programming during the week, and since my current workplace does not use Bull, I can't justify working on it during work hours. I'll try to get it sorted out this weekend, but until this is fixed I definitely recommend not running Matador on your production environment, sadly :disappointed:

jeffreywescott commented 7 years ago

We are also seeing this. New Relic says it's SMEMBERS and KEYS:

[Screenshot: New Relic transactions view]

FWIW, our completed queue has about 12k+ items in it.
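For the `SMEMBERS` side of this, one possible mitigation is to read a large set one page at a time with `SSCAN` rather than pulling every member in a single call. The sketch below assumes an ioredis-style client whose `sscan` resolves to `[nextCursor, members]`, and that the key passed in holds a Redis set; `readSetPage` is a hypothetical helper name.

```javascript
// Sketch only. Assumes `client.sscan(key, cursor, 'COUNT', n)` resolves to
// [nextCursor, members], as ioredis does, and that `key` holds a Redis set.
// readSetPage is a hypothetical helper, not existing Matador code.
async function readSetPage(client, key, cursor = '0', count = 50) {
  const [next, ids] = await client.sscan(key, cursor, 'COUNT', count);
  return {
    ids,                 // members returned by this call (size may vary)
    next,                // pass this back in to fetch the following page
    done: next === '0',  // a returned cursor of "0" means iteration finished
  };
}
```

A UI could call this once per "load more" click, passing the previous `next` value as the cursor, so that a 12k-item set is never loaded in one request.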

GeoffreyPlitt commented 7 years ago

Hello,

I had this issue 6 months ago, left Bull to use a different queueing system, and we're now coming back to Bull. Bull works great. But Matador still locks up for us after pumping a lot of items (100k) through the queues.

Is anybody available to help troubleshoot this, or should we just go use something else?

jeffreywescott commented 7 years ago

We couldn't deal with Matador's instability, so we switched to Toureiro and haven't looked back.

ShaneK commented 7 years ago

Unfortunately, I don't have the time or will to support this project anymore, so it sounds like Toureiro might be the way to go. I'm really sorry, guys.

GeoffreyPlitt commented 7 years ago

Gotcha, thanks guys!