jtomaszewski opened this issue 6 years ago
Yeah 🤔 . Each VerkWeb instance exposes data about the co-located running jobs. We don't store all running jobs inside Redis to share across all VerkWeb instances because it can be a substantial amount of data, and because the jobs can be short-lived it would just be unnecessary overhead. At least that was the reasoning behind it.
I'm not sure what the best approach is in this case. Maybe we could keep track of "long-running jobs" somehow, so that every VerkWeb instance would be able to store and visualise them?
It's also possible (not ideal IMHO) for the user to set up a GenStage consumer that reads data about every job and keeps this info somewhere (your own DB). More here: https://github.com/edgurgel/verk#error-tracking
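For anyone going that route, a minimal sketch could look like the consumer below. It assumes the `Verk.EventProducer` and the `Verk.Events.*` structs described in that README section; `MyApp.JobStore` is a made-up module standing in for wherever you keep the data:

```elixir
defmodule MyApp.JobTracker do
  @moduledoc """
  Sketch of a GenStage consumer that mirrors job lifecycle events
  into your own storage (MyApp.JobStore is hypothetical).
  """
  use GenStage

  def start_link(_opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    # Subscribe to Verk's event producer (see the error-tracking docs).
    {:consumer, :ok, subscribe_to: [Verk.EventProducer]}
  end

  def handle_events(events, _from, state) do
    Enum.each(events, &handle_event/1)
    {:noreply, [], state}
  end

  defp handle_event(%Verk.Events.JobStarted{job: job}), do: MyApp.JobStore.mark_running(job)
  defp handle_event(%Verk.Events.JobFinished{job: job}), do: MyApp.JobStore.mark_done(job)
  defp handle_event(%Verk.Events.JobFailed{job: job}), do: MyApp.JobStore.mark_failed(job)
  defp handle_event(_other), do: :ok
end
```

You'd add it to your own supervision tree next to Verk; VerkWeb itself stays untouched.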
I see. Seems like it's not that easy and maybe not even required (most jobs are short-lived IMO).
Maybe we could at least mention it in the VerkWeb interface, so we avoid confusion. Like add a small warning alert above the "Running" list, saying that it only shows jobs for the current instance. WDYT? I can do a PR if you want.
Yeah that would be great! 👍
Perhaps a list of nodes active in the cluster would be more helpful, including the ability to bounce around inspecting each node.
For example, I may have anywhere between 2 and 40 (or more) background processors running across a cluster (auto-scaling).
Being able to use a single verk_web instance to enumerate them and pop in to see what any given node is chewing on would be really cool.
(apologies for the near-necro on this issue, btw)
Maybe some aggregated stats could be saved inside Redis for all nodes in the cluster, like processed, failed, retry, and dead counts. Data would be aggregated every X seconds, where X could be configured. The main dashboard page with the graph could then show combined data for all nodes in the cluster, with a pull-down menu where you can select individual nodes and see the data only for that node. If you run lots of servers and one of them starts to misbehave, it would be very useful to have stats at the node level. Aggregated stats could have a TTL set on them, so they disappear after a while if a server no longer runs and updates them.
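To make that concrete, here's a rough illustration of what such a reporter could look like. This isn't in Verk today; `MyApp.NodeStats.local_counts/0` is a hypothetical function returning this node's counters, and the key layout, interval and TTL are arbitrary:

```elixir
defmodule MyApp.NodeStatsReporter do
  @moduledoc """
  Illustrative GenServer that pushes per-node counters into a Redis hash
  every @interval ms; the hash expires if the node stops reporting.
  """
  use GenServer

  @interval 10_000   # the configurable "X seconds"
  @ttl 60            # seconds until a silent node's stats disappear

  def start_link(node_id), do: GenServer.start_link(__MODULE__, node_id, name: __MODULE__)

  def init(node_id) do
    {:ok, redis} = Redix.start_link()
    schedule()
    {:ok, %{node_id: node_id, redis: redis}}
  end

  def handle_info(:report, %{node_id: node_id, redis: redis} = state) do
    # Hypothetical helper: %{processed: _, failed: _, retries: _, dead: _} for this node only.
    counts = MyApp.NodeStats.local_counts()

    key = "verk:node_stats:#{node_id}"
    fields = Enum.flat_map(counts, fn {k, v} -> [to_string(k), to_string(v)] end)

    Redix.pipeline(redis, [
      ["HSET", key | fields],       # multi-field HSET needs Redis >= 4.0
      ["EXPIRE", key, to_string(@ttl)]
    ])

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :report, @interval)
end
```

VerkWeb could then read all `verk:node_stats:*` keys to build the combined graph and the per-node pull-down.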
Let's say I have 2 nodes, `node_id: "1"` and `node_id: "2"`, running on the same Redis server (roughly the config sketched at the end of this comment).

Currently, the "Failed", "Enqueued", and "Scheduled" views properly list all jobs in the cluster (that is, jobs saved in Redis, no matter on which node VerkWeb is being run).
Unfortunately, the "Running jobs" count and list show only jobs running on the given node.
It would be cool if it listed all the jobs that are in progress on all currently living nodes.

Possibly related to https://github.com/edgurgel/verk/issues/157 ?
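For reference, the setup I have in mind is roughly this (config keys as in the Verk README; the Redis URL is just a placeholder):

```elixir
# Node 1 (config/config.exs)
config :verk,
  node_id: "1",
  redis_url: "redis://shared-redis:6379"

# Node 2 (config/config.exs)
config :verk,
  node_id: "2",
  redis_url: "redis://shared-redis:6379"
```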
P.S. Until we implement it, we could mention it in the README and/or in the VerkWeb interface.