jtomaszewski opened this issue 6 years ago
Yeah 🤔 . Each VerkWeb instance exposes data about the co-located running jobs. We don't store all running jobs inside Redis to share across all VerkWeb instances because it can be a substantial amount of data, and because the jobs can be short-lived it would just be unnecessary overhead. At least that was the reasoning behind it.
I'm not sure what the best approach is in this case. Maybe we could keep track of "long-running jobs" somehow, so that every VerkWeb instance would be able to store and visualise them?
It's also possible (not ideal IMHO) for the user to set up a GenStage consumer that reads data about every job and keeps this info somewhere (your own DB). More here: https://github.com/edgurgel/verk#error-tracking
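For anyone going that route, a minimal sketch could look like the consumer below. It assumes the `Verk.EventProducer` and the `Verk.Events.*` structs described in that README section; `MyApp.JobStore` is a made-up module standing in for wherever you keep the data:

```elixir
defmodule MyApp.JobTracker do
  @moduledoc """
  Sketch of a GenStage consumer that mirrors job lifecycle events
  into your own storage (MyApp.JobStore is hypothetical).
  """
  use GenStage

  def start_link(_opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    # Subscribe to Verk's event producer (see the error-tracking docs).
    {:consumer, :ok, subscribe_to: [Verk.EventProducer]}
  end

  def handle_events(events, _from, state) do
    Enum.each(events, &handle_event/1)
    {:noreply, [], state}
  end

  defp handle_event(%Verk.Events.JobStarted{job: job}), do: MyApp.JobStore.mark_running(job)
  defp handle_event(%Verk.Events.JobFinished{job: job}), do: MyApp.JobStore.mark_done(job)
  defp handle_event(%Verk.Events.JobFailed{job: job}), do: MyApp.JobStore.mark_failed(job)
  defp handle_event(_other), do: :ok
end
```

You'd add it to your own supervision tree next to Verk; VerkWeb itself stays untouched.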
I see. Seems like it's not that easy and maybe not even required (most jobs are short-lived IMO).
Maybe we could at least mention it in the VerkWeb interface, so we avoid confusion. Like add a small warning alert above the "Running" list, saying that it only shows jobs for the current instance. WDYT? I can do a PR if you want.
Yeah that would be great! 👍
Perhaps a list of nodes active in the cluster would be more helpful, including the ability to bounce around inspecting each node.
For example, I may have anywhere between 2 and 40 (or more) background processors running across a cluster (auto-scaling).
Being able to use a single verk_web instance to enumerate them and pop in to see what any given node is chewing on would be really cool.
(apologies for the near-necro on this issue, btw)
Maybe some aggregated stats could be saved inside Redis for all nodes in the cluster, like processed, failed, retry, and dead counts. Data would be aggregated every X seconds, where X could be configured. The main dashboard page with the graph could then show combined data for all nodes in the cluster, with a pull-down menu where you can select individual nodes and see the data only for that node. If you run lots of servers and one of them starts to misbehave, it would be very useful to have stats at the node level. Aggregated stats could have a TTL set on them, so they disappear after a while if a server no longer runs and updates them.
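To make that concrete, here's a rough illustration of what such a reporter could look like. This isn't in Verk today; `MyApp.NodeStats.local_counts/0` is a hypothetical function returning this node's counters, and the key layout, interval and TTL are arbitrary:

```elixir
defmodule MyApp.NodeStatsReporter do
  @moduledoc """
  Illustrative GenServer that pushes per-node counters into a Redis hash
  every @interval ms; the hash expires if the node stops reporting.
  """
  use GenServer

  @interval 10_000   # the configurable "X seconds"
  @ttl 60            # seconds until a silent node's stats disappear

  def start_link(node_id), do: GenServer.start_link(__MODULE__, node_id, name: __MODULE__)

  def init(node_id) do
    {:ok, redis} = Redix.start_link()
    schedule()
    {:ok, %{node_id: node_id, redis: redis}}
  end

  def handle_info(:report, %{node_id: node_id, redis: redis} = state) do
    # Hypothetical helper: %{processed: _, failed: _, retries: _, dead: _} for this node only.
    counts = MyApp.NodeStats.local_counts()

    key = "verk:node_stats:#{node_id}"
    fields = Enum.flat_map(counts, fn {k, v} -> [to_string(k), to_string(v)] end)

    Redix.pipeline(redis, [
      ["HSET", key | fields],       # multi-field HSET needs Redis >= 4.0
      ["EXPIRE", key, to_string(@ttl)]
    ])

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :report, @interval)
end
```

VerkWeb could then read all `verk:node_stats:*` keys to build the combined graph and the per-node pull-down.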
Let's say I have 2 nodes, `node_id: "1"` and `node_id: "2"`, running on the same Redis server (roughly the config sketched at the end of this comment).

Currently, the "Failed", "Enqueued", and "Scheduled" views properly list all jobs in the cluster (that is, jobs saved in Redis, no matter on which node VerkWeb is being run).
Unfortunately, the "Running jobs" count and list show only jobs running on the given node.
It would be cool if it listed all the jobs that are in progress on all currently living nodes.

Possibly related to https://github.com/edgurgel/verk/issues/157 ?
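For reference, the setup I have in mind is roughly this (config keys as in the Verk README; the Redis URL is just a placeholder):

```elixir
# Node 1 (config/config.exs)
config :verk,
  node_id: "1",
  redis_url: "redis://shared-redis:6379"

# Node 2 (config/config.exs)
config :verk,
  node_id: "2",
  redis_url: "redis://shared-redis:6379"
```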
P.S. Until we implement it, we could mention it in the README and/or in the VerkWeb interface.