Closed jdreaver closed 4 years ago
Yep, this is a known issue. Since queue names don't have a well-known prefix in Redis, I can't scan for them on startup. As jobs push, schedule or retry, Faktory will learn the current queue set and since Faktory is designed to run 24/7, the thought was that this period of ignorance should be infrequent and short.
How often are you restarting Faktory?
We restart once per week to ensure we are on the latest Amazon Linux version. We could definitely update Amazon Linux less often, but we err on the side of more frequent updates so we can diagnose any potential issues quicker.
Some of the job queues in question are populated once per night or once per week. We noticed this issue because our weekly job is running a bit slow, and the restart at the beginning of the week caused the queue to disappear from the UI.
The quick and easy hack workaround is to add your queues to the Statsd latency list:
https://github.com/contribsys/faktory/wiki/Pro-Metrics#latency
Faktory will know about it so it should always appear in the Web UI. If the queues are mostly empty, you won't see much overhead in checking latency.
Sounds good!
Can I make a feature request here then? :smile: Would it be possible to add a way to automatically track latency for all jobs? All of our queues are empty or close to empty most of the time, so I don't think it will be a big performance hit for us.
That's possible although it treads the line of "not a best practice" because latency checks are relatively expensive and if you have 100s of queues, the overhead can become significant. In other words, it starts to look like a footgun at scale. That's what I'm paid to worry about...
That makes sense. We can try to DRY our list of queue names in our automation for the time being, or just whitelist some queues we really care about.
I found another workaround: I went to the Busy tab, found one instance of our job executing, and clicked on the job name. This took me to https://faktory.freckle.com/queues/<job-name>. Once I went back to https://faktory.freckle.com/queues, the queue was back and showed all the enqueued jobs.
Ah yes, constructing the queue URL is perfect. No overhead at all, you can script it and use curl right after restart.
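For anyone who wants to script this, here is a minimal sketch of the idea in Python rather than curl. It only requests each per-queue page once, relying on the behavior described above where visiting `/queues/<name>` makes the queue show up again. The base URL and queue names are placeholders, the function names are made up for illustration, and it assumes the Web UI is reachable without authentication; if yours is password-protected you would need to add credentials.

```python
import urllib.parse
import urllib.request

# Hypothetical values: replace with your own Web UI address and queue names.
BASE_URL = "https://faktory.example.com"
QUEUES = ["default", "nightly-report", "weekly-export"]


def queue_urls(base_url, names):
    """Build the per-queue Web UI URL for each queue name."""
    root = base_url.rstrip("/")
    return [f"{root}/queues/{urllib.parse.quote(name, safe='')}" for name in names]


def touch_queues(base_url, names, timeout=10):
    """GET each queue page once; return {url: HTTP status or error text}."""
    results = {}
    for url in queue_urls(base_url, names):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status
        except OSError as exc:  # covers URLError, HTTPError and timeouts
            results[url] = str(exc)
    return results
```

Running `touch_queues(BASE_URL, QUEUES)` from whatever automation restarts Faktory, once the server is accepting connections again, should be enough. The queue list still has to be maintained somewhere, so this does not remove the need to keep the names in one place.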
Hello!
We run Faktory Pro on ECS via AWS EC2. We have an EFS file system that holds all of the Faktory file system state. We periodically upgrade the underlying EC2 instance, which of course requires us to restart Faktory.
We noticed that when Faktory restarts, the queue list UI at https://faktory.freckle.com/queues gets cleared. It shows no queues, and we only see the queues after they get more jobs enqueued to them. It appears that the queues are not actually cleared under the hood, because our job consumers are still running and accepting jobs (we see them running DB queries, doing work, and reporting progress). However, we can't see the queues in the UI. Also, the statsd metrics for the missing queues are no longer emitted.
Is this known behavior? Should the UI always reflect the latest state of all queues, even after a restart?
We are on the latest Faktory Pro version 1.4.0.