rocktavious closed this issue 5 months ago.
I’ve never seen this, and there’s nothing here that gives me any more information. Can you start Faktory manually, outside of k8s, on the same machine?
Because this is running on a Kubernetes cluster and we've locked down direct access to the nodes, this is not something I can easily test. Additionally, the pod has been deleted and recreated a number of times, which causes it to be scheduled onto 1 of the 7 nodes in this cluster, and it hasn't worked on any of them.
The only difference I see in the node pools between the cluster it works in and this one is the kernel and Amazon Linux version.
Cluster it works in:
OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
Amazon Linux 2   5.10.173-154.642.amzn2.x86_64   containerd://1.6.19

Cluster it doesn't work in:
OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
Amazon Linux 2   5.10.215-203.850.amzn2.x86_64   containerd://1.7.11
I'll keep double-checking the configuration between the two, but is it possible that the same license file cannot be used for both -e staging and -e production?
I think we ordered 2 licenses, but the e-mail I was sent by my procurement team only had 1 license in it. I also noticed the log line says 200 connections, when I expected each to be only 100.
You can share a single license between two servers, giving each 100 connections. I believe the Licensing wiki page covers that.
And yes, the same license can be used in staging and production.
@mperham What do you mean by giving each 100 connections? Are you saying that if I use the same license key between two instances, even if the environment is set to staging, I also have to set FAKTORY_MAX_CONNS=100 for both the production and staging servers?
https://github.com/contribsys/faktory/wiki/Licensing#staging
"Faktory has the notion of environment -- you start Faktory in development, staging or production and only pay for production servers."
You have a total of 200 connections in production. You can have two production servers sharing the same license, each with FAKTORY_CONN_MAX=100. You can have unlimited staging servers, but each staging server is automatically limited to 100 connections when you use -e staging.
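To make the connection arithmetic above concrete, here is a minimal Go sketch, assuming the rules exactly as stated in this reply: production servers sharing one license must have FAKTORY_CONN_MAX values that sum to at most the licensed total (200 here), and each staging server is capped at 100. The `server` type and `checkPlan` function are hypothetical planning helpers; they do not reflect how Faktory itself enforces licensing.

```go
package main

import "fmt"

// stagingCap is the per-server limit described in the thread for -e staging.
const stagingCap = 100

// server is a hypothetical description of one planned Faktory instance.
type server struct {
	name    string
	env     string // "staging" or "production"
	connMax int    // intended FAKTORY_CONN_MAX
}

// checkPlan returns an error if the production servers together need more
// connections than the license provides, or if a staging server exceeds its cap.
func checkPlan(licensed int, servers []server) error {
	used := 0
	for _, s := range servers {
		switch s.env {
		case "production":
			used += s.connMax
		case "staging":
			if s.connMax > stagingCap {
				return fmt.Errorf("%s: staging is limited to %d connections, got %d", s.name, stagingCap, s.connMax)
			}
		default:
			return fmt.Errorf("%s: unknown environment %q", s.name, s.env)
		}
	}
	if used > licensed {
		return fmt.Errorf("production servers need %d connections but the license allows %d", used, licensed)
	}
	return nil
}

func main() {
	plan := []server{
		{"prod-a", "production", 100},
		{"prod-b", "production", 100},
		{"staging-1", "staging", 100},
	}
	// 100 + 100 production connections fit within a 200-connection license.
	fmt.Println(checkPlan(200, plan))
}
```

Staging servers do not count against the production total here, matching the wiki's note that you only pay for production servers.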
@mperham So something is broken with Faktory's startup. Here is the command I'm using:
faktory -b :7419 -w :7420 -l debug -e staging
and I get these logs:
opslevel-faktory-0 server Faktory Enterprise 1.9.0 linux/amd64
opslevel-faktory-0 server © 2024 Contributed Systems LLC.
opslevel-faktory-0 server D 2024-06-17T14:21:43.540Z Options: {:7419 :7420 staging /etc/faktory debug /var/lib/faktory/db}
opslevel-faktory-0 server W 2024-06-17T14:21:43.540Z Invalid licensing, please see the Faktory wiki for proper configuration
opslevel-faktory-0 server E 2024-06-17T14:21:43.540Z : No valid licensing found in FAKTORY_LICENSE or /etc/faktory/license:
and then the pod shuts down.
I've also tried giving it the FAKTORY_LICENSE that was sent to us, and that's when I get the logs from the initial report.
Sounds like you have a good clue. I have to assume there's some annoyance in your licensing configuration preventing startup, like unwanted whitespace or something.
@mperham I FOUND IT. It totally is the statsd settings.
I switched it back to using FAKTORY_LICENSE, and with -e staging it starts up as long as I remove the statsd.toml, which points to a Datadog agent that doesn't exist. So it seems like the statsd initialization crashes the Faktory server if it cannot reach the statsd server?
That still doesn't make sense to me; Datadog should be more resilient than that. The statsd protocol uses UDP, which is connectionless. The existence of the remote side should be irrelevant.
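The point about UDP can be checked directly. In Go, "dialing" UDP to an IP address only records the destination locally, so both the dial and the first send succeed whether or not anything is listening. The address 127.0.0.1:8125 is an assumption chosen for illustration (8125 is the port used in the config in this thread), `sendCounter` is a hypothetical helper, and the payload uses the plain statsd counter format `name:1|c`.

```go
package main

import (
	"fmt"
	"net"
)

// sendCounter sends one statsd counter increment over UDP.
// With an IP literal, neither Dial nor the first Write needs a live listener.
func sendCounter(addr, name string) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = fmt.Fprintf(conn, "%s:1|c", name)
	return err
}

func main() {
	// Assumes nothing is listening on this port; the send is still accepted.
	fmt.Println(sendCounter("127.0.0.1:8125", "faktory.demo"))
}
```

This only holds for an IP literal: when the location is a hostname, it still has to be resolved through DNS before any packet can be addressed.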
@mperham Well, I've confirmed it: adding back the statsd.toml causes it to shut down.
Here is my config:
[statsd]
location = "datadog-agent.datadog.svc.cluster.local:8125"
namespace = "faktory"
tags = ["env:dev"]
queueLatency = ["app", "default", "runner"]
So it seems there are a few things:
- With -e staging you still need to provide a valid FAKTORY_LICENSE (which I don't think is expected).
- With a statsd location that isn't present or doesn't resolve, the Faktory application exits with exit code 0 without any log saying what the problem is.
I've replicated this on both 1.8.0 and 1.9.0 using the enterprise image docker.contribsys.com/contribsys/faktory-ent.
Yep, I can reproduce this. Fix coming...
I've found and fixed the issue with Faktory stopping with no error message.
I 2024-06-17T16:15:07.010Z Sending statsd metrics to mike:8150 with namespace faktory
I 2024-06-17T16:15:07.010Z Web server now listening at localhost:7420
E 2024-06-17T16:15:07.012Z Unable to start Faktory: cannot start server subsystem statsd at mike:8150: lookup mike: no such host
D 2024-06-17T16:15:07.012Z Stopping scheduled tasks
D 2024-06-17T16:15:07.113Z Stopping storage
D 2024-06-17T16:15:07.113Z Shutting down Redis PID 5739
I suspect your underlying issue is that the Datadog statsd host is not DNS-resolvable. Datadog's client resolves the IP address when it creates the statsd connection.
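Because the hostname is resolved when the client is created, a DNS failure surfaces at startup even though UDP itself never connects. A minimal Go preflight sketch, assuming you want to validate the statsd `location` from statsd.toml before launching Faktory (the `checkStatsdLocation` helper is hypothetical): `net.ResolveUDPAddr` performs the same kind of lookup and fails fast for a name that does not resolve. The `.invalid` top-level domain used in the check below is reserved and never resolves.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// checkStatsdLocation resolves a host:port statsd location and returns
// an error if the host is not DNS-resolvable.
func checkStatsdLocation(location string) error {
	if _, err := net.ResolveUDPAddr("udp", location); err != nil {
		return fmt.Errorf("statsd location %q is not usable: %w", location, err)
	}
	return nil
}

func main() {
	location := "datadog-agent.datadog.svc.cluster.local:8125" // from statsd.toml
	if len(os.Args) > 1 {
		location = os.Args[1]
	}
	if err := checkStatsdLocation(location); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println("statsd location resolves:", location)
}
```

Run this from inside the pod (or another pod in the same namespace): cluster names under svc.cluster.local only resolve through in-cluster DNS.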
Thanks @mperham, I assume this will be in 1.10?
Also, what about -e staging still requiring the FAKTORY_LICENSE env var to be set? Is that expected?
It will be in 1.9.1 or 1.10.
A license is always required outside of the development environment.
Are you using an old version? Yes, by 1 release.
Have you checked the changelogs to see if your issue has been fixed in a later version? Yes.
When starting up the Faktory server, we get constant crash-looping in Kubernetes, and the logs don't indicate why, even at debug level.
This is for the Faktory Enterprise instance we are trying to stand up.
Here are the logs it prints out.
Additionally, Kubernetes reports the process as exiting cleanly with exit code 0.