Closed tcurley1 closed 2 years ago
What could cause mnesia:table_info/2 to return undefined? At least locally on OTP 24 it returns 0 even if mnesia is down or the table does not exist. We're seeing the same problem in a clustered environment under some load. Timeouts?
Same problem encountered, any updates?
I'm seeing the same issue when upgrading RabbitMQ from v3.8.3 to v3.8.27 (v3.8.23 apparently bumped the client from 4.6.0 to 4.8.1). This is on a healthy 3-node cluster. Are there any known workarounds except for downgrading?
Have you tried restarting the RabbitMQ services? Looks like it's a fix; I tested it on one box.
Thanks, I tried restarting one of the three nodes, but that didn't solve it – I see the same error after a clean start. I've also tried disabling and then enabling the rabbitmq_prometheus plugin, but that had no effect either. I have another RabbitMQ cluster with just a single node, also running v3.8.27, and there the Prometheus exporter works fine with no errors in the logs.
My RabbitMQ is v3.9.5; 3 out of 4 servers have this problem. I tried restarting one of the services and the problem was gone. I did try methods other than Prometheus: rabbitmq_cloudwatch_exporter requires a service restart and can do the job, and I used rabbitmq_exporter too, but it crashed at a certain point.
Since the problem comes from the get_memory_usage() function, I tried to disable it, but due to my lack of Erlang knowledge I haven't figured out how.
Hello, does anyone have a fix for this? We restarted the brokers one by one on our cluster, and it is still logging a huge amount because of this... it is really polluting the logs. Is there a known procedure to fix it? Thx
Can someone please open a ticket against RabbitMQ instead? https://github.com/rabbitmq/rabbitmq-server
Actually, there is a thread discussing the issue here: https://groups.google.com/g/rabbitmq-users/c/JXF8ryEqvyk/m/4L4xiklXAwAJ and it suggests pointing back to this GitHub issue, as it is related to the Prometheus collector... we are circling around with no solution so far.
Fine, let me see what I can do tomorrow. cc @lhoguin
I have opened https://github.com/erlang/otp/issues/5830 to figure out if that's normal. If it is, the fix is very simple.
If you need more info on how to reproduce, let us know. Currently, my only way to fix it is to completely stop my RabbitMQ cluster and only then start it.
I have opened https://github.com/deadtrickster/prometheus.erl/pull/140 with a fix.
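For readers who want the gist without reading the PR: a fix of this kind has to tolerate undefined when summing per-table memory. The sketch below is illustrative only, not the actual prometheus.erl patch; the function name echoes the get_memory_usage() mentioned earlier in the thread:

```erlang
%% Illustrative sketch of a defensive memory aggregation over all
%% mnesia tables. Any table whose memory is reported as 'undefined'
%% (or that errors out) simply contributes zero to the total.
get_memory_usage() ->
    WordSize = erlang:system_info(wordsize),
    Tables = mnesia:system_info(tables),
    WordSize * lists:sum(
        [case catch mnesia:table_info(T, memory) of
             Words when is_integer(Words) -> Words;
             _ -> 0  %% 'undefined' or {'EXIT', _}: count as zero
         end || T <- Tables]).
```

The key design choice is that a transiently unavailable table degrades the metric slightly instead of crashing the whole collector scrape.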
Released as 4.8.2.
Thanks for all the action here... My next question is: how can I test that new code/release with the RabbitMQ code? Any pointers? Thx
Hello! I have opened https://github.com/rabbitmq/rabbitmq-server/pull/4376 that upgrades Prometheus.erl to 4.8.2.
How to test depends on how you run RabbitMQ. You can always build RabbitMQ from this branch directly in any case.
Building the plugin on Erlang 23 or 24, replacing it in the plugins directory and restarting the node on Erlang 24 would do. No need to build the entire distribution ;)
Gotcha! Thanks.
If you need a Docker image you can find them at https://hub.docker.com/r/pivotalrabbitmq/rabbitmq/tags?page=1&name=loic-update-prometheus but it will run the most recent RabbitMQ code.
Here is a build of this plugin with a fix: prometheus-4.8.2.ez.zip. It's a .zip only because GitHub does not allow for .ez files. Uncompress it and replace the prometheus .ez you will find in the plugins directory of your installation.
Then run any tests you need to run.
If we do not get any feedback on this by next week, we will proceed with merging rabbitmq/rabbitmq-server#4376 and consider this done.
This was addressed in https://github.com/rabbitmq/rabbitmq-server/pull/4376, scheduled to ship in RabbitMQ 3.10.0 and 3.9.15.
@tcurley1 @deadtrickster should we close this issue?
Thanks everyone!
When upgrading RabbitMQ from 3.8.x to 3.9.x, the Prometheus client version was bumped from 4.6.0 to 4.8.1, which introduced a bug in collecting and reporting metrics in the prometheus_mnesia collector. Using RabbitMQ diagnostics checks I was able to verify that mnesia does have a memory usage value associated with it, but the Prometheus client is throwing the following error: