akoutmos / prom_ex

An Elixir Prometheus metrics collection library built on top of Telemetry with accompanying Grafana dashboards
MIT License
594 stars 104 forks source link

[BUG] ecto plugin prefix doesn't match default dashboard in umbrella application #31

Open doughsay opened 3 years ago

doughsay commented 3 years ago

Describe the bug I'm in an umbrella application with several sub-applications each with their own ecto repo, so I'm trying to start the prom_ex ecto plugin for one of them, like this:

{Plugins.Ecto, otp_app: :core}

This generates prometheus metrics like this:

core_prom_ex_ecto_repo_query_execution_time_milliseconds_sum{command="select",repo="Core.Repo",source="core_tickets"} 17.019069

But the provided default ecto grafana dashboard looks like it's using metric names with the prefect_ prefix, e.g.:

sum(irate(prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_sum{job=\"$job\", instance=\"$instance\", repo=\"$repo\"}[$interval])) by(command) / sum(irate(prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_count{job=\"$job\", instance=\"$instance\", repo=\"$repo\"}[$interval])) by(command)
prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_sum
!=
core_prom_ex_ecto_repo_query_execution_time_milliseconds_sum

prefect is the name of my monitoring application, core is the name of another app in the same umbrella.

I want to eventually be able to start multiple of these plugins:

{Plugins.Ecto, otp_app: :core},
{Plugins.Ecto, otp_app: :ops},
{Plugins.Ecto, otp_app: :commerce}

It looks like the grafana dashboard can handle multiple repos, it just doesn't expect them to all be handled by different otp_apps. The dashboard being uploaded is making the implicit assumption that the app running prom_ex is the same app that runs the ecto repo. What can we do about this? Would love to get this working! 😄

To Reproduce Steps to reproduce the behavior:

  1. Create an umbrella application with two apps, e.g.: my_monitor + my_app.
  2. Install ecto into my_app and prom_ex into my_monitor
  3. Attempt to use the prom_ex ecto plugin to monitor the ecto repo in my_app.

Expected behavior I'm not sure what to expect yet, I think I need to think about it more...

Environment

Additional context A lot of what I've tried so far is working great. The BEAM plugin + dashboard is working, the phoenix plugin + dashboard is working (even though my phoenix app is in yet another otp_app in my umbrella). It's just the ecto plugin making this bad assumption. I have not yet tried oban, but I'm worried about that one too, we have two separate oban installations inside our umbrella as well.

akoutmos commented 3 years ago

Hey Chris!

There are actually some undocumented features in PromEx that allow you to do this. I need to tidy up that API prior to exposing that functionality though. I should be able to wrap this up by middle-end of next week :).

akoutmos commented 3 years ago

If it is not too much trouble @doughsay, would you mind taking branch supporting_multiple_plugin_definitions for a test drive (PR https://github.com/akoutmos/prom_ex/pull/32)? I don't have a sample umbrella application with multiples repos that i can test against on hand and could use the additional validation.

Your configuration should look something like so:

@impl true
def plugins do
  [
    {Plugins.Ecto, otp_app: :core},
    {Plugins.Ecto, otp_app: :ops},
    {Plugins.Ecto, otp_app: :commerce}
  ]
end

@impl true
def dashboards do
  [
    {:prom_ex, "ecto.json", otp_app: :core, title: "Core - PromEx Ecto Dashboard"},
    {:prom_ex, "ecto.json", otp_app: :ops, title: "Ops - PromEx Ecto Dashboard"},
    {:prom_ex, "ecto.json", otp_app: :commerce, title: "Commerce - PromEx Ecto Dashboard"}
  ]
end
doughsay commented 3 years ago

This looks great! I will try and get to it today and let you know, thanks!

akoutmos commented 3 years ago

Thanks @doughsay! Appreciate the speedy testing :D

doughsay commented 3 years ago

Hey! I managed to try this out tonight, and here are my findings:

The grafana dashboard isn't / wasn't working because it isn't using the right queries to find the variables:

DeepinScreenshot_select-area_20210301183346

After that it "worked":

DeepinScreenshot_select-area_20210301181434

...but as you can see, all the repos are being reported for each plugin instance. Not sure why this is, because each of my applications is configured to only use one repo each:

iex(noreaga-local@127.0.0.1)2> Application.get_env(:core, :ecto_repos)
[Core.Repo]
iex(noreaga-local@127.0.0.1)3> Application.get_env(:ops, :ecto_repos)
[Ops.Repo]

I can go either way on having multiple dashboards vs a single dashboard with a repo drop-down, it doesn't matter to me which way it works. But my guess is that the intent is that for a given prom_ex ecto plugin, it should correspond to one grafana dashboard, and IF that one plugin collects events for multiple repos (i.e. because the otp_app is actually configured with multiple repos) then the drop-down comes into play. In my case, this isn't the case, I have a one-to-one mapping of apps to repos.

Anyway... if you think it would help, I was thinking of putting together an example umbrella app with a similar setup to how we have it so we can work on getting all of prom_ex working using that sample instead of going back and forth on my actual application from work... Would that help?

akoutmos commented 3 years ago

Sorry that that didn't work out as I expected :/. If you wouldn't mind putting together a sample umbrella app for me to test against, that would probably be the best way to root cause and fix this one.

akoutmos commented 3 years ago

Hey @doughsay! Any luck on getting a sample app put together for this?

doughsay commented 3 years ago

It just so happens I'm working on it today! It's not quite done yet, but I'll let you know as soon as it is 😁

akoutmos commented 3 years ago

Awesome! Appreciate you putting that together :).

doughsay commented 3 years ago

Hey @akoutmos,

I've run out of time for today, but I have something that at least seems to expose a few problems so far: https://github.com/doughsay/prom_ex_umbrella_example

It's a bit rough around the edges, but I hope you can get it running. I hacked in a quick ecto query by visiting the index page of the phoenix app "WebOne", just to get ecto doing something basic.

Things of note:

I hope to clarify things a bit more when I have more time, but I hope this at least gets us started.

akoutmos commented 3 years ago

Awesome work! I'll take a look at the repro project and get back to you with some ideas. Thanks :)

doughsay commented 3 years ago

Just FYI, I realized my instructions for getting things running are missing a step: you have to run the ecto migrations and db creation using mix, but the config files are set up to only work while running inside docker. The quickest workaround is:

...or I guess maybe you can docker exec ... into the running container and run the mix tasks in there? I dunno, I hope you can figure it out! 😅

akoutmos commented 3 years ago

I should be able to get to this early next week. Started digging in and I think I have a plan as to how to move forward.

akoutmos commented 3 years ago

Just bumping this one so you don't think that I forgot about it :P. I've played around with a couple different ideas as to how this can cleanly be supported. Shooting to have something reviewable by next week!

doughsay commented 3 years ago

@akoutmos thanks for the update! I myself have been super busy lately and was not able to get back to rounding out that example app more past just the bare bones I sent you... Is what's there good enough to fully demonstrate the problem? Or could you still benefit from me adding more examples to the example app?

Let me know and I'll see if I can many get some more work done on it soon!

akoutmos commented 3 years ago

Appreciate it! The example you provided is more than sufficient. I'm working on also integrating that into the example applications directory as that will be a good E2E test app.