appsignal / appsignal-elixir

🟪 AppSignal for Elixir package
https://www.appsignal.com/elixir
MIT License
282 stars 81 forks source link

No obvious way to instrument actions that cannot be wrapped in functions in 2.0 API #614

Closed lukaszsamson closed 3 years ago

lukaszsamson commented 3 years ago

Disclaimer: I know It's more a support question but I've already tried reaching out and didn't get a helpful answer.

I'm trying to port an existing application that uses custom integration with Absinthe library to the new 2.0 API but I'm unable to get it to work.

Problem description: We have a GraphQL endpoint that we are instrumenting

forward "/graphql", Absinthe.Plug, schema: MyApp.Web.Schema

The router is instrumented at the Phoenix.Endpoint via

use Appsignal.Phoenix
  plug MyApp.Web.Router

we also have custom Absinthe middleware that in 1.x API created an Appsignal.Translation and submitted it when the graphql request is served.

before request callback

transaction =
    Transaction.start(Transaction.generate_id(), :graphql)
    |> Transaction.set_action(action)
    |> Transaction.set_sample_data("resolution", %{
      "current_user_permissions" => resolution.context.current_user_permissions,
      "raw_query" => resolution.context.conn.params,
      "graphql_args" => "#{inspect(args)}"
    })

... and after request callback

transaction = resolution.context.appsignal_transaction

if transaction do
  Transaction.finish(transaction)
  :ok = Transaction.complete(transaction)
end

Transaction API is now deprecated/removed and in this case I cannot use Appsignal.instrument as the code executed between those callbacks cannot be wrapped in a function that spans from before to after callback. I tried to base my solution on what Appsignal.instrument does internally:

befor request callback

span =
  Tracer.create_span("graphql", Tracer.current_span())
  |> Span.set_name(action)
  |> Span.set_attribute("appsignal:category", action)
  |> Span.set_sample_data("resolution", %{
    "current_user_permissions" => resolution.context.current_user_permissions,
    "raw_query" => resolution.context.conn.params,
    "graphql_args" => "#{inspect(args)}"
  })

... and after request callback

span = resolution.context.appsignal_span

if span do
  Appsignal.Tracer.close_span(span)
end

But I can't get this to work. The actions are not reported and a custom "graphql" namespace in not created.

Other parts (background processes and normal phoenix http endpoint instrumentations) are working fine. Can you point me how to make this work? This guide https://docs.appsignal.com/elixir/integrations/absinthe.html is certainly outdated.

Edit: see https://github.com/appsignal/appsignal-elixir/issues/614#issuecomment-754128315 for working solution

hubertlepicki commented 3 years ago

Interesting, I am second dude in Poland having precisely the same issue as you at precisely the same time, it seems.

I am trying to instrument Absinthe, in a middleware as well. I already create one outer span in Plug, and you seem to do as well.

In addition to that, I am also executing some graphql queries from background jobs.

In neither of the cases I can seem to get it to work. The inner span is not visible, added sample data (even using "tags" or "custom_data" - the blessed keys here that you do not use and probably should instead of "appsignal:category" and "resolution") - none of these things are showing up in Appsignal.

I suspect this is because we have nested spans and these are either broken, or there's another way to create them so they work properly but I have no idea how to get it working at the moment.

hubertlepicki commented 3 years ago

Maybe as an explanation of what I am trying to do. I dumbed down my code so it's as simple as possible.

I have a subscription like this:

     subscription do
       field :my_field, :my_type do
         arg(:arg_id, non_null(:id))
         middleware(MyApp.Schema.PopulateContext)
         middleware(MyApp.Schema.StartSpan)                                                                                                             

         config(fn args, _ ->
           {:ok, topic: args.arg_id}
         end)

         resolve(&MyApp.Resolvers.SubscriptionsResolvers.resolve_my_field/3)

         middleware(MyApp.Schema.StopSpan)
       end    

And my middleware modules look like this:

defmodule MyApp.Schema.StartSpan do
  @behaviour Absinthe.Middleware
  @tracer Application.get_env(:appsignal, :appsignal_tracer, Appsignal.Tracer)
  @span Application.get_env(:appsignal, :appsignal_span, Appsignal.Span)k
  def call(resolution, _config) do
    span = @tracer.create_span("absinthe_subscription", @tracer.current_span)

    span =
      span
      |> @span.set_name("AbsintheSubscription/#{resolution.definition.name}")
      |> @span.set_sample_data("custom_data", metadata_from_resolution(resolution))

    %{ resolution | context: Map.put(resolution.context, :appsignal_span, span) }
  end                                                                                                                                                       

  defp metadata_from_resolution(resolution) do
   %{
      arguments: resolution.arguments,
      current_account_id: id_of_or_nil(resolution.context[:current_account]),
      current_role_id: id_of_or_nil(resolution.context[:current_role]),
      current_company_id: id_of_or_nil(resolution.context[:current_company])
    }
  end

  defp id_of_or_nil(nil), do: nil
  defp id_of_or_nil(thing), do: thing.id
end           

defmodule MyApp.Schema.StopSpan do
  @behaviour Absinthe.Middleware
  @tracer Application.get_env(:appsignal, :appsignal_tracer, Appsignal.Tracer)

 def call(resolution, _config) do
    span = resolution.context.appsignal_span
    @tracer.close_span(span)

    resolution
  end
end 

What I'd expect is that when these subscriptions execute, I will see namespace "absinthe_subscription", with sample with custom data I set and name I set - in "Performance" tab - since AppSignal doesn't throw exceptions.

Instead, I am seeing only the outer span, that I start independently somewhere higher in the app before subscription is triggered, in a Plug or otherwise, which has none of the metadata or fields I set manually here.

hubertlepicki commented 3 years ago

Update:

what does work for me @lukaszsamson is not creating nested spans. Instead, I just update custom_data / tags / name of outer span this way:

defmodule MyApp.Schema.AddMetadata do
  @behaviour Absinthe.Middleware
  require Logger

  @tracer Application.get_env(:appsignal, :appsignal_tracer, Appsignal.Tracer)
  @span Application.get_env(:appsignal, :appsignal_span, Appsignal.Span)

  def call(resolution, _config) do
    :ok = add_appsignal_metadata(resolution)

    resolution
  end

  defp add_appsignal_metadata(resolution) do
    span = @tracer.current_span || @tracer.create_span("background")

    span
      |> @span.set_name("AbsintheSubscription/#{resolution.definition.name}")
      |> @span.set_sample_data("custom_data", metadata_from_resolution(resolution))

    :ok
  end

  defp metadata_from_resolution(resolution) do
    %{
      arguments: resolution.arguments,
      current_account_id: id_of_or_nil(resolution.context[:current_account]),
      current_role_id: id_of_or_nil(resolution.context[:current_role]),
      current_company_id: id_of_or_nil(resolution.context[:current_company])
    }
  end

  defp id_of_or_nil(nil), do: nil
  defp id_of_or_nil(thing), do: thing.id
end
lukaszsamson commented 3 years ago

@hubertlepicki I tried your suggestion of removing the subscope and in my case it did no work. The difference is that I'm not using Absinthe subscriptions but only queries/mutations and I already have a web request Appsignal scope from Phoenix router instrumentation. In my case no data is reported from the graphql route at all, not only the subscopes and the applications seems to be slowly leaking memory (I'm not sure it's related to https://github.com/appsignal/appsignal-elixir/issues/606 though as I'm on 2.0.6).

lukaszsamson commented 3 years ago

OK I got this to work by changing

span =
  Tracer.create_span("graphql", Tracer.current_span())
  |> Span.set_name(action)

to

span =
  Tracer.create_span("graphql")
  |> Span.set_name(action)

in my original code.

I'm going to investigate the leaking memory in the nest days.

hubertlepicki commented 3 years ago

I think it would still be great if this was documented how to do it and esp. how (if at all) we should nest spans. My feeling is we are using private API that may change at any time.

pon., 4 sty 2021, 19:08 użytkownik Łukasz Samson notifications@github.com napisał:

OK I got this to work by changing

span = Tracer.create_span("graphql", Tracer.current_span()) |> Span.set_name(action)

to

span = Tracer.create_span("graphql") |> Span.set_name(action)

in my original code.

I'm going to investigate the leaking memory in the nest days.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/appsignal/appsignal-elixir/issues/614#issuecomment-754128315, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAID6VJQWFXSRZXXSO3YDDSYH7YJANCNFSM4VTCP3IQ .

jeffkreeftmeijer commented 3 years ago

Hey both,

Thanks for opening this issue and showing us what you’ve tried. That’s very helpful! The way spans are implemented right now is that any child span can’t have any sample data added to it. This means that the sample data for a request should always be added to the root span (which you can request through Appsignal.Tracer.root_span/0).

In terms of Absinthe, I think the problem here is that the Plug instrumentation is already creating that root span for you. Your custom instrumentation then adds a child span, which will drop its custom instrumentation. In the current version, you should be able to correct the name for your Absinthe calls by using a Plug like this. Then, you can add any extra instrumentation to that span by requesting the root span and adding sample data to that.

Would some thing like that work for your use case?

hubertlepicki commented 3 years ago

@jeffkreeftmeijer when you were inserting a link, did you mean to link to this issue or you had different link that didn't get copied to your clipboard properly in mind?

jeffkreeftmeijer commented 3 years ago

@hubertlepicki Ha, cliffhanger! You’re right, here's the link; https://github.com/appsignal/test-setups-experimental/blob/absinthe/elixir/phoenix-example/app/lib/appsignal_absinthe_plug.ex, I’ve updated my previous comment. Thanks!

hubertlepicki commented 3 years ago

@jeffkreeftmeijer that is precisely what I am doing now too, but I am thinking on moving that away from the plug to the telemetry handler. The reason is that I not only execute graphql queries from web requests, but also perform them from background job against local schema. I already start span there, so should work similar way.

I have three more questions to you, however, if you can help:

  1. What happens if I start two root spans? As in, one starts in Plug, and another one somewhere in my middleware. Is this asking for trouble and there should be only one root span at the same time?

  2. What happens if I don't close the span at all and the process dies/finishes. Is the span closed by Appsignal for me (I suspect by Tracker setting up monitor?)

jeffkreeftmeijer commented 3 years ago

We’re using the Plug to instrument Plug apps because it allows us to wrap requests earlier in the pipeline. This allowed us to give more insights than we could only running on Telemetry.

The spans for absinthe calls could indeed work in the same way, but you’d have to deal with the already-started span you get from Appsignal.Plug. There’s currently no way around that.

  1. Our implementation relies on having a “current” span. Spans on a process are a stack, so the current one is the one popped from the list. We don’t support having multiple root spans, and we recommend using Tracer.create_span/2-3 and passing the current span. I’ll see if we can make that clearer in the documentation and implementation.
  2. If you don’t close a span, we’ll make sure to clean it out of the ets table (through Appsignal.Monitor) and agent, but we currently don’t finish it for you. Right now, you’re responsible for always closing custom created spans. However, this might be a nice addition we might implement in a future version. Thanks!
lukaszsamson commented 3 years ago

@jeffkreeftmeijer I know that in my case there is a root span started by plug instrumenter on the router. In my initial code i tried to create a subspan of that span via

span =
  Tracer.create_span("graphql", Tracer.current_span())
  |> Span.set_name(action)
...
# close the subspan
Appsignal.Tracer.close_span(span)

Yet it did not work - not even the root span events were reported

Then I tried what @hubertlepicki suggested

span =
  Tracer.current_span()
  |> Span.set_name(action)
...
# no span close in this case as no subspan was created

With same results - nothing reported

What worked for me is

span =
  Tracer.create_span("graphql")
  |> Span.set_name(action)
...
Appsignal.Tracer.close_span(span)

Is the span I am creating a child of the root span or is it a separate root of its own?

Should I had to use Appsignal.Tracer.root_span in my code instead of Tracer.current_span? If I understand correctly in this case both functions would return the same span.

BTW Appsignal.Plug.put_name does not cover all of my requirements as I need to be able to attach query/mutation resolution params besides action.

jeffkreeftmeijer commented 3 years ago

Hey @lukaszsamson,

Your first example should work, but it will be reported as a child span of the root. This means that it shows up within the reported sample as what we previously called an event. As for the passed namespace, that’s ignored for child spans because this function call creates a root with a namespace if the passed parent is nil, meaning this event might be added to a span in the web namespace, for example.

To check this, could you turn on both debug and transaction_debug_mode and tail -f /tmp/appsignal.log while calling this function locally? That should give us some more insight in what span is created as a child of what. Also, that should allow us to check the order of the span closing, as we don’t support closing spans out of order, which might have something to do with this.

That API is a bit strange, which is why we recommend using Appsignal.instrument/2-3 instead, which will find the current span if there is one, create a child, run your function and then close the span. In your case, that doesn’t seem like it’d work, as the span doesn’t open and close in the same function (which is the main problem we have while instrumenting telemetry actions like this).

jeffkreeftmeijer commented 3 years ago

I’m closing this for now. Please reopen if the above didn’t work for you! :)

benonymus commented 1 year ago

Hey @jeffkreeftmeijer ,

Is this still the way to do this?

I am trying to add some tags (tried params, cutom_data, etc) and nothing gets added to the reported trace. I followed this: https://docs.appsignal.com/guides/custom-data/tagging-request.html#elixir I am doing the action in a plug in the endpoint before we hit the router. Exactly like this: https://github.com/appsignal/test-setups/blob/9bb347ab1dcd2c46edbf666d965797799594d8d8/elixir/phoenix-example/app/lib/appsignal_phoenix_example_web/endpoint.ex#L58 Just with the code from the first link.

Thanks

jeffkreeftmeijer commented 1 year ago

@benonymus indeed, this is still the way. I’d like to see how you’ve set up your Plug so I can check what could be going wrong. Could you open a new issue or contact support@appsignal.com? I’ll pick up at the other end. :)

benonymus commented 1 year ago

Thanks, I am in contact with support already. Regular requests work, liveview does not.