apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0
14.46k stars 2.52k forks source link

help request: OpenTelemetry plugin has problems handling Nginx variables of non-string type #9921

Open EverHopeful opened 1 year ago

EverHopeful commented 1 year ago

Description

Hello,

I'm encountering an issue when using the OpenTelemetry plugin with Apache APISIX. I want to add some Nginx variables (such as $request_time and $upstream_response_time) as additional attributes in my trace data.

Here is my configuration:

{
  "plugins": {
      "opentelemetry": {
          "sampler": {
              "name": "always_on"
          },
          "additional_attributes": [
              "upstream_response_time",
              "request_time",
              "host"
          ]
      }
  }
}

However, I get the following error when APISIX tries to process a request:

2023/07/27 20:31:33 [error] 50#50: *127339 lua entry thread aborted: runtime error: ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:67: bad argument #2 to 'encode' (string expected for field 'string_value', got number)

It seems that OpenTelemetry's OTLP exporter expects all attribute values to be strings, but the values of Nginx variables like $request_time and $upstream_response_time are not strings.

Is there a recommended way to convert these non-string values to strings before adding them as additional attributes? Or could the OpenTelemetry plugin be modified to automatically handle non-string attribute values?

Thank you for your help.

Environment

Sn0rt commented 1 year ago

Can you try without the "request_time" variable? I suspect that the error message is because the value is not a string type.

EverHopeful commented 1 year ago

Can you try without the "request_time" variable? I suspect that the error message is because the value is not a string type.

Thank you for your suggestion. I've conducted the test without the "$request_time" and "$upstream_response_time" variables, and confirmed that everything works as expected when using only the "host".

Here's my configuration.

{
  "plugins": {
      "opentelemetry": {
          "sampler": {
              "name": "always_on"
          },
          "additional_attributes": [
              "host"
          ]
      }
  }
}

Here's a screenshot of the results after my test. image

However, I would like to incorporate "request_time" and "response_time" in my setup. Could you possibly provide any guidance on how I should proceed to use them correctly considering the string type requirement? Any suggestions or workarounds would be greatly appreciated.

Sn0rt commented 1 year ago

However, I would like to incorporate "request_time" and "response_time" in my setup. Could you possibly provide any guidance on how I should proceed to use them correctly considering the string type requirement? Any suggestions or workarounds would be greatly appreciated.

There is no method yet, and it is still being processed. Adaptation development may be required.

and I will check the upstream lib . pls wait .

Sn0rt commented 1 year ago

I have try this https://opentelemetry.io/docs/collector/getting-started/ but not success.

@EverHopeful are you try this feature at newest version ? or provider the reproduce step ?

I have try the follow test case at newest version and the result is pass.

and I try this test case in yours version (2.15) can't pass.

so can you help me try the feature at newest release (3.4.1) currently for walk around ?

the test case copy from a exsting case , so it looks like there is some redundancy

=== TEST 19: set additional_attributes
--- config
    location /t {
        content_by_lua_block {
            local t = require("lib.test_admin").test
            local code, body = t('/apisix/admin/services/1',
                ngx.HTTP_PUT,
                [[{
                    "name": "service_name",
                    "upstream": {
                        "nodes": {
                            "127.0.0.1:1980": 1
                        },
                        "type": "roundrobin"
                    }
                }]]
            )
            if code >= 300 then
                ngx.status = code
                ngx.say(body)
                return
            end
            local code, body = t('/apisix/admin/routes/1',
                ngx.HTTP_PUT,
                [[{
                    "name": "route_name",
                    "plugins": {
                        "opentelemetry": {
                            "sampler": {
                                "name": "always_on"
                            },
                            "additional_attributes": [
                                "http_user_agent",
                                "arg_foo",
                                "cookie_token",
                                "remote_addr",
                                "upstream_response_time",
                                "request_time"

                            ]
                        }
                    },
                    "uri": "/opentracing",
                    "service_id": "1"
                }]]
                )

            if code >= 300 then
                ngx.status = code
            end
            ngx.say(body)
        }
    }

=== TEST 20: trigger opentelemetry, test trace_id_source=x-request-id, custom resource, additional_attributes
--- extra_yaml_config
plugins:
    - opentelemetry
plugin_attr:
    opentelemetry:
        trace_id_source: x-request-id
        resource:
            service.name: test
            test_key: test_val
        batch_span_processor:
            max_export_batch_size: 1
            inactive_timeout: 0.5
--- extra_init_by_lua
    local core = require("apisix.core")
    local otlp = require("opentelemetry.trace.exporter.otlp")
    local span_kind = require("opentelemetry.trace.span_kind")
    otlp.export_spans = function(self, spans)
        if (#spans ~= 1) then
            ngx.log(ngx.ERR, "unexpected spans length: ", #spans)
            return
        end

        local span = spans[1]
        if span:context().trace_id ~= "01010101010101010101010101010101" then
            ngx.log(ngx.ERR, "unexpected trace id: ", span:context().trace_id)
            return
        end

        local current_span_kind = span:plain().kind
        if current_span_kind ~= span_kind.server then
            ngx.log(ngx.ERR, "expected span.kind to be server but got ", current_span_kind)
            return
        end

        if span.name ~= "/opentracing?foo=bar&a=b" then
            ngx.log(ngx.ERR, "expect span name: /opentracing?foo=bar&a=b, but got ", span.name)
            return
        end

        local expected_resource_attrs = {
            test_key = "test_val",
        }
        expected_resource_attrs["service.name"] = "test"
        expected_resource_attrs["telemetry.sdk.language"] = "lua"
        expected_resource_attrs["telemetry.sdk.name"] = "opentelemetry-lua"
        expected_resource_attrs["telemetry.sdk.version"] = "0.1.1"
        expected_resource_attrs["hostname"] = core.utils.gethostname()
        local actual_resource_attrs = span.tracer.provider.resource:attributes()
        if #actual_resource_attrs ~= 6 then
            ngx.log(ngx.ERR, "expect len(actual_resource) = 6, but got ", #actual_resource_attrs)
            return
        end
        for _, attr in ipairs(actual_resource_attrs) do
            local expected_val = expected_resource_attrs[attr.key]
            if not expected_val then
                ngx.log(ngx.ERR, "unexpected resource attr key: ", attr.key)
                return
            end
            if attr.value.string_value ~= expected_val then
                ngx.log(ngx.ERR, "unexpected resource attr val: ", attr.value.string_value)
                return
            end
        end

        local expected_attributes = {
            service = "service_name",
            route = "route_name",
            http_user_agent = "test_nginx",
            arg_foo = "bar",
            cookie_token = "auth_token",
            remote_addr = "127.0.0.1",
        }
        if #span.attributes ~= 6 then
            ngx.log(ngx.ERR, "expect len(span.attributes) = 6, but got ", #span.attributes)
            return
        end
        for _, attr in ipairs(span.attributes) do
            local expected_val = expected_attributes[attr.key]
            if not expected_val then
                ngx.log(ngx.ERR, "unexpected attr key: ", attr.key)
                return
            end
            if attr.value.string_value ~= expected_val then
                ngx.log(ngx.ERR, "unexpected attr val: ", attr.value.string_value)
                return
            end
        end

        ngx.log(ngx.INFO, "opentelemetry export span")
    end
--- request
GET /opentracing?foo=bar&a=b
--- more_headers
X-Request-Id: 01010101010101010101010101010101
User-Agent: test_nginx
Cookie: token=auth_token;
--- response_body
opentracing
--- wait: 1
--- grep_error_log eval
qr/opentelemetry export span/
--- grep_error_log_out
opentelemetry export span
moonming commented 1 year ago

I have try the follow test case at newest version and the result is pass.

and I try this test case in yours version (2.15) can't pass.

Has this plugin fixed something between these two versions?

Sn0rt commented 1 year ago

the test case copy from a exsting case , so it looks like there is some redundancy

the opentelemetry-lua from 0.1-3 to 0.2-3.

codifierr commented 1 year ago

I am facing the same problem with APISIX 3.3.

codifierr commented 1 year ago

Same problem with 3.4.1.

Opentelemetry plugin config "opentelemetry": { "sampler": { "name": "always_on" }, "additional_attributes": [ "service_name", "graphql_name", "graphql_operation", "request_method", "bytes_received", "bytes_sent", "content_length", "request_time", "upstream_response_time", "upstream_status" ] }

Logs

2023-08-07 12:44:06 2023/08/07 07:14:06 [error] 149#149: *9438 lua entry thread aborted: runtime error: ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: bad argument #2 to 'encode' (string expected for field 'string_value', got number) 2023-08-07 12:44:06 stack traceback: 2023-08-07 12:44:06 coroutine 0: 2023-08-07 12:44:06 [C]: in function 'encode' 2023-08-07 12:44:06 ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: in function 'export_spans' 2023-08-07 12:44:06 ...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:45: in function 'process_batches' 2023-08-07 12:44:06 ...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:77: in function <...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:57>, context: ngx.timer, client: 172.18.0.1, server: 0.0.0.0:9080

Sn0rt commented 1 year ago

2023-08-07 12:44:06 2023/08/07 07:14:06 [error] 149#149: *9438 lua entry thread aborted: runtime error: ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: bad argument #2 to 'encode' (string expected for field 'string_value', got number)

thank you. I will try reproduce

shreemaan-abhishek commented 1 year ago

@Sn0rt any updates?

ahululu commented 2 weeks ago

any updates? I had the same problem with version 3.1: lua entry thread aborted: runtime error: ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: bad argument #2 to 'encode' (string expected for field 'string_value', got number)