Kong / kong-plugin-zipkin

A Kong plugin for propogating zipkin spans and reporting spans to a zipkin server - this plugin has been moved into https://github.com/Kong/kong, please open issues and PRs in that repo
Apache License 2.0
60 stars 31 forks source link

Exception in Annotate Assert #104

Closed nilzzzzzz closed 3 years ago

nilzzzzzz commented 3 years ago

Hi,

i am having a weird issue in my kong production instance. A small percentage of requests fail with status code 400 (upstream server returned 200) and in those cases i see in the kong log an exception coming from here:

 failed to run log_by_lua*: /usr/local/share/lua/5.1/kong/plugins/zipkin/span.lua:124: invalid annotation timestamp
`stack traceback:
    [C]: in function 'assert'
    /usr/local/share/lua/5.1/kong/plugins/zipkin/span.lua:124: in function 'annotate'
    /usr/local/share/lua/5.1/kong/plugins/zipkin/handler.lua:254: in function </usr/local/share/lua/5.1/kong/plugins/zipkin/handler.lua:228>
    /usr/local/share/lua/5.1/kong/init.lua:238: in function 'execute_plugins_iterator'
    /usr/local/share/lua/5.1/kong/init.lua:1153: in function 'log'
    log_by_lua(nginx-kong.conf:105):2: in main chunk while logging request, client: XXX, server: kong, request:

Any idea, why the annotation timestamp could be wrong or in which direction i should look?

Thanks in advance

Nils

kikito commented 3 years ago

Thanks for reporting this. Could you please give us more information that could help replicating the issue in our end, like what version of Kong are you using and whether you are using any other plugins on these failing requests?

nilzzzzzz commented 3 years ago

We are using Kong Version 2.1.4.

On the route which has most of the occurrences of the exception we have no other plugin than the global zipkin plugin.

I would like to share more information, but really no clue what correlates to this issue. I checked out the handler.lua and span.lua, but could not find a reason why a wrong timestamp could be passed to the annotate function..

nilzzzzzz commented 3 years ago

I researched a bit more. This issue only occurs for users which are using a tool called addguard, which seems to use a proxy server for detecting malware / advertisements. No idea how that can influence the tracing in kong though..

Asafb26 commented 3 years ago

we had the same issue, it looks like we had a typo or leftover in the span timestamp field which used only if the access phase is skipped

kikito commented 3 years ago

Hello @nilzzzzzz I think @Asafb26 's fix might be the source of your issue. I have merged it in master, it will come out on the next version of the plugin/Kong.

If you need to fix the issue right faster, I suggest you use the master branch of this plugin, which is stable.

nilzzzzzz commented 3 years ago

@kikito thanks for the update and @Asafb26 thanks for the fix!