apache / apisix

The Cloud-Native API Gateway

bug: The kafka-logger plugin cannot collect the value of each nginx built-in variable correctly #5512

Closed: sandy420 closed this issue 2 years ago

sandy420 commented 3 years ago

Issue description

I found that the values of $request_time, $upstream_response_time, and $upstream_header_time in the logs collected by kafka-logger are wrong, while the values of the same nginx variables written to access.log under APISIX's logs directory are correct.
In theory, for a given request $request_time should be greater than $upstream_response_time, and $upstream_response_time should be greater than $upstream_header_time, but the results queried in ELK (where the kafka-logger plugin ships the logs) do not match this.

(screenshots of the ELK query results attached)


Environment

Steps to reproduce

The kafka-logger plugin log_format configuration:

{ "log_format": { "server_name": "$server_name", "upstream_cache_status": "$upstream_cache_status", "request_method": "$request_method", "upstream_http_content_disposition": "$upstream_http_content_disposition", "request_time": "$request_time", "http_user_agent": "$http_user_agent", "upstream_response_time": "$upstream_response_time", "upstream_header_time": "$upstream_header_time", "upstream_connect_time": "$upstream_connect_time", "request_length": "$request_length", "http_app_jb": "$http_app_jb", "server_protocol": "$server_protocol", "upstream_scheme": "$upstream_scheme", "upstream_bytes_received": "$upstream_bytes_received", "realip_remote_addr": "$realip_remote_addr", "remote_addr": "$remote_addr", "http_referer": "$http_referer", "upstream_http_content_type": "$upstream_http_content_type", "uri": "$uri", "request_completion": "$request_completion", "body_bytes_sent": "$body_bytes_sent", "msec": "$msec", "http_x_forwarded_for": "$http_x_forwarded_for", "http_client_info": "$http_client_info", "bytes_sent": "$bytes_sent", "@timestamp": "$time_iso8601", "content_type": "$content_type", "server_port": "$server_port", "upstream_uri": "$upstream_uri", "upstream_status": "$upstream_status", "route_name": "$route_name", "url": "$url", "host": "$host", "status": "$status", "upstream_addr": "$upstream_addr", "consumer_name": "$consumer_name", "upstream_host": "$upstream_host", "http_x_cat_parent_id": "$http_x_cat_parent_id", "upstream_http_content_length": "$upstream_http_content_length", "content_length": "$content_length", "scheme": "$scheme", "remote_user": "$remote_user", "route_id": "$route_id", "request_uri": "$request_uri" } }

Actual result

(screenshot of the abnormal values attached)

Error log

no

Expected result

The kafka-logger plugin can correctly collect the value of each nginx built-in variable

tzssangglass commented 3 years ago

here is my test:

  {
    "server_protocol": "HTTP/1.1",
    "@timestamp": "2021-11-16T12:28:39+08:00",
    "upstream_bytes_received": "528",
    "server_name": "_",
    "upstream_http_content_type": "application/json",
    "upstream_status": "200",
    "request_completion": "OK",
    "body_bytes_sent": "298",
    "upstream_addr": "198.18.2.225:80",
    "msec": "1637036919.270",
    "server_port": "9080",
    "route_id": "1",
    "upstream_uri": "",
    "request_method": "GET",
    "request_time": 0.426,
    "status": 200,
    "upstream_response_time": 0.426,
    "upstream_header_time": 0.426,
    "upstream_connect_time": 0.001,
    "upstream_scheme": "http",
    "bytes_sent": 526,
    "realip_remote_addr": "127.0.0.1",
    "remote_addr": "127.0.0.1",
    "host": "127.0.0.1",
    "uri": "/get",
    "http_user_agent": "curl/7.64.1",
    "request_uri": "/get",
    "request_length": 81,
    "upstream_http_content_length": "298",
    "scheme": "http",
    "upstream_host": "127.0.0.1:9080"
  }

Same as yours: you can see that some of the data is not normal.

I don't know why request_time, upstream_response_time and upstream_connect_time are 0 in your case.

However, request_time and upstream_response_time may not be accurate; see https://stackoverflow.com/questions/53978695/how-can-request-time-be-less-than-upstream-response-time-in-nginx and https://github.com/apache/apisix/issues/5146

If you want to get an exact upstream_response_time, take a look at: https://github.com/openresty/openresty/issues/206#issuecomment-260101217

These variables are not inherently accurate, so I don't think this is a bug in the kafka-logger plugin. You need to do some calculations yourself, for example: https://github.com/apache/apisix/blob/718eb4b21efb2e640939f864441c5e84b7c46cad/apisix/plugins/prometheus/exporter.lua#L147-L164
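A minimal Lua sketch of that kind of calculation (not the exact exporter code; the function name and the single-value handling of upstream_response_time are assumptions here): derive the gateway-side latency by subtracting the upstream time from the total request time.

  -- hedged sketch, assuming it runs in a plugin log phase where ctx.var
  -- exposes the nginx variables; latency_breakdown is a made-up name
  local function latency_breakdown(ctx)
      -- total time nginx spent on the request, in seconds
      local request_time = tonumber(ctx.var.request_time) or 0
      -- time spent on the upstream; with retries this can be "0.001, 0.426",
      -- so only the simple single-value case is handled in this sketch
      local upstream_time = tonumber(ctx.var.upstream_response_time) or 0

      local latency = request_time * 1000              -- total latency in ms
      local upstream_latency = upstream_time * 1000    -- upstream part in ms
      local apisix_latency = latency - upstream_latency
      if apisix_latency < 0 then
          apisix_latency = 0                           -- clamp rounding noise
      end

      return latency, upstream_latency, apisix_latency
  end

With values like the ones above (request_time 0.426 and upstream_response_time 0.426), this yields an apisix_latency of 0, which is why the raw plugin output alone cannot tell you where the time went.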

tzssangglass commented 3 years ago

I have implemented request latency calculation with kafka-logger, but I did it by adding a filter to Logstash and writing the formula in that filter.
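For reference, a rough sketch of what such a Logstash filter could look like (field names taken from the log_format above; the filter itself is an assumption, not the exact one used here):

  filter {
    ruby {
      code => "
        rt  = event.get('request_time').to_f
        urt = event.get('upstream_response_time').to_f
        # gateway-side latency in ms: total request time minus upstream time, floored at 0
        event.set('apisix_latency_ms', [(rt - urt) * 1000, 0].max)
      "
    }
  }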

tokers commented 3 years ago


Which fields are abnormal?

tzssangglass commented 3 years ago

request_time, upstream_response_time and upstream_header_time are the same

tokers commented 3 years ago

request_time, upstream_response_time and upstream_header_time are the same

It's common for these three variables to be the same: if you don't have heavy logic on the gateway side, request_time will be the same as upstream_response_time (the precision is about 1 ms), and if APISIX doesn't spend time receiving the response body from the upstream (say, content-length is 0), upstream_header_time will be almost the same as upstream_response_time.
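As an illustration using the numbers from the test output above (an illustration, not a new measurement): if the upstream returns its headers and its small body essentially together after 0.426 s, and APISIX's own processing adds less than a millisecond, then request_time, upstream_response_time and upstream_header_time all come out as 0.426.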

tzssangglass commented 3 years ago

Got it.

github-actions[bot] commented 2 years ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

ramoncjs3 commented 1 year ago

I also encountered this problem; does anyone know if there is a solution?