elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
42 stars 3.5k forks source link

Huge bug during high load #5946

Closed kirillkrainov closed 7 years ago

kirillkrainov commented 8 years ago

logstash version: 2.3.2

We have an issue on our production logstash with http input, that http input plugin mixes url arguments from one request with body from another request.

Input config:

http{
    port => 5550
    codec => "line"
    threads => 8
    type => "http"
  }

Filter config:

json {
                source => "message"
                target => "data"
            }
            grok {
                match => { "[headers][request_uri]" => "^/logs\?project=(?<source_project>.+)\&type=(?<source_type>\w+)\&status=(?<source_status>\w+)$" }
            }
            mutate{
                add_field => { "kafka_source" => "%{source_project}.%{source_type}.%{source_status}" }
            }
            mutate {
                add_field => {
                        "[data][source][stream_time]" => "%{+yyyy-MM-dd HH:mm:ss.SSSSSSSSS}"
                }
                rename => {
                    "type" => "[@metadata][type]"
                    "source_project" => "[data][source][project]"
                    "source_type" => "[data][source][type]"
                    "source_status" => "[data][source][status]"
                }
            }
            json_encode {
                source => "data"
                target => "message"
            } 

We reproduced this issue on test logstash instance with same config, and with simple Python client, which sends http POST requests. Python client:

while True:
        projects = ['project_1', 'project_2']
        for project in projects:
            pack = '{"event_type":"' + project + '"}\n'
            requests.post(
                url='http://localhost:5550/logs?project={}&type=server&status=prod'.format(project),
                data=pack
            )

If we send requests with 1 thread on client side, everything is ok, but with 10+ threads logstash-http-input plugin starts mixing url headers and bodies.

For example, we send requests "{"event_type":"project_1"}\n" with url http://localhost:5550/logs?project=project_1&type=server&status=prod and "{"event_type":"project_2"}\n" with url http://localhost:5550/logs?project=project_2&type=server&status=prod, in normal situation we should see the following results after all logstash processing steps:

{"event_type":"project_2","source":{"project":"project_2","type":"server","status":"prod","stream_time":"2016-09-19 15:12:52.860000000"}}
{"event_type":"project_1","source":{"project":"project_1","type":"server","status":"prod","stream_time":"2016-09-19 15:12:52.860000000"}}

But we see the following error in stdout:

unable to process event {"request_method"=>"POST", "request_path"=>"/logs", "request_uri"=>"/logs?project=project_1&type=server&status=prod", "http_version"=>"HTTP/1.1", "http_host"=>"localhost:5550", "content_length"=>"49", "http_user_agent"=>"python-requests/2.9.1", "http_connection"=>"keep-alive", "http_accept"=>"*/*", "http_accept_encoding"=>"gzip, deflate"}. exception => #<TypeError: can't convert nil into String> {:level=>:error}

And the following wrong result(for example):

{"event_type":"project_2","time":1460937593924,"source":{"project":"project_1","type":"server","status":"prod","stream_time":"2016-09-19 15:33:30.840000000"}}

Looks like something going wrong in highload moments, when all 8 threads in http-input are busy.

jordansissel commented 7 years ago

This seems like a problem specific to the HTTP input. Please open this ticket on that repo -- https://github.com/logstash-plugins/logstash-input-http/issues