elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

FieldReference.RUBY_CACHE causes OOM error #10112

Closed mamohr closed 2 years ago

mamohr commented 6 years ago

Hi,

We are getting regular OOM errors in our Logstash instance. Analyzing a heap dump showed that FieldReference.RUBY_CACHE had grown to nearly 2 GB:

[heap dump screenshot: FieldReference.RUBY_CACHE retaining nearly 2 GB]

I guess it is related to #9136.

Version: Logstash 6.4.2, using the Docker image

ikus060 commented 3 years ago

I'm also facing this issue. I'm guessing it's related to the snmptrap input plugin.

ikus060 commented 3 years ago

@original-brownbear I saw you fixed the problem for "CACHE"; would it be possible to do the same for RUBY_CACHE too?

kares commented 3 years ago

can confirm the field reference cache is still a leak candidate, even in recent LS versions.

Could we please learn something about the setup (LS version, pipeline configuration(s))? In a "typical" LS setup, the number of field references a single instance runs into is usually bounded. I can imagine cases where that doesn't hold, e.g. pipelines using Ruby filters; it would be nice to understand real-world scenarios before attempting a fix to make the cache LRU.
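
For illustration, a minimal sketch of what an LRU-bounded field-reference cache could look like, using java.util.LinkedHashMap in access order. The class name, size limit, and parse logic below are hypothetical and are not Logstash's actual FieldReference implementation:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU-evicting cache for parsed field references.
// Once the map exceeds MAX_ENTRIES, the least-recently-used entry is dropped,
// so the cache stays bounded even with high-cardinality field names.
final class LruFieldReferenceCache {
    private static final int MAX_ENTRIES = 10_000; // illustrative limit

    private final Map<String, String[]> cache = Collections.synchronizedMap(
        new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String[]> eldest) {
                return size() > MAX_ENTRIES;
            }
        });

    String[] parseOrGetCached(String reference) {
        return cache.computeIfAbsent(reference, LruFieldReferenceCache::parse);
    }

    private static String[] parse(String reference) {
        // Stand-in for the real field-reference parser.
        return reference.split("[\\[\\]]+");
    }
}

In a real fix the limit and eviction policy would need benchmarking, since evicting hot references re-introduces parse overhead on the event path.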

ikus060 commented 3 years ago

@kares Our pipeline is very simple.

input {
    snmptrap {
        port => 10162
        community => ["any"]
        yamlmibdir => "/opt/bitnami/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp"
    }
}
filter {
    if "default send string" in [message] {
      drop {}
    }
    uuid {
      target => "traceid"
    }
    mutate {
      add_field => {
        "spanid" => "${HOSTNAME}"
        "source_environment" => "${ENV}"
      }
    }
}

output {
    kafka {
        bootstrap_servers                     => "${KAFKA_BOOTSTRAP_SERVERS}"
        topic_id                              => "${KAFKA_OUTPUT_TOPIC}"
        codec                                 => "json"
        message_key                           => "%{host}"
        security_protocol                     => "${KAFKA_SECURITY_PROTOCOL}"
        sasl_mechanism                        => "${KAFKA_SASL_MECHANISM}"
        ssl_truststore_location               => "/etc/ssl/certs/bellca.truststore.jks"
        ssl_truststore_password               => "Password123"
    }
}

I suspect the snmptrap input plugin is the culprit. In a heap dump, I saw a FieldReference getting created for every OID. Since we can have a virtually unlimited number of OIDs, this causes a memory leak.
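
To make the failure mode concrete, here is a hedged, stand-alone sketch (not Logstash code) of the parse-and-cache pattern: when every event can introduce field names never seen before, such as one field per OID, a cache that never evicts grows until the heap is exhausted. The class name and OID values below are synthetic:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-alone illustration: a cache keyed by the raw field-reference string
// with no eviction. High-cardinality keys (here, synthetic OIDs) accumulate
// and are never released.
public class UnboundedCacheDemo {
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

    static String[] parseOrGetCached(String reference) {
        return CACHE.computeIfAbsent(reference, r -> r.split("\\."));
    }

    public static void main(String[] args) {
        // Simulate traps whose OIDs have not been seen before: every event
        // adds a new entry, and none are ever removed. In a long-running
        // pipeline this growth is what eventually triggers the OOM.
        // (One million entries already needs several hundred MB of heap.)
        for (long i = 0; i < 1_000_000L; i++) {
            parseOrGetCached("1.3.6.1.4.1.99999." + i);
        }
        System.out.println("cached entries: " + CACHE.size());
    }
}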

vcostet commented 3 years ago

We are having the same issue with a json filter: our pipeline receives arbitrary JSON containing UUIDs as field names, and even though these fields are pruned, memory still leaks.

Is there anything preventing #13079 from being merged?

chenchuangc commented 2 years ago

I'm also hitting this issue.

caseydm commented 2 years ago

Hi all! Looking forward to this fix, as I recently ran into this issue. The PR says that the impact of the fix will be JVM-specific. Do you suspect the JVM bundled with the official logstash docker image will handle high-cardinality fields well?

yaauie commented 2 years ago

Do you suspect the JVM bundled with the official logstash docker image will handle high-cardinality fields well?

OpenJDK 11 supposedly supports GC of manually-interned strings (which our fields are) per a variety of sources on the internet, but the behaviour is not promised in the documentation, and the implementation of String#intern is native and therefore platform-dependent. Therefore, the admittedly naïve approach of the PR merely limits the size of this particular cache, which was troublesome when unbounded, and does not promise that there won't be further issues deeper down.
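
For readers wondering what "limits the size of this particular cache" could mean in practice, here is a hedged sketch of one way to cap such a cache; it is illustrative only and not the actual code from the PR. Past the limit, references are simply parsed on the fly, so worst-case memory stays bounded on any JVM regardless of how String#intern behaves:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch (not the PR's code): stop adding entries once a fixed limit
// is reached. References seen while the cache has room stay cached; everything
// else pays the parse cost, but memory use is bounded.
public class SizeCappedParseCache {
    private static final int MAX_ENTRIES = 10_000; // illustrative limit
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

    static String[] parseOrGetCached(String reference) {
        String[] cached = CACHE.get(reference);
        if (cached != null) {
            return cached;
        }
        String[] parsed = reference.split("[\\[\\]]+");
        if (CACHE.size() < MAX_ENTRIES) {
            // The limit is approximate under concurrency, which is acceptable
            // for a bound whose only job is to prevent unbounded growth.
            CACHE.putIfAbsent(reference, parsed);
        }
        return parsed;
    }
}

The trade-off versus an LRU (see the earlier sketch in this thread) is that a hard cap favours whichever references happen to arrive first, while an LRU keeps the most recently used ones hot.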