elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
14.15k stars 3.49k forks source link

unify JSON serialization using ObjectMappers #10738

Open colinsurprenant opened 5 years ago

colinsurprenant commented 5 years ago

We currently use 2 different ways to do JSON serialization:

We've witnessed problems where both methods resulted in different behaviours for example with this known bignum problem #10720

JrJackson::Base.generate({"b"=> 123121231231321123132132132132132132132123132132132132132132132132132165468486946541})
Traceback (most recent call last):
       12: from /Users/colin/dev/src/elasticsearch/logstash/lib/bootstrap/environment.rb:73:in `<main>'
       11: from /Users/colin/dev/src/elasticsearch/logstash/vendor/bundle/jruby/2.5.0/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'
       10: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:237:in `run'
        9: from /Users/colin/dev/src/elasticsearch/logstash/vendor/bundle/jruby/2.5.0/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'
        8: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:281:in `execute'
        7: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:465:in `start_shell'
        6: from org/jruby/RubyKernel.java:1193:in `catch'
        5: from org/jruby/RubyKernel.java:1193:in `catch'
        4: from org/jruby/RubyKernel.java:1425:in `loop'
        3: from org/jruby/RubyKernel.java:1061:in `eval'
        2: from (irb):1:in `evaluate'
        1: from com/jrjackson/JrJacksonBase.java:70:in `generate'
RangeError (bignum too big to convert into `long')
LogStash::Json.dump({"b"=> 123121231231321123132132132132132132132123132132132132132132132132132165468486946541})
Traceback (most recent call last):
       12: from /Users/colin/dev/src/elasticsearch/logstash/lib/bootstrap/environment.rb:73:in `<main>'
       11: from /Users/colin/dev/src/elasticsearch/logstash/vendor/bundle/jruby/2.5.0/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'
       10: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:237:in `run'
        9: from /Users/colin/dev/src/elasticsearch/logstash/vendor/bundle/jruby/2.5.0/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'
        8: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:281:in `execute'
        7: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/runner.rb:465:in `start_shell'
        6: from org/jruby/RubyKernel.java:1193:in `catch'
        5: from org/jruby/RubyKernel.java:1193:in `catch'
        4: from org/jruby/RubyKernel.java:1425:in `loop'
        3: from org/jruby/RubyKernel.java:1061:in `eval'
        2: from (irb):2:in `evaluate'
        1: from /Users/colin/dev/src/elasticsearch/logstash/logstash-core/lib/logstash/json.rb:27:in `jruby_dump'
LogStash::Json::GeneratorError (bignum too big to convert into `long')
e = LogStash::Event.new({"b"=> 123121231231321123132132132132132132132123132132132132132132132132132165468486946541})
=> #<LogStash::Event:0x74d6d08a>
e.to_json
=> "{\"@timestamp\":\"2019-04-17T18:41:37.930Z\",\"b\":123121231231321123132132132132132132132123132132132132132132132132132165468486946541,\"@version\":\"1\"}"

I suggest we unify the way we do serialization. From the Event POV we rely on the internal Ruby/Java type mapping and our use of Jackson leverages the "Java" types view over the data. But from a Ruby plugin POV, serializing a Ruby object goes through JrJackson own type converters.

We should probably run some benchmarks over both methods and pick one. I am uder the impression that our ObjectMappers methods is problably more efficient.

guyboertje commented 5 years ago

I have no vested interest in keeping jrjackson in Logstash.

If there is a way to package the ObjectMapper code such that plugins can use it then I'm all for removing JrJackson.

colinsurprenant commented 5 years ago

@guyboertje since ObjectMappers is in logstash-core and plugins depends on logstash-core, it will be "packaged" by default. In the end, this change should only be an implementation change of LogStash::Json.dump. I believe we'd still leverage JrJackson for deserialization. In that respect, any other direct use of JrJackson (or even JSON#parse/dump) in plugins should still work (while not desirable, but that's another subject).

My question here is more about the potential benefits/problems in going forward with such a strategy.

guyboertje commented 5 years ago

Some plugins use the regular JSON gem just because the author didn't know that they can use anything else. Some plugins use JSON for testing. Some plugins serialize into JSON for network transmission (http filter) others deserialize JSON but not necessarily as an Event representation. We should consider the implications of people creating Ruby filter code that uses JSON de/ser.

colinsurprenant commented 5 years ago

When we introduced LogStash::Json we went through all plugins to simply change the JSON usage to LogStash::Json. Ideally we should continue fixing plugin which still uses JSON. There is no need nor benefits in using JSON, that was the whole point of introducing LogStash::Json; to have a single JSON serializer/deserializer throughout the whole logstash ecosystem. The idea here is that the LogStash::Json implementation can be changed and all will benefit. My suggestion here is to consider changing that implementation to use the ObjectMappers strategy instead of JrJackson if it proves to be more efficient.

rocco8620 commented 2 years ago

Hello, i suppose this changes may be beneficial for #13253 is there any update on the matter?