elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

flat internal Event representation #1968

Open colinsurprenant opened 9 years ago

colinsurprenant commented 9 years ago

An idea worth exploring is moving from the internal hierarchical (nested Hash) event representation to a simpler flat one. This is early brainstorming; please contribute ideas, thoughts, and comments.

Overall goals: The LogStash::Event object supports nested structures and a special syntax for accessing nested fields (field references). Internally, the object's JSON representation is basically the same as the object itself: it can be a hash of hashes of hashes, or whatever. From a memory usage perspective, this can consume a lot of object references. From a serialization point of view, visiting all the objects can be costly. We are interested in exploring internal-representation changes that should reduce per-event memory usage and event serialization costs.

basically instead of having an internal object hierarchy representation like

{
  "message" => "foo",
  "geoip" => {
    "coords" => {
      "latitude" => 45.5,
      "longitude" => 73.5667
    }
  }
}

we could have something like

{
  "message" => "foo",
  "geoip.coords.latitude" => 45.5,
  "geoip.coords.longitude" => 73.5667,
}

or, using the logstash path convention:

{
  "[message]" => "foo",
  "[geoip][coords][latitude]" => 45.5,
  "[geoip][coords][longitude]" => 73.5667,
}

This could be done while preserving the current Event api, making this backward compatible.
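
To make the idea concrete, here is a minimal sketch of flat storage behind a get/set-style interface. Everything in it is hypothetical: `flatten_event` and `FlatEvent` are illustrative names, not part of Logstash, and the sketch only handles nested Hashes (no arrays).

```ruby
# Hypothetical helper, not part of Logstash: flatten a nested Hash into a
# single-level Hash keyed by field-reference strings such as "[a][b]".
def flatten_event(data, prefix = "", out = {})
  data.each do |key, value|
    path = "#{prefix}[#{key}]"
    if value.is_a?(Hash)
      flatten_event(value, path, out)   # descend, extending the path
    else
      out[path] = value                 # leaf value: store under its full path
    end
  end
  out
end

# Hypothetical wrapper that keeps a get/set-style interface over flat storage.
class FlatEvent
  def initialize(data = {})
    @fields = flatten_event(data)
  end

  def get(fieldref)
    @fields[normalize(fieldref)]
  end

  def set(fieldref, value)
    @fields[normalize(fieldref)] = value
  end

  private

  # Accept both "message" and "[message]" for top-level fields.
  def normalize(fieldref)
    fieldref.start_with?("[") ? fieldref : "[#{fieldref}]"
  end
end

event = FlatEvent.new({
  "message" => "foo",
  "geoip" => { "coords" => { "latitude" => 45.5, "longitude" => 73.5667 } }
})
event.get("message")                    # looked up as "[message]"
event.get("[geoip][coords][latitude]")  # single Hash lookup, no traversal
```

One thing this sketch does not cover: a lookup of an intermediate path such as `[geoip]` finds nothing, so full backward compatibility would need to rebuild the sub-hash from all keys sharing that prefix.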

Pros:

Cons:

Thoughts?

clintongormley commented 9 years ago

I think the only gotcha is: How would you handle arrays of objects?

"visits": [
    { "page": "/", "duration": "5s"},
    { "page": "/help", "duration": "10s"},
    ....
]

jordansissel commented 9 years ago

@clintongormley arrays in general are possibly difficult here, and we probably can't do it the same way ES does.

The existing fieldref syntax allows array access of your 'duration' field like [visits][0][duration] and [visits][1][duration].

colinsurprenant commented 9 years ago

right, the complete @clintongormley example would become

[visits][0][page] => "/"
[visits][0][duration] => "5s"
[visits][1][page] => "/help"
[visits][1][duration] => "10s"
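
A rough sketch of how that flattening could be produced, assuming array elements are simply addressed by their integer index. `flatten_fields` is a hypothetical helper name, not an existing Logstash function.

```ruby
# Hypothetical helper: flatten nested Hashes and Arrays into fieldref keys.
# Array elements are addressed by their integer index.
def flatten_fields(value, prefix = "", out = {})
  case value
  when Hash
    value.each { |k, v| flatten_fields(v, "#{prefix}[#{k}]", out) }
  when Array
    value.each_with_index { |v, i| flatten_fields(v, "#{prefix}[#{i}]", out) }
  else
    out[prefix] = value   # leaf value: store under its full path
  end
  out
end

visits = {
  "visits" => [
    { "page" => "/", "duration" => "5s" },
    { "page" => "/help", "duration" => "10s" }
  ]
}

# Prints one "path => value" line per leaf, in the form shown above.
flatten_fields(visits).each { |path, v| puts "#{path} => #{v.inspect}" }
```

Two caveats with this naive form: an array index and a Hash key named "0" produce the same path, and empty arrays or hashes disappear entirely, so a reversible representation would need extra bookkeeping.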

jordansissel commented 8 years ago

@colinsurprenant thoughts on this? We can still address this later, but I think for now, we can close this until we want to revisit it. Thoughts?

colinsurprenant commented 8 years ago

I think the idea of exploring an alternate internal Event data representation is worth keeping open, but it is definitely not on the radar for now. Our focus is on the Java Event implementation, with its added serialization benefits, and on the new Java Accessors implementation, which has also improved field reference access performance.