elastic / elastic-agent-shipper

Data shipper for the Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

The shipper event protocol should be a wrapper for JSON bytes #288

Open cmacknz opened 1 year ago

cmacknz commented 1 year ago

Edit: the path forward is described in https://github.com/elastic/elastic-agent-shipper/issues/288#issuecomment-1551730615

We should change the message definition to simply wrap the JSON serialized event that would be written to Elasticsearch. We will use a oneof type to allow for alternate serialization formats in the future; in particular, the vtprotobuf implementation appears to be more efficient than JSON and is the first candidate for an alternate format.

message Event {
 // JSON serialized event to be written to Elasticsearch.
 oneof event {
   bytes json = 1;
 }
}
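
For illustration, a client filling in this message might look like the following sketch. The oneof wrapper name (messages.Event_Json) assumes standard protoc-gen-go code generation for the definition above, and the import path is hypothetical:

package client

import (
    "encoding/json"

    // Hypothetical import path for the regenerated messages package.
    "github.com/elastic/elastic-agent-shipper-client/pkg/proto/messages"
)

// newJSONEvent wraps an already-serialized Elasticsearch document in the
// proposed Event message. The caller must include the mandatory data_stream
// fields so the shipper can construct the index name and apply processors.
func newJSONEvent(doc map[string]interface{}) (*messages.Event, error) {
    raw, err := json.Marshal(doc)
    if err != nil {
        return nil, err
    }
    return &messages.Event{Event: &messages.Event_Json{Json: raw}}, nil
}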

Original Description

This is using a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.

We will have to document the structure of the JSON event in the comments; for example, the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.

The shipper protocol currently only accepts events serialized to a customized version of the google.protobuf.Struct type. The intent is to allow more efficient serialization of frequently used types like timestamps, and to make processors easier to write and validate, since the complete set of types they must operate on is known ahead of time.

We currently define both the message metadata and fields as our messages.Struct type https://github.com/elastic/elastic-agent-shipper-client/blob/1fbbb05f0b174053a5b160cdd5836eaed430cdbd/api/messages/publish.proto#L39-L42

 // Metadata JSON object (map[string]google.protobuf.Value)
 messages.Struct metadata = 4;
 // Field JSON object (map[string]google.protobuf.Value)
 messages.Struct fields = 5;

Given that most processes that will use the shipper are designed to serialize their internal event representations to JSON for direct ingestion by Elasticsearch, we should evaluate whether the conversion to messages.Struct introduces a noticeable performance hit. These processes are highly optimized for serializing to JSON and may be noticeably less performant when serializing to messages.Struct instead.

Specifically, we should benchmark the performance of Filebeat ingesting events using the shipper with the messages.Struct type and compare it to the performance of the same setup modified to transport the event as JSON bytes directly:

 // Metadata JSON object.
 bytes metadata = 4;
 // Fields JSON object.
 bytes fields = 5;
cmacknz commented 1 year ago

If the benchmarks are better using serialized JSON instead of google.protobuf.Struct (which is what we expect), then let's make the change to the protocol to use JSON going forward.

We can also explore other tweaks to the protocol if the performance tests reveal any other new information we need to take into account.

leehinman commented 1 year ago

Initial results

I tested round-tripping (Marshal and Unmarshal) events, because in our use case clients will have to Marshal and the shipper will Unmarshal, so the impact of both is important. I also used two types of events. The first (SingleField) stored the data for the event in a single field ("message").

example:

{"@timestamp":"2023-04-20T14:51:58.913Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"log":{"offset":0,"file":{"path":"/Users/hinman/tmp/clf_1765593378.log"}},"message":"66.4.203.154 - - [20/Apr/2023:09:49:46 -0500] \"GET /random-47.html HTTP/2\" 200 1318 - \"Mozilla/5.0 (Macintosh; Intel Mac OS X 12.3; rv:98.0) Gecko/20100101 Firefox/98.0\"","input":{"type":"filestream"},"host":{"mac":["36-B9-EA-B0-ED-00","36-B9-EA-B0-ED-04","36-B9-EA-B0-ED-08","44-67-52-02-50-BD","A2-64-0B-12-7F-B6","A2-64-0B-12-7F-B7","A2-64-0B-12-7F-B8","A2-64-0B-12-7F-D6","A2-64-0B-12-7F-D7","A2-64-0B-12-7F-D8","B6-E3-AB-15-A1-BB","BC-D0-74-23-C9-F2","BE-D0-74-23-C9-F2"],"name":"elastic2","hostname":"elastic2","architecture":"arm64","os":{"version":"13.3.1","family":"darwin","name":"macOS","kernel":"22.4.0","build":"22E261","type":"macos","platform":"darwin"},"id":"3F8C45C1-BAF2-5AED-8484-947CD6BAD36A","ip":["fe80::a064:bff:fe12:7fd8","fe80::a064:bff:fe12:7fd7","fe80::a064:bff:fe12:7fd6","fe80::5d:9a80:93b8:ab9","172.16.1.205","fe80::b4e3:abff:fe15:a1bb","fe80::b4e3:abff:fe15:a1bb","fe80::a040:495d:28b:9e86","fe80::7fda:39e1:2de:127a","fe80::ce81:b1c:bd2c:69e","fe80::1cbd:4d85:36d5:a9c8","172.16.1.240"]},"agent":{"version":"8.7.0","ephemeral_id":"6a1557ad-d218-4d77-bc2a-d5d2d218e158","id":"21cb6281-9e90-4723-8d58-319249607e90","name":"elastic2","type":"filebeat"},"ecs":{"version":"8.0.0"}}

The second type of event was where the data for the event was added to the root of the message as fields (MultiField).

example:

{"@timestamp":"2023-04-19T21:01:07.001Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"agent":{"id":"d2b3c240-5d88-41c6-a03e-f7401342dbea","name":"elastic2","type":"filebeat","version":"8.7.0","ephemeral_id":"b2b538e8-7f35-4648-b938-9c5a3cff1989"},"event":{"event_type":"netflow","src_ip":"118.9.14.112","proto":"UDP","timestamp":"2023-04-19T15:52:43.671103-0500","src_port":34177,"dst_ip":"12.163.211.175","dst_port":52025,"flow_id":6129484611666145821,"netflow":{"max_ttl":72,"pkts":94,"bytes":64579,"start":"2023-04-19T15:52:43.671103-0500","end":"2023-04-19T15:52:43.671103-0500","age":0,"min_ttl":72}},"event_timestamp":"1681937563","firewall_name":"Firewall-81","input":{"type":"filestream"},"host":{"os":{"build":"22E261","type":"macos","platform":"darwin","version":"13.3.1","family":"darwin","name":"macOS","kernel":"22.4.0"},"id":"3F8C45C1-BAF2-5AED-8484-947CD6BAD36A","name":"elastic2","ip":["fe80::a064:bff:fe12:7fd8","fe80::a064:bff:fe12:7fd7","fe80::a064:bff:fe12:7fd6","fe80::5d:9a80:93b8:ab9","172.16.1.205","fe80::b4e3:abff:fe15:a1bb","fe80::b4e3:abff:fe15:a1bb","fe80::a040:495d:28b:9e86","fe80::7fda:39e1:2de:127a","fe80::ce81:b1c:bd2c:69e","fe80::1cbd:4d85:36d5:a9c8","172.16.1.240"],"mac":["36-B9-EA-B0-ED-00","36-B9-EA-B0-ED-04","36-B9-EA-B0-ED-08","44-67-52-02-50-BD","A2-64-0B-12-7F-B6","A2-64-0B-12-7F-B7","A2-64-0B-12-7F-B8","A2-64-0B-12-7F-D6","A2-64-0B-12-7F-D7","A2-64-0B-12-7F-D8","B6-E3-AB-15-A1-BB","BC-D0-74-23-C9-F2","BE-D0-74-23-C9-F2"],"hostname":"elastic2","architecture":"arm64"},"availability_zone":"eu-west-1c","log":{"file":{"path":"/Users/hinman/tmp/firewall_2474911009.log"},"offset":0},"ecs":{"version":"8.0.0"}}

Both types of events were generated by running filebeat against sample data with the file output. The benchmark benchtime was set to 30 seconds.
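
For reference, a minimal sketch of the shape of these round-trip benchmarks, reduced to a single hardcoded event (names and import paths are illustrative; the real benchmarks live in the branches linked below):

package benchmark

import (
    "testing"

    "google.golang.org/protobuf/proto"

    // Assumed import paths for the shipper client packages.
    "github.com/elastic/elastic-agent-shipper-client/pkg/helpers"
    "github.com/elastic/elastic-agent-shipper-client/pkg/proto/messages"
)

func BenchmarkRoundTrip(b *testing.B) {
    // Build one messages.Event up front; the loop then measures only the
    // Marshal (client side) and Unmarshal (shipper side) cost.
    fields, err := helpers.NewValue(map[string]interface{}{"message": "hello"})
    if err != nil {
        b.Fatal(err)
    }
    event := &messages.Event{Fields: fields.GetStructValue()}
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        buf, err := proto.Marshal(event)
        if err != nil {
            b.Fatal(err)
        }
        var out messages.Event
        if err := proto.Unmarshal(buf, &out); err != nil {
            b.Fatal(err)
        }
    }
}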

SingleField Results

| Condition | Iterations | ns/op | B/op | allocs/op |
|---|---:|---:|---:|---:|
| existing protobuf | 6516 | 5449813 | 1736473 | 45540 |
| mapstr and stdlib json | 10000 | 3168283 | 1535909 | 33771 |
| protobuf with Fields and Metadata as strings | 216559 | 169403 | 371089 | 1320 |
| protobuf Fields/Metadata Marshal/Unmarshal | 10000 | 3149967 | 2059934 | 35201 |

MultiField Results

| Condition | Iterations | ns/op | B/op | allocs/op |
|---|---:|---:|---:|---:|
| existing protobuf | 2030 | 17851260 | 5402970 | 144140 |
| mapstr and stdlib json | 3241 | 11121605 | 4814398 | 106575 |
| protobuf with Fields and Metadata as strings | 69279 | 521275 | 1110870 | 3720 |
| protobuf Fields/Metadata Marshal/Unmarshal | 3520 | 10299411 | 6400501 | 110595 |

The fourth condition was added because clients will still have to spend time marshalling the fields and metadata to strings to send the data, and the shipper will have to spend time unmarshalling to use the fields for processors or to access the metadata.

https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_base https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_string_fields

cmacknz commented 1 year ago

Just to confirm I am interpreting this correctly, using the MultiField example as a reference:

* The existing protobuf implementation is 60% slower than mapstr and stdlib json: `17851260.0 / 11121605.0 = 1.60`
* The realistic protobuf unmarshal/marshal fields case is narrowly faster than mapstr and stdlib json: `10299411.0 / 11121605.0 = 0.92`

I would then conclude that we should absolutely switch to using JSON serialization for the metadata and fields. We can iterate on making the JSON serialization faster, or attempt to optimistically avoid unmarshalling the JSON if we don't need to look at the event metadata or fields later.

Is there any reason you tested with the fields as string instead of bytes? I would have used bytes so we don't require a UTF-8 encoding, but maybe there is a reason we want this. The UTF-8 requirement is the only difference between string and bytes.

leehinman commented 1 year ago

> Just to confirm I am interpreting this correctly, using the MultiField example as a reference:
>
> * The existing protobuf implementation is 60% slower than mapstr and stdlib json: `17851260.0 / 11121605.0 = 1.60`
> * The realistic protobuf unmarshal/marshal fields case is narrowly faster than mapstr and stdlib json: `10299411.0 / 11121605.0 = 0.92`

yes

> I would then conclude that we should absolutely switch to using JSON serialization for the metadata and fields. We can iterate on making the JSON serialization faster, or attempt to optimistically avoid unmarshalling the JSON if we don't need to look at the event metadata or fields later.

One big problem with avoiding unmarshalling the `Fields` field: the data we need to run processors, even a drop processor or adding a tag, is in there. So I think it is unlikely that we can avoid unmarshalling `Fields`.
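
To make that concrete, even the simplest drop-style condition forces a full decode of the serialized fields; a sketch with illustrative names (this is not the shipper's actual processor API):

package processors

import "encoding/json"

// shouldDrop reports whether an event matches a drop condition. Even this
// trivial check has to unmarshal the entire fields payload just to read a
// single key.
func shouldDrop(fieldsJSON []byte, key, want string) (bool, error) {
    var fields map[string]interface{}
    if err := json.Unmarshal(fieldsJSON, &fields); err != nil {
        return false, err
    }
    got, ok := fields[key].(string)
    return ok && got == want, nil
}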

> Is there any reason you tested with the fields as string instead of bytes? I would have used bytes so we don't require a UTF-8 encoding, but maybe there is a reason we want this. The UTF-8 requirement is the only difference between string and bytes.

Brain fart. I've switched to bytes; updated results, with some more encoders for comparison, are coming soon.

leehinman commented 1 year ago

Updated Results

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_string_fields

Command: `go test -bench=. -benchmem -benchtime 15s -timeout 30m`, run on an Apple M1 Max, macOS 13.3.1, Go 1.17.9.

Raw Results

goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10                 3115       5667029 ns/op     1736485 B/op      45540 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrStdJSON-10                    5421       3299549 ns/op     1537192 B/op      33773 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobuf-10                111667        161508 ns/op      372850 B/op       1320 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFull-10              5642       3191448 ns/op     1901476 B/op      34984 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFullGoJSON-10                9440       1908256 ns/op     1479525 B/op      20594 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventStdJSON-10                         5060       3573773 ns/op     1571119 B/op      34214 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventGoJSON-10                          9351       1937137 ns/op     1162826 B/op      18823 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventCBORL-10                           8058       2229902 ns/op     1088005 B/op      30253 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10                          1044      17318237 ns/op     5402877 B/op     144140 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrStdJSON-10                             1710      10541866 ns/op     4809868 B/op     106568 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobuf-10                          35442        507303 ns/op     1115833 B/op       3720 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFull-10                       1622      11070076 ns/op     5921757 B/op     109993 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFullGoJSON-10                 2640        6799568 ns/op    4632009 B/op      64192 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventStdJSON-10                          1614      11151869 ns/op     4904499 B/op     107802 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventGoJSON-10                           2942       6134171 ns/op     3643867 B/op      59202 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventCBORL-10                            2569       7019086 ns/op     3405332 B/op      96056 allocs/op
PASS
ok      github.com/elastic/elastic-agent-shipper-client/benchmark   484.142s

Comparing to the original protobuf implementation: "Shallow Protobuf" has Fields & Metadata as bytes. "Deep Event" is a Go struct that mimics the original protobuf implementation (Fields & Metadata as mapstr, timestamp as string). "goccy" uses https://github.com/goccy/go-json. "CBORL" uses https://github.com/fxamacker/cbor.

| Condition | Normalized to Original Protobuf (orig/condition) ns/op |
|---|---:|
| Single Field, mapstr, stdlib json | 1.72 |
| Single Field, Shallow Protobuf | 35.09 |
| Single Field, Shallow Protobuf, stdlib json Fields, Meta | 1.78 |
| Single Field, Shallow Protobuf, goccy json Fields, Meta | 2.97 |
| Single Field, Deep Event, stdlib json | 1.59 |
| Single Field, Deep Event, goccy json | 2.93 |
| Single Field, Deep Event, CBORL | 2.54 |
| Multi Field, mapstr, stdlib json | 1.64 |
| Multi Field, Shallow Protobuf | 34.14 |
| Multi Field, Shallow Protobuf, stdlib json Fields, Meta | 1.56 |
| Multi Field, Shallow Protobuf, goccy json Fields, Meta | 2.55 |
| Multi Field, Deep Event, stdlib json | 1.55 |
| Multi Field, Deep Event, goccy json | 2.82 |
| Multi Field, Deep Event, CBORL | 2.47 |

Observations

  1. The deep protobuf is worse than using mapstr and stdlib json encode/decode.
  2. The shallow protobuf is significantly faster (~35x).
  3. But once you add back decoding Fields & Metadata, that drops to 1.5x-3x (depending on the number and depth of fields).
  4. Using the goccy json encoder provides similar benefits to shallow protobuf with a full decode.
  5. CBORL is similar in speed to goccy; this might be useful when using the disk queue, since it stores events in this format.
cmacknz commented 1 year ago

> CBORL is similar in speed to goccy; this might be useful when using the disk queue, since it stores events in this format.

One advantage of serializing to JSON for the shipper is that users and teams already writing to Elasticsearch heavily use JSON and have invested time in making it efficient. We need to consider the code that already exists for this purpose. I don't like the idea of throwing away this existing efficiency when using the shipper.

There are multiple alternative JSON serialization libraries in use for efficiency and convenience reasons, and switching to CBORL would force everyone to pick a CBORL library and invest time in performance tuning it.

The disk queue isn't the most common use case, and I think it is generally accepted that writing to disk will be slower. It seems acceptable to pay the cost to serialize from JSON to CBORL there like we do in Beats today, and we generally don't get complaints about it.

cmacknz commented 1 year ago

The only other thing it may be worth investigating here is trying to use an alternative protobuf generator as described in https://github.com/elastic/elastic-agent-shipper/issues/263.

Specifically it could be worth quickly testing if using https://github.com/planetscale/vtprotobuf makes a difference.
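
For reference, generating the vtprotobuf fast paths alongside the regular Go code looks roughly like this, based on the vtprotobuf README (the exact invocation for our repo may differ):

go install github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto@latest
protoc --go_out=. --go-vtproto_out=. \
    --go-vtproto_opt=features=marshal+unmarshal+size \
    api/messages/publish.proto

This adds MarshalVT/UnmarshalVT methods to the generated types without changing the wire format.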

leehinman commented 1 year ago

Results with vtprotobuf

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf

Command: `go test -bench=. -benchmem -benchtime 15s -timeout 30m`, run on an Apple M1 Max, macOS 13.3.1, Go 1.17.9.

Raw Results

goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10                 2834       6243650 ns/op     1736471 B/op      45540 allocs/op
BenchmarkMarshalUnmarshal/SingleVTProtobuf-10                      10000       1633667 ns/op     1243770 B/op      27060 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrStdJSON-10                    5431       3296384 ns/op     1537780 B/op      33774 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobuf-10                112160        160555 ns/op      372850 B/op       1320 allocs/op
BenchmarkMarshalUnmarshal/SingleVTShallowProtobuf-10              144440        124353 ns/op      362290 B/op       1320 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFull-10              5661       3159657 ns/op     1901542 B/op      34984 allocs/op
BenchmarkMarshalUnmarshal/SingleVTShallowProtobufFull-10            5778       3100950 ns/op     1890367 B/op      34983 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFullGoJSON-10                9375       1907575 ns/op     1479526 B/op      20594 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowVTProtobufFullGoJSON-10              9735       1845337 ns/op     1468934 B/op      20594 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrGoJSON-10                             9420       1917613 ns/op     1108078 B/op      19334 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowEventStdJSON-10                     13436       1340617 ns/op      586345 B/op       1982 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowEventGoJSON-10                      38572        467163 ns/op      652993 B/op        554 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventStdJSON-10                         5029       3580827 ns/op     1571243 B/op      34214 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventSructFormCBORL-10                  6268       2880505 ns/op     1968003 B/op      31083 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventGoJSON-10                          9337       1949360 ns/op     1162966 B/op      18823 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventCBORL-10                           8090       2230057 ns/op     1088256 B/op      30254 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10                           933      19333760 ns/op     5402882 B/op     144140 allocs/op
BenchmarkMarshalUnmarshal/MultiVTProtobuf-10                                3607       5001397 ns/op     3840957 B/op      83660 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrStdJSON-10                             1708      10538655 ns/op     4810026 B/op     106568 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobuf-10                          35932        501989 ns/op     1115832 B/op       3720 allocs/op
BenchmarkMarshalUnmarshal/MultiVTShallowProtobuf-10                        47934        375503 ns/op     1086071 B/op       3720 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFull-10                       1634      11023805 ns/op     5920839 B/op     109992 allocs/op
BenchmarkMarshalUnmarshal/MultiVTShallowProtobufFull-10                     1663      10791995 ns/op     5891310 B/op     109992 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFullGoJSON-10                 2664       6739939 ns/op     4629145 B/op      64184 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowVTProtobufFullGoJSON-10               2758       6540159 ns/op     4598962 B/op      64183 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrGoJSON-10                              2924       6153722 ns/op     3498135 B/op      60679 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowEventStdJSON-10                       4365       4117105 ns/op     1750289 B/op       5591 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowEventGoJSON-10                       12261       1467770 ns/op     1992242 B/op       1566 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventStdJSON-10                          1602      11206129 ns/op     4904679 B/op     107802 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventSructFormCBORL-10                   2084       8615209 ns/op     5840472 B/op      92738 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventGoJSON-10                           2943       6140325 ns/op     3641569 B/op      59197 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventCBORL-10                            2551       7022336 ns/op     3405496 B/op      96056 allocs/op
PASS
ok      github.com/elastic/elastic-agent-shipper-client/benchmark   634.441s

Comparison against original protobuf implementation

| Condition | Normalized to Original Protobuf (orig/condition) ns/op |
|---|---:|
| Single Field, mapstr, stdlib json | 1.89 |
| Single Field, vtprotobuf | 3.82 |
| Single Field, Shallow Protobuf, goccy json Fields, Meta | 3.27 |
| Single Field, Shallow VT Protobuf, goccy json Fields, Meta | 3.38 |
| Single Field, Deep Event, goccy json | 3.20 |
| Multi Field, mapstr, stdlib json | 1.83 |
| Multi Field, vtprotobuf | 3.87 |
| Multi Field, Shallow Protobuf, goccy json Fields, Meta | 2.87 |
| Multi Field, Shallow VT Protobuf, goccy json Fields, Meta | 2.96 |
| Multi Field, Deep Event, goccy json | 3.15 |

Observations

  1. vtprotobuf gives us the best speed with access to all the fields in Fields and Meta.
cmacknz commented 1 year ago

We spoke about this in the shipper project team meeting, and the vtprotobuf implementation is so much faster that it seems obvious that we should use it. It is however not compatible with messages serialized using the default protobuf compiler: https://github.com/planetscale/vtprotobuf#mixing-protobuf-implementations-with-grpc
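
The workaround described in that README section is a custom gRPC codec that prefers the generated fast paths and falls back to the standard protobuf runtime. A minimal sketch of that approach (our actual wiring may differ):

package codec

import (
    "fmt"

    "google.golang.org/grpc/encoding"
    "google.golang.org/protobuf/proto"
)

// vtCodec uses the vtprotobuf fast paths when a message provides them and
// falls back to the standard protobuf runtime otherwise.
type vtCodec struct{}

type vtMarshaler interface{ MarshalVT() ([]byte, error) }
type vtUnmarshaler interface{ UnmarshalVT([]byte) error }

func (vtCodec) Marshal(v interface{}) ([]byte, error) {
    if m, ok := v.(vtMarshaler); ok {
        return m.MarshalVT()
    }
    if m, ok := v.(proto.Message); ok {
        return proto.Marshal(m)
    }
    return nil, fmt.Errorf("unsupported message type: %T", v)
}

func (vtCodec) Unmarshal(data []byte, v interface{}) error {
    if m, ok := v.(vtUnmarshaler); ok {
        return m.UnmarshalVT(data)
    }
    if m, ok := v.(proto.Message); ok {
        return proto.Unmarshal(data, m)
    }
    return fmt.Errorf("unsupported message type: %T", v)
}

// Name returns "proto" so this codec replaces the default one.
func (vtCodec) Name() string { return "proto" }

func init() { encoding.RegisterCodec(vtCodec{}) }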

One way to take advantage of the vtprotobuf optimizations for Go code that can use them is to keep the existing protobuf message definitions, but also support the "wrapped JSON" message proposed in the issue description for clients that can't support an optimized protobuf implementation. https://github.com/elastic/elastic-agent-shipper/issues/288#issue-1642528946

This would allow us to use the fastest implementation for the majority of shipper clients, which will be Beats or other Go processes, while giving us a fallback for clients like endpoint security that wouldn't be compatible with the optimized vtprotobuf RPC. There was general support for this approach.

@leehinman one thing I'm not sure we considered is that in the real system Beats will be converting from beat.Event -> protobuf and then from protobuf -> beat.Event again on the server side.

Should we benchmark the performance of these beat.Event conversions specifically? Or are you confident that your benchmark already accurately represents the performance we'll see?

leehinman commented 1 year ago

> @leehinman one thing I'm not sure we considered is that in the real system Beats will be converting from beat.Event -> protobuf and then from protobuf -> beat.Event again on the server side.
>
> Should we benchmark the performance of these beat.Event conversions specifically? Or are you confident that your benchmark already accurately represents the performance we'll see?

The tests include both Marshal & Unmarshal, so the flow is messages.Event -> Marshal -> Unmarshal -> messages.Event:

// Assumed imports:
//   "google.golang.org/protobuf/proto"
//   "github.com/elastic/elastic-agent-shipper-client/pkg/proto/messages"
func rtMessagesEvent(m *messages.Event) {
    // Client side: serialize the event for transport.
    b, err := proto.Marshal(m)
    if err != nil {
        panic(err)
    }
    // Shipper side: deserialize it again.
    out := messages.Event{}
    err = proto.Unmarshal(b, &out)
    if err != nil {
        panic(err)
    }
}

We could test beat.Event -> messages.Event -> Marshal -> Unmarshal -> messages.Event -> beat.Event, but I think the largest share of the time is spent in Marshal/Unmarshal.

cmacknz commented 1 year ago

The conversion from beat.Event to messages.Event is fairly complex so I think it is worth testing:

https://github.com/elastic/beats/blob/c30bf431ccb7988ddfb7e4244e041cce588c6b91/libbeat/outputs/shipper/shipper.go#L401

The conversion of the mapstrs to messages.Struct might be quite costly.
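
For intuition, here is a small sketch using the upstream structpb package; messages.Struct is a customized version of it, so the allocation pattern should be analogous:

package main

import (
    "fmt"

    "google.golang.org/protobuf/types/known/structpb"
)

func main() {
    // Every nested map, list element, and scalar below becomes a separate
    // heap-allocated *structpb.Value node, which is where the high
    // allocs/op in the benchmarks comes from.
    s, err := structpb.NewStruct(map[string]interface{}{
        "data_stream": map[string]interface{}{
            "type":      "logs",
            "dataset":   "generic",
            "namespace": "default",
        },
        "tags": []interface{}{"a", "b"},
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(len(s.Fields)) // 2 top-level fields, many more nested Values
}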

leehinman commented 1 year ago

Results adding beat.Event conversion

Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf

The new test case is called BeatVTmessageEvent.

Raw Results

% go test -bench=. -benchmem -benchtime 15s -timeout 30m
goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10                 4322       4049273 ns/op     1736506 B/op      43560 allocs/op
BenchmarkMarshalUnmarshal/SingleVTProtobuf-10                      13071       1372157 ns/op     1243713 B/op      27060 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrGoJSON-10                    10000       1562388 ns/op     1106850 B/op      19332 allocs/op
BenchmarkMarshalUnmarshal/SingleBeatVTmessageEvent-10               6410       2809052 ns/op     2827978 B/op      57088 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10                  1418      12723039 ns/op     5403007 B/op     138160 allocs/op
BenchmarkMarshalUnmarshal/MultiVTProtobuf-10                        4140       4368486 ns/op     3840886 B/op      83660 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrGoJSON-10                      3626       4912205 ns/op     3499173 B/op      60677 allocs/op
BenchmarkMarshalUnmarshal/MultiBeatVTmessageEvent-10                2091       8558858 ns/op     8751094 B/op     174991 allocs/op
PASS
ok      github.com/elastic/elastic-agent-shipper-client/benchmark   159.178s

Observations

  1. Adding the beat.Event conversions was pretty close to doubling ns/op (VTProtobuf vs BeatVTmessageEvent).
  2. Adding the beat.Event conversions was pretty close to doubling the allocs/op (VTProtobuf vs BeatVTmessageEvent).
  3. Even with the beat.Event conversion, vtprotobuf is still faster than the original protobuf implementation.

Follow-up questions

  1. Is the messages.Event struct bringing enough value over the beat.Event struct to warrant the extra conversions?

The round-trip code is a copy of what is currently done in the Beats shipper client and server:

func rtBeatsVTMessagesEvent(e *beat.Event) {
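    // Round-trips a beat.Event through messages.Event using the vtprotobuf
    // fast paths, including the mapstr -> messages.Struct conversion both
    // ways. Assumed imports: beat, helpers, messages, timestamppb.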
    meta, err := helpers.NewValue(e.Meta)
    if err != nil {
        panic(err)
    }
    fields, err := helpers.NewValue(e.Fields)
    if err != nil {
        panic(err)
    }
    source := &messages.Source{}
    ds := &messages.DataStream{}
    inputIDVal, err := e.Meta.GetValue("input_id")
    if err != nil {
        panic(err)
    }
    source.InputId, _ = inputIDVal.(string)

    streamIDVal, err := e.Meta.GetValue("stream_id")
    if err != nil {
        panic(err)
    }
    source.StreamId, _ = streamIDVal.(string)

    dsType, err := e.Fields.GetValue("data_stream.type")
    if err != nil {
        panic(err)
    }
    ds.Type, _ = dsType.(string)

    dsNamespace, err := e.Fields.GetValue("data_stream.namespace")
    if err != nil {
        panic(err)
    }
    ds.Namespace, _ = dsNamespace.(string)

    dsDataset, err := e.Fields.GetValue("data_stream.dataset")
    if err != nil {
        panic(err)
    }
    ds.Dataset, _ = dsDataset.(string)
    m := &messages.Event{
        Timestamp:  timestamppb.New(e.Timestamp),
        Metadata:   meta.GetStructValue(),
        Fields:     fields.GetStructValue(),
        Source:     source,
        DataStream: ds,
    }

    // Round-trip the message using the vtprotobuf generated fast paths.
    b, err := m.MarshalVT()
    if err != nil {
        panic(err)
    }
    newMessage := &messages.Event{}
    err = newMessage.UnmarshalVT(b)
    if err != nil {
        panic(err)
    }
    // Convert back to a beat.Event, as the shipper server side does.
    be := &beat.Event{}
    be.Timestamp = newMessage.Timestamp.AsTime()
    be.Fields = helpers.AsMap(newMessage.Fields)
    be.Meta = helpers.AsMap(newMessage.Metadata)
}
cmacknz commented 1 year ago

> Is the messages.Event struct bringing enough value over the beat.Event struct to warrant the extra conversions?

Probably not. messages.Event was proposed before we realized that we would be reinventing the Beat event pipeline in a standalone shipper. Now that the shipper itself will be a Beat, we could define a message type that more closely maps to beat.Event and minimizes the overhead.
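
A hedged sketch of what such a message might look like (illustrative only, not a committed design):

message Event {
  // Mirrors beat.Event: a native timestamp plus opaque JSON bytes for Meta
  // and Fields, avoiding the per-node messages.Struct conversion entirely.
  google.protobuf.Timestamp timestamp = 1;
  bytes meta_json = 2;
  bytes fields_json = 3;
}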

cmacknz commented 1 year ago

We could also consider just serializing beat.Event to JSON and wrapping it in protobuf to be able to pull the metadata out easily.

cmacknz commented 1 year ago

Spoke about this today, and decided the shipper should just accept the same JSON that an input would be writing to ES without the shipper.

This would give us an event format like the following:

message Event {
 // JSON serialized event to be written to ES.
 oneof event {
   bytes json = 1;
 }
}

This is using a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.

We will have to document the structure of the JSON event in the comments; for example, the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.

jlind23 commented 1 year ago

@leehinman @cmacknz now that we have measured the performance differences in https://github.com/elastic/elastic-agent-shipper/issues/288#issuecomment-1546226959, can we close this issue as done?

brian-mckinney commented 1 year ago

> Spoke about this today, and decided the shipper should just accept the same JSON that an input would be writing to ES without the shipper.

Is there an active PR for this change?

cmacknz commented 1 year ago

There's no open PR yet; we still need to follow through and make the change described in https://github.com/elastic/elastic-agent-shipper/issues/288#issuecomment-1551730615.

@jlind23 we can keep this open but move it to a future sprint. I will edit the description with the latest information. I don't see a point in opening a separate implementation issue since this one has all the history already.