I want to see if there is any significant change in performance between our latest released version, lambda-v1.14.0, and the ESF created from this PR.
To check this, I am going to trigger ESF from messages arriving in a CloudWatch Logs group. This group receives some number X of events, then stops for 30s, and then the cycle repeats. The number of events doubles every 20 cycles, and I let it run for 100 cycles. Summarizing:
Number of events sent per cycle (followed by a 30s sleep) | Cycles
---|---
50 | 20 |
100 | 20 |
200 | 20 |
400 | 20 |
800 | 20 |
The message being sent is small, only 94 bytes.
I also want to check how long it takes for a message to go from the CloudWatch Logs group to Elasticsearch. Since ESF does not add an ingest timestamp to the documents, I am going to add a default ingest pipeline that adds that field to every document.
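A minimal sketch of such a pipeline, shown here in YAML for readability (the actual pipeline is created through the Elasticsearch ingest pipeline API with an equivalent JSON body; the description is made up, while `received` is the field name used in the latency formula further down):

```yaml
# Sketch of an ingest pipeline with a single set processor that writes the
# ingest node's timestamp into a `received` field on every document.
description: "Add the ingest timestamp to every document"
processors:
  - set:
      field: "received"
      value: "{{_ingest.timestamp}}"
```

Setting it as the `index.default_pipeline` of the target data stream makes it run for every incoming document.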
I deployed ESF and updated the lambda 3 times:
1. First run: lambda-v1.14.0 with one `elasticsearch` output.
2. Second run: the ESF from this PR with one `elasticsearch` output.
3. Third run: the ESF from this PR with two `elasticsearch` outputs.

So the first line in my graphics corresponds to the first run, the second line to the second run, and the third line to the third run. The CloudWatch metrics for the ESF lambda function are these:
There is very little difference between the 3 ESF deployments and their configurations. For the 2-outputs configuration, the average duration increased slightly. This is to be expected, as ESF now needs to iterate over two outputs.
I also computed the average latency using the field my ingest pipeline added. I am calculating the latency as `last_value(received) - last_value(@timestamp)` over intervals of `30s`, so I am only using one document (the last one) for each interval.

The latency for 1 output (the one that is common to all three runs) looks like this:
The latency for this output barely changed between my 3 runs.
The second output (third run) has a bigger latency:
This is to be expected as well. We can see that the minimum average latency is already higher than the latency of the first output. From this we can conclude that ESF sent the data to the first output first, and only then sent it to the second output; the two writes did not happen in parallel.
My conclusion is that there is no significant change or decline in performance. However, the more outputs the user adds, the higher the latency gets for each additional output.
## What does this PR do?
Each input can now have multiple outputs of the same type. It cannot, however, have the same output specified more than once for each input - that is, it cannot have two `elasticsearch` outputs with the same destination. See the Results section for examples.

## Why is it important?
See details on https://github.com/elastic/elastic-serverless-forwarder/issues/721.
## Checklist

- [x] I have added an entry in `CHANGELOG.md`
## How to test this PR locally
Refer to https://github.com/elastic/elastic-serverless-forwarder/tree/main/how-to-test-locally.
## Related issues
Relates to https://github.com/elastic/elastic-serverless-forwarder/issues/721
## Results
### Example 1: Trying 2 `elasticsearch` outputs

I have an input with two `elasticsearch` outputs:
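A minimal sketch of what such a `config.yaml` could look like, assuming a CloudWatch Logs input (the log-group ARN, API keys and data stream name are placeholders; the two Elasticsearch URLs are the ones visible in the Discover links below):

```yaml
# Hypothetical config.yaml: one cloudwatch-logs input fanning out to two
# elasticsearch outputs with different destinations, which is now allowed.
inputs:
  - type: "cloudwatch-logs"
    id: "arn:aws:logs:eu-central-1:123456789012:log-group:my-log-group:*"  # placeholder
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://terraform-8b3bac.es.eu-central-1.aws.cloud.es.io"
          api_key: "<api key for the first deployment>"
          es_datastream_name: "logs-generic-default"
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://terraform-2.es.europe-west4.gcp.elastic-cloud.com"
          api_key: "<api key for the second deployment>"
          es_datastream_name: "logs-generic-default"
```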
I sent a log event from this input.
If I look at Discover in both clouds, I can see that both outputs got my message:
https://terraform-8b3bac.es.eu-central-1.aws.cloud.es.io:

![Screenshot from 2024-05-29 14-19-22](https://github.com/elastic/elastic-serverless-forwarder/assets/113898685/149dce2d-7ce2-4aaa-bf14-c73e1e0b2537)

https://terraform-2.es.europe-west4.gcp.elastic-cloud.com:

![Screenshot from 2024-05-29 14-19-51](https://github.com/elastic/elastic-serverless-forwarder/assets/113898685/0d6238f9-28bb-4365-b5a4-2b549ffb8d8e)
### Example 2: Trying 2 `elasticsearch` outputs with the same destination

I have an input with two `elasticsearch` outputs that are the same:
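A minimal sketch of the rejected configuration, using the same hypothetical placeholders as in Example 1:

```yaml
# Hypothetical config.yaml: the same elasticsearch destination listed twice
# for one input. ESF rejects this configuration.
inputs:
  - type: "cloudwatch-logs"
    id: "arn:aws:logs:eu-central-1:123456789012:log-group:my-log-group:*"  # placeholder
    outputs:
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://terraform-8b3bac.es.eu-central-1.aws.cloud.es.io"
          api_key: "<api key>"
          es_datastream_name: "logs-generic-default"
      - type: "elasticsearch"
        args:
          elasticsearch_url: "https://terraform-8b3bac.es.eu-central-1.aws.cloud.es.io"
          api_key: "<api key>"
          es_datastream_name: "logs-generic-default"
```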
This case fails, since we cannot have duplicated outputs (that is, outputs with the same destination for the same input):
### Example 3: Checking the message body in the replay queue

The message body caused by an ingestion error should contain the `output_destination` (instead of `output_type`, like before). I am forcing the message into the replay queue by using wrong authentication credentials.

Message in the replay queue:
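Purely as an illustration of the field this PR changes (the destination value reuses the endpoint from Example 1, and every other field of the replay message is omitted), the body now identifies the concrete destination rather than just the output type:

```yaml
# Hypothetical excerpt of the replay-queue message body: only the field
# relevant to this PR is shown, everything else is omitted.
output_destination: "https://terraform-8b3bac.es.eu-central-1.aws.cloud.es.io"
```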