elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Replace apm-java-agent with OTEL SDK #109335

Open pgomulka opened 1 month ago

pgomulka commented 1 month ago

Description

Elasticsearch currently uses apm-java-agent as the underlying implementation in the apm module. We use our own APM API, implemented in the apm module on top of the OTel API. This should not change. What should change is the binding between the OTel API and its implementation, which should be the OTel SDK. The OTel SDK will give us more flexibility in configuring how our metrics and traces are sent to APM Server (APM Server supports the OTel SDK). With the OTel SDK we will be able to implement features such as 'tee-ing' the export (splitting it across two APM Servers), additional buffering, or retries when APM Server is overloaded.
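To make the 'tee-ing' idea concrete, here is a minimal sketch using only the JDK. `BatchExporter`, `TeeExporter` and `RecordingExporter` are hypothetical names invented for illustration; they are not the OTel SDK's exporter interfaces, and a real version would implement whatever exporter contract the SDK defines. The point is only the fan-out: every destination receives the batch, and one failing destination does not block the others.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an exporter; NOT the OTel SDK's exporter interface.
interface BatchExporter {
    /** Returns true if the destination accepted the batch. */
    boolean export(List<String> batch);
}

/** Fans each batch out to every delegate; reports success only if all delegates succeed. */
final class TeeExporter implements BatchExporter {
    private final List<BatchExporter> delegates;

    TeeExporter(List<BatchExporter> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public boolean export(List<String> batch) {
        boolean allSucceeded = true;
        for (BatchExporter delegate : delegates) {
            try {
                // Each delegate gets its own immutable copy of the batch.
                allSucceeded &= delegate.export(List.copyOf(batch));
            } catch (RuntimeException e) {
                // A failing destination must not prevent delivery to the others.
                allSucceeded = false;
            }
        }
        return allSucceeded;
    }
}

/** In-memory destination, used here only to demonstrate the fan-out. */
final class RecordingExporter implements BatchExporter {
    final List<String> received = new ArrayList<>();

    @Override
    public boolean export(List<String> batch) {
        received.addAll(batch);
        return true;
    }
}
```

Returning a single combined result keeps the caller simple. Whether a partial failure should trigger a retry against only the failed destination is a separate design question that this sketch does not address.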

I worked on a simple, very dirty PoC that shows this works: https://github.com/elastic/elasticsearch/pull/110263

Things that need more investigation and work:

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

pgomulka commented 5 days ago

Additional features to consider:

  1. being able to set a custom interval for certain metrics
    • it is possible to create multiple MetricReaders (which trigger exporting) with different intervals, so we could perhaps add custom filtering of which metrics are read by which MetricReader (and thus exported at different intervals). This does not seem trivial, though.
  2. change the metric interval dynamically
    • it is possible to provide a custom java.util.concurrent.ScheduledExecutorService, so if we implement logic that cancels the previously scheduled task and submits a new one with a different interval, it should be doable.
  3. make sure metrics are sent upon node shutdown
    • there is a force flush mechanism. I am not sure whether it is possible to set a timeout on it, though.
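As a rough illustration of items 2 and 3, the cancel-and-resubmit logic and a time-bounded final run on shutdown can be sketched with plain `java.util.concurrent`. `ReschedulableTask` and its methods are hypothetical names; the sketch is not wired into the OTel SDK, and how such an executor would be handed to the SDK's periodic reader is left out. The timeout here bounds the wait at the executor level, which is an assumption about where one could enforce it, not a statement about the SDK's own flush API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a task periodically and lets the interval change at runtime. */
final class ReschedulableTask {
    // Daemon thread so a forgotten shutdown cannot keep the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread thread = new Thread(r, "metrics-export-sketch");
            thread.setDaemon(true);
            return thread;
        });
    private final Runnable task;
    private ScheduledFuture<?> current;

    ReschedulableTask(Runnable task, long intervalMillis) {
        this.task = task;
        setInterval(intervalMillis);
    }

    /** Cancels the previously scheduled task and submits a new one with the new interval. */
    synchronized void setInterval(long intervalMillis) {
        if (current != null) {
            current.cancel(false); // let an in-progress run finish
        }
        current = scheduler.scheduleAtFixedRate(
            task, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /**
     * Stops the periodic schedule, queues one final run (the "flush"), and waits
     * at most timeoutMillis for it. Returns false if the wait timed out or was interrupted.
     */
    boolean shutdownWithFinalRun(long timeoutMillis) {
        synchronized (this) {
            current.cancel(false);
        }
        scheduler.execute(task); // queued before shutdown, so it still runs
        scheduler.shutdown();
        try {
            return scheduler.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch a setting-update listener would call `setInterval(newMillis)`, and the node shutdown path would call `shutdownWithFinalRun(timeoutMillis)` and log when it returns `false`.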