Closed ChrsMark closed 2 days ago
@strawgate @AlexanderWert What is the priority for this issue? Which release do we want to target it to?
We can evaluate the list of the k8s-distro components and consider including them as well in the Elastic distro
Looking at the list of components it seems like there are still many components that sound less relevant for our distro (like spanmetricsconnector, servicegraphconnector, etc.). As you wrote, I think we need to evaluate and come up with our own list inspired by that list.
@strawgate @AlexanderWert What is the priority for this issue? Which release do we want to target it to?
I think 8.15.2 would be reasonable for that, since that's when we plan to have a broader k8s integration for OTel.
Looking at the list of components it seems like there are still many components that sound less relevant for our distro (like spanmetricsconnector, servicegraphconnector, etc.). As you wrote, I think we need to evaluate and come up with our own list inspired by that list.
What would the criteria be for deciding whether to include a component from the list in our distro vs. not include it?
What would the criteria be for deciding whether to include a component from the list in our distro vs. not include it?
I would consider as required those that are defined by default in the upstream's Helm Charts: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml
With this users that already use the upstream charts could switch to the Elastic distro by just defining the image properly:
helm install my-opentelemetry-collector open-telemetry/opentelemetry-collector --set mode=daemonset --set image.repository="docker.elastic.co/beats/elastic-agent" --set image.tag="8.15.0-SNAPSHOT" --set command.name="/usr/share/elastic-agent/elastic-agent" --set command.extraArgs="{otel, -c, --config=etc/elastic-agent/otel.yaml}"
The above gives me the following in the failed Collector's Pod logs:
Starting in otel mode
1 error occurred:
* failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
error decoding 'receivers': unknown type: "prometheus" for id: "prometheus" (valid values: [otlp filelog kubeletstats k8s_cluster hostmetrics httpcheck])
error decoding 'processors': unknown type: "memory_limiter" for id: "memory_limiter" (valid values: [transform filter k8sattributes elasticinframetrics resourcedetection batch resource attributes])
error decoding 'extensions': unknown type: "health_check" for id: "health_check" (valid values: [file_storage memory_limiter])
A working PoC: https://github.com/elastic/opentelemetry-dev/pull/331
As defined in the collector best practices docs, the memory_limiter
processor should also be included to efficiently manage the resource pipelines: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-around-resource-utilization
As defined in the collector best practices docs, the
memory_limiter
processor should also be included to efficiently manage the resource pipelines: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-around-resource-utilization
~In https://github.com/elastic/elastic-agent/pull/4689, we explicitly chose the memory_limiter
extension over the processor because the extension doesn't drop events but instead puts backpressure on the receivers. Do you think we should revert that decision and go with the processor again? cc: @michalpristas~
[UPDATE] Chatting with @cmacknz off-issue, there's no reason not to include both the extension and the processor, if we need the processor for our collector distro to work on k8s, so will keep the memory_limiter
processor in the list of additional components.
@ycombinator I was checking the extension and from the docs it seems that in the future it should be used in favor of the processor:
The memory limiter extension is used to prevent out of memory situations on the collector. The extension will potentially replace the Memory Limiter Processor.
But the extension is currently nop:
The extension is under development and does nothing.
I suggested adding the processor because the upstream Helm Chart requires it. Opening an issue to remove the dependency in Helm https://github.com/open-telemetry/opentelemetry-helm-charts/issues/1272
let's also monitor size of binary, difference between core collector
and full contrib collector
is a bit over 300MB
@cmacknz do we find this concerning?
Including the collector has to make agent larger, and we need more functionality than what the core collector has, so this is really unavoidable. Be mindful of what components we include to make sure they have a purpose (e.g. we probably don't need everything in contrib) but there's not much else we can do in the short term.
From the Helm deployment perspective, only the health_check
extension is required in the values configuration. Other components (jaeger, zipkin, etc.) can be removed from the sample configuration by setting their value to null
: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#basic-top-level-configuration
Meanwhile, the health_check
requirement can also be skipped in the Helm deployment by using a custom configmap (demo example)
FYI a working example can be found at https://github.com/elastic/opentelemetry-dev/pull/331, which uses https://github.com/elastic/elastic-agent/pull/5126.
Describe the enhancement:
K8s is one of the most used envs according to https://opentelemetry.io/blog/2024/otel-collector-survey/#deployment-scale-and-environment, hence it makes sense to consult what the community distro for k8s includes. We can evaluate the list of the k8s-distro components and consider including them as well in the Elastic distro: https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-k8s/manifest.yaml
/cc @elastic/otel-devs
Describe a specific use case for the enhancement or feature:
What is the definition of done?
Taken from https://github.com/elastic/elastic-agent/issues/5092#issuecomment-2225517430:
prometheus
receivermemory_limiter
processorhealth_check
extensionheml install ...
command mentioned in https://github.com/elastic/elastic-agent/issues/5092#issuecomment-2225517430 succeds.