Closed: lifeofmoo closed this issue 7 months ago
This is very telling!
If I tail the logs of the collector receiver pod which is deployed in the opentelemetry NS I see this.
2024-04-02T09:22:05.348Z warn batchprocessor@v0.94.1/batch_processor.go:258 Sender failed {"kind": "processor", "name": "batch/traces", "pipeline": "traces", "error": "Permanent error: AccessDeniedException: User: arn:aws:sts::12345678910:assumed-role/KarpenterNodeRole-eksdev/i-0741c2f458623f36d is not authorized to perform: xray:PutTraceSegments because no identity-based policy allows the xray:PutTraceSegments action\n\tstatus code: 403, request id: 5fea62ce-68d8-4fc9-af83-32ac7bd93366"}
This used to work prior to the migration BECAUSE the adot-operator IAM role and k8s ServiceAccount were created and then referenced in the collector deployment. However, the latest docs say that the IAM role (adot-col-otlp-ingest) is supposed to be ROLE ONLY (i.e. do not create the associated k8s ServiceAccount), which I've already highlighted above.
Are you deploying your own OpenTelemetryCollector custom resource or using the preconfigured adot-otlp-ingest collector? Can you share your advanced configuration?
If you are deploying your own OpenTelemetryCollector and not using the preconfigured one then you need to create the service account also. The migration guide is only for users who were using the preconfigured collector deployments available through the advanced configuration.
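To illustrate (a rough sketch with placeholder names, not a definitive manifest): the ServiceAccount carries the IRSA role annotation, and the collector custom resource references it by name.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: adot-collector                      # placeholder name
  namespace: opentelemetry
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-collector-role   # placeholder ARN
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: collector-xray
  namespace: opentelemetry
spec:
  serviceAccount: adot-collector            # must match the ServiceAccount name above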
Hi @bryan-aguilar, good to hear from you again!
I get what you're saying about having to create the SA for my use case, as that was my hunch all along.
I originally approached this migration as a fresh install.
However, when that didn't work I also applied my own OpenTelemetryCollector in the opentelemetry namespace. You can see the OpenTelemetryCollector in the original post. All I've done is comment out the ServiceAccount, as this doesn't exist at the moment (following the fresh-install approach).
What would it take to get this working as a fresh install (without my custom OpenTelemetryCollector)? I believe I am passing the IAM role correctly in the JSON file during the ADOT installation.
Could you share your v0.88.0 ADOT EKS Add-on advanced configuration? That would give me more insight into what is required for the migration.
yep, it's all in the original post.
I completely uninstalled the old ADOT add-on and installed the latest version using this command and config JSON file in order to annotate the install.
aws eks create-addon \
--cluster-name eksdev \
--addon-name adot \
--configuration-values file://configuration-values.json \
--resolve-conflicts=OVERWRITE
Below are the contents of the configuration-values.json file.
{
  "collector": {
    "otlpIngest": {
      "serviceAccount": {
        "annotations": {
          "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789101:role/adot-col-otlp-ingest"
        }
      }
    }
  }
}
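To double-check that those values actually landed, I believe they can be read back from the add-on with something like:
aws eks describe-addon \
  --cluster-name eksdev \
  --addon-name adot \
  --query 'addon.configurationValues' \
  --output text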
Ahh, I apologize. Can I see the advanced configuration you used before trying to migrate to v0.88.0?
These were the full steps I used to get this working before v0.88.0.
The original IAM role was created via eksctl for the opentelemetry NS:
eksctl create iamserviceaccount \
--name adot-collector \
--namespace opentelemetry \
--cluster eksdev \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
--attach-policy-arn arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess \
--attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
--approve \
--override-existing-serviceaccounts
The Add-on was installed using this command - I didn't have to apply any advanced config for the install.
aws eks create-addon \
--addon-name adot \
--cluster-name eksstg \
--addon-version v0.82.0-eksbuild.1 \
--service-account-role-arn arn:aws:iam::12345678910:role/eksctl-eksdev-addon-iamserviceaccount-opente-Role1-xxxxxxxx \
--resolve-conflicts Overwrite
kubectl get all -n opentelemetry-operator-system
NAME READY STATUS RESTARTS AGE
pod/opentelemetry-operator-5988dc7cd5-26p5x 2/2 Running 0 5m49s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/opentelemetry-operator ClusterIP 10.100.247.55 <none> 8443/TCP,8080/TCP 5m51s
service/opentelemetry-operator-webhook ClusterIP 10.100.216.94 <none> 443/TCP 5m51s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/opentelemetry-operator 1/1 1 1 5m51s
The original adot-collector.yaml has serviceAccount: adot-collector uncommented.
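For reference, the relevant part of that file (trimmed; same shape as the full custom resource I share later in this thread) looks roughly like this:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: collector-xray
spec:
  mode: deployment
  serviceAccount: adot-collector   # the eksctl-created SA, uncommented here
  config: |
    # receivers/processors/exporters as shown in the full manifest below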
kubectl apply -f adot-collector.yaml -n opentelemetry
kubectl get all -n opentelemetry
NAME READY STATUS RESTARTS AGE
pod/collector-xray-collector-6555869687-r4qbz 1/1 Running 0 58s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/collector-xray-collector ClusterIP 10.100.239.143 <none> 4317/TCP,4318/TCP 58s
service/collector-xray-collector-headless ClusterIP None <none> 4317/TCP,4318/TCP 58s
service/collector-xray-collector-monitoring ClusterIP 10.100.35.133 <none> 8888/TCP 58s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/collector-xray-collector 1/1 1 1 58s
NAME DESIRED CURRENT READY AGE
replicaset.apps/collector-xray-collector-6555869687 1 1 1 58s
To recap: the ADOT add-on is installed in the opentelemetry-operator-system NS and the collector was applied in the opentelemetry NS. This worked perfectly.
Since you were not using the advanced configuration the migration guide does not apply to you. You should be able to follow the same steps listed above and receive the same results for versions >= v0.88.0.
The roles referenced in the migration guide are for users who are using the ADOT EKS add-ons preconfigured collector deployments such as OTLP Ingest ADOT Collector. In your case you are deploying and managing your own OpenTelemetry Collector custom resource so you will have to manage the service account also.
Ok, I'll give that a go tomorrow.
I'm still struggling to get this to work when using the auto-instrumentation docs, which also used to work before the migration.
I've re-created the IAM role with a ServiceAccount and deployed my custom collector, which now references this ServiceAccount, but apps which have been auto-instrumented are not appearing in X-Ray.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: learning-auth-api-instrumentation
spec:
  exporter:
    endpoint: http://collector-xray-collector.opentelemetry:4317
  java:
    image: public.ecr.aws/aws-observability/adot-autoinstrumentation-java:v1.32.1
apiVersion: apps/v1
kind: Deployment
metadata:
  # namespace: nextjs
  name: learning-auth-api
spec:
  template:
    spec:
      containers:
        - name: learning-auth-api
          env:
            - name: AWS_REGION
              value: eu-west-1
            - name: CLUSTER_NAME
              value: eksdev
            - name: LISTEN_ADDRESS
              value: 0.0.0.0:8080
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://collector-xray-collector.opentelemetry:4317
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: service.namespace=learning,service.name=learning-auth-api
            - name: OTEL_SERVICE_NAME
              value: learning-auth-api
            - name: OTEL_TRACES_EXPORTER
              value: otlp
            - name: OTEL_METRICS_EXPORTER
              value: none
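For completeness: auto-instrumentation also needs the operator's inject annotation, either on the pod template or at the namespace level (not shown in the manifest above); roughly:
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"   # tells the operator to inject the Java agent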
This is what I ran:
eksctl create iamserviceaccount \
--name adot-col-otlp-ingest \
--namespace opentelemetry \
--role-name adot-col-otlp-ingest \
--cluster eksdev \
--attach-policy-arn arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess \
--tags CostCentre=operations \
--approve
k get sa -n opentelemetry
NAME SECRETS AGE
adot-col-otlp-ingest 0 16m
k describe sa -n opentelemetry adot-col-otlp-ingest
Name: adot-col-otlp-ingest
Namespace: opentelemetry
Labels: app.kubernetes.io/managed-by=eksctl
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::12345678910:role/adot-col-otlp-ingest
ADOT is still installed in the opentelemetry-operator-system NS
My custom receiver (below) is deployed in the opentelemetry NS.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: collector-xray
spec:
  mode: deployment
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  serviceAccount: adot-col-otlp-ingest
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch/traces:
        timeout: 1s
        send_batch_size: 50
      resourcedetection/eks:
        detectors: [env, eks]
        timeout: 2s
        override: false
    exporters:
      awsxray:
        region: eu-west-1
        index_all_attributes: true
      logging:
        loglevel: debug
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [resourcedetection/eks, batch/traces]
          exporters: [awsxray]
      telemetry:
        logs:
          level: debug
Going right back to the beginning of this ticket: I was more than happy to do a complete fresh install and go via the recommended route. I keep reviewing the docs and it feels like my previous setup is deprecated and not worth hanging onto.
However, even a complete fresh install didn't work, so I'm at a bit of a loss now.
Can you double check to make sure everything is being installed into the namespace you intend them to? For example, I see in the above examples that you have installed the service account and OpenTelemetryCollector into different namespaces.
This may be a copy/paste omission but I just need to make sure.
If everything is indeed installed in the correct namespaces then I would start by looking at the instrumented application logs and the collector logs. If neither of them has errors, then I would suggest adding the logging exporter to your trace pipeline to verify that the collector is receiving trace data.
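A minimal sketch of that pipeline change, reusing the logging exporter you already define in your config:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/eks, batch/traces]
      exporters: [awsxray, logging]   # logging prints received spans to the collector's stdout
If spans show up in the collector's stdout but not in X-Ray, the problem is on the export/IAM side; if they don't show up at all, the problem is upstream of the collector.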
The different namespaces were a deliberate thing. I've re-created the ADOT collector and the IAM role/SA in the opentelemetry-operator-system NS.
I changed my Java app's exporter endpoint from the old one to this:
http://adot-col-otlp-ingest-collector:4317
The old endpoint, as per my custom collector, was: http://collector-xray-collector.opentelemetry:4317
Note the original custom endpoint has an additional opentelemetry namespace segment in the URL.
This is how my Java app is being auto-instrumented.
These are the logs I see in the pod.
{"level":"info","ts":"2024-04-09T08:23:00Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":""}
{"level":"info","ts":"2024-04-09T08:23:02Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":""}
I see this error when I redeploy the same app and traffic generator
[otel.javaagent 2024-04-09 08:54:57:063 +0000] [OkHttp http://adot-col-otlp-ingest-collector:4317/...] ERROR io.opentelemetry.exporter.internal.grpc.OkHttpGrpcExporter - Failed to export metrics. The request could not be executed. Full error message: adot-col-otlp-ingest-collector
Should I be worried that the "last activity" hasn't registered yet?
I've deployed the add-on to make sure the X-Ray pipelines are enabled.
kubectl get all -n opentelemetry-operator-system
NAME READY STATUS RESTARTS AGE
pod/adot-col-otlp-ingest-collector-7d76b567ff-msc2h 1/1 Running 0 119m
pod/opentelemetry-operator-85d8596db5-hwbdq 2/2 Running 0 122m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/adot-col-otlp-ingest-collector ClusterIP 10.100.60.53 <none> 4317/TCP,4318/TCP 119m
service/adot-col-otlp-ingest-collector-headless ClusterIP None <none> 4317/TCP,4318/TCP 119m
service/adot-col-otlp-ingest-collector-monitoring ClusterIP 10.100.204.39 <none> 8888/TCP 119m
service/opentelemetry-operator ClusterIP 10.100.228.26 <none> 8443/TCP,8080/TCP 122m
service/opentelemetry-operator-webhook ClusterIP 10.100.7.175 <none> 443/TCP 122m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/adot-col-otlp-ingest-collector 1/1 1 1 119m
deployment.apps/opentelemetry-operator 1/1 1 1 122m
NAME DESIRED CURRENT READY AGE
replicaset.apps/adot-col-otlp-ingest-collector-7d76b567ff 1 1 1 119m
replicaset.apps/opentelemetry-operator-85d8596db5 1 1 1 122m
k logs pod/adot-col-otlp-ingest-collector-7d76b567ff-msc2h
2024/04/09 09:11:07 ADOT Collector version: v0.38.1
2024/04/09 09:11:07 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2024-04-09T09:11:07.288Z info service@v0.94.1/telemetry.go:59 Setting up own telemetry...
2024-04-09T09:11:07.288Z info service@v0.94.1/telemetry.go:104 Serving metrics {"address": ":8888", "level": "Basic"}
2024-04-09T09:11:07.290Z info service@v0.94.1/service.go:140 Starting aws-otel-collector... {"Version": "v0.38.1", "NumCPU": 8}
2024-04-09T09:11:07.290Z info extensions/extensions.go:34 Starting extensions...
2024-04-09T09:11:07.290Z warn internal@v0.94.1/warning.go:42 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning. {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-04-09T09:11:07.291Z info otlpreceiver@v0.94.1/otlp.go:102 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-04-09T09:11:07.291Z warn internal@v0.94.1/warning.go:42 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning. {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-04-09T09:11:07.291Z info otlpreceiver@v0.94.1/otlp.go:152 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
2024-04-09T09:11:07.291Z info service@v0.94.1/service.go:166 Everything is ready. Begin running and processing data.
2024-04-09T09:11:07.291Z warn localhostgate/featuregate.go:63 The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default. {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
Feels so close, as I'm now seeing both pods in a single NS.
As mentioned earlier you shouldn't need to use the migration guide at all if you were not using any advanced configuration parameters pre v0.88.0. I see you have now enabled the otlp ingest collector in the advanced configuration though.
I think http://adot-col-otlp-ingest-collector:4317 is a mistake in the migration guide. It's missing the namespace and should instead be http://adot-col-otlp-ingest-collector.opentelemetry-operator-system:4317
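For context, this is just standard Kubernetes Service DNS rather than anything ADOT-specific: the bare service name only resolves from pods in the same namespace, so from another namespace you need the service.namespace form (or the fully qualified name). Roughly:
# works only from pods in opentelemetry-operator-system:
http://adot-col-otlp-ingest-collector:4317
# works from any namespace:
http://adot-col-otlp-ingest-collector.opentelemetry-operator-system:4317
# fully qualified:
http://adot-col-otlp-ingest-collector.opentelemetry-operator-system.svc.cluster.local:4317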
I've changed the endpoint to:
http://adot-col-otlp-ingest-collector.opentelemetry-operator-system:4317
Still see this message when I delete/recreate the apps:
{"level":"info","ts":"2024-04-09T16:32:28Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":"learning-auth-api-56466c5b6c-7lzsv"}
{"level":"info","ts":"2024-04-09T16:33:24Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":""}
{"level":"info","ts":"2024-04-09T16:40:02Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":""}
{"level":"info","ts":"2024-04-09T16:42:00Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"etocs","name":""}
{"level":"info","ts":"2024-04-09T16:42:01Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"etocs","name":""}
{"level":"info","ts":"2024-04-09T16:42:01Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"etocs","name":""}
k get instrumentations.opentelemetry.io -A
NAMESPACE NAME AGE ENDPOINT SAMPLER SAMPLER ARG
etocs etocs-generator-frontend-instrumentation 22s http://adot-col-otlp-ingest-collector.opentelemetry-operator-system:4317
learning learning-auth-api-instrumentation 2m20s http://adot-col-otlp-ingest-collector.opentelemetry-operator-system:4317
Are you receiving export errors anymore within the application after changing the endpoint? Such as this?
[otel.javaagent 2024-04-09 08:54:57:063 +0000] [OkHttp http://adot-col-otlp-ingest-collector:4317/...] ERROR io.opentelemetry.exporter.internal.grpc.OkHttpGrpcExporter - Failed to export metrics. The request could not be executed. Full error message: adot-col-otlp-ingest-collector
These logs
{"level":"info","ts":"2024-04-09T16:32:28Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":"learning-auth-api-56466c5b6c-7lzsv"}
{"level":"info","ts":"2024-04-09T16:33:24Z","msg":"Skipping pod instrumentation - already instrumented","namespace":"learning","name":""}
are not errors but just informing you that it is not trying to re-instrument the pod because it has detected it has already been instrumented.
Very interesting!
This works when I deploy these files to a new NS called otel.
I think I am able to get auto-instrumentation working for Java apps on two different clusters, using the default (preconfigured) collector in the dev cluster and my original custom one in the stg cluster.
I may revert entirely to the OpenTelemetryCollector custom resource as I like the ability to add additional config. Now I have to replicate this in our 3rd cluster to be sure I have a set of reproducible patterns.
Question for you please:
If I want to be able to use configs like:
exporters:
  awsxray:
    region: eu-west-1
    index_all_attributes: true   < THIS
  logging:
    loglevel: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/eks, batch/traces]   < THIS
      exporters: [awsxray]
do I have to use my own OpenTelemetry Collector custom resource?
do I have to use my own OpenTelemetry Collector custom resource?
yes
Does having the amazon-cloudwatch-observability add-on conflict with the ADOT add-on?
I've got amazon-cloudwatch-observability installed on the dev cluster, YET the same application, which is also deployed in the stg cluster, reports into X-Ray fine from stg. However, it doesn't for the same app in dev.
This is even though the deployments are the same, and I've confirmed that the sample app (which I've posted about before) works fine on all 3 clusters (dev, stg, live).
[otel.javaagent 2024-04-11 13:23:07:802 +0000] [OkHttp http://cloudwatch-agent.amazon-cloudwatch:4315/...] ERROR io.opentelemetry.exporter.internal.grpc.GrpcExporter - Failed to export spans. Server responded with UNIMPLEMENTED. This usually means that your collector is not configured with an otlp receiver in the "pipelines" section of the configuration. If export is not desired and you are using OpenTelemetry autoconfiguration or the javaagent, disable export by setting OTEL_TRACES_EXPORTER=none. Full error message: unknown service opentelemetry.proto.collector.trace.v1.TraceService
It looks like a bunch of CW values "pollute" my manifests for the app in Dev.
Yes, there is a conflict between the two add-ons when trying to use auto-instrumentation. Some of that is mentioned in the compatibility docs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Signals-compatibility.html
What has happened is that you have installed the CW observability add-on and then, by enabling auto-instrumentation injection into your workload, you have opted into Application Signals.
Thanks, that's useful to be aware of. How on earth did this all work before the migration then? I've had amazon-cloudwatch-observability installed for a few months now.
I also can't work out what I need to do within my k8s manifests to get auto instrumentation to work again.
What needs commenting out / adding?
I think the incompatibility was introduced in newer versions of the observability add-on and the ADOT Java agent. For the time being there is no way to stop the observability add-on from mutating the workload environment when you have the auto-instrumentation annotation enabled.
This means your workload's environment will continue to be populated with the environment variables that are breaking your use case. I believe the only way to stop this would be to uninstall the observability add-on.
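If you go that route, removing it would look roughly like this (a sketch; assuming it was installed as an EKS add-on on a cluster named eksdev):
aws eks delete-addon \
  --cluster-name eksdev \
  --addon-name amazon-cloudwatch-observability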
Uninstalling the observability add-on has done the trick. Having OTel is our priority at the moment as we're looking to move away from AppDynamics, so this unblocks our migration.
Do you know if there is a plan/roadmap to have these add-ons work together with the way I've done auto-instrumentation?
I appreciate it's a very fast-moving subject and breaking changes/functionality are expected.
I have brought up the issue with the observability add-on team but don't have any additional information to share yet.
Thanks for all your help on this @bryan-aguilar. Have a good weekend!
Hello,
Trying to resolve X-Ray traces from EKS after the ADOT migration.
I've had OTel running well for nearly a year in our EKS clusters; all was well until I tried the migration!
The old version docs said to do the following (high level):
The migration docs say the IAM role should now be split into 2 roles, which is fine. However, the command to create the new roles, in my case the OTel ingest one (adot-col-otlp-ingest), passes the --role-only flag, i.e. it does not create the k8s ServiceAccount.
By default the service account will be created or updated to include the role annotation, this can be disabled using the flag --role-only.
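For reference, this is the shape of the role-only command I mean (a sketch with my values; the namespace is an assumption):
eksctl create iamserviceaccount \
  --name adot-col-otlp-ingest \
  --namespace opentelemetry-operator-system \
  --role-name adot-col-otlp-ingest \
  --cluster eksdev \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess \
  --role-only \
  --approve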
The docs also go on to say:
This IAM role generated by the above command needs to be inserted into the annotations field of the advanced configuration as seen below:
I completely uninstalled the old ADOT add-on and installed the latest version using this command and config JSON file in order to annotate the install.
I'm assuming that I've done the above correctly, so I proceeded to re-apply the same X-Ray collector receiver. All I've done now is comment out the ServiceAccount line and apply this in a namespace called opentelemetry (as before).
However, I fail to see existing services which were happily reporting before.
I have seen this but I can't make sense of it.
Even if I deploy the sample app and traffic generator pointing to adot-col-otlp-ingest-collector, I don't see anything. sample-app.txt traffic-generator.txt