aws-observability / aws-otel-community

Welcome to the AWS Distro for OpenTelemetry project. If you're using monitoring and observability tools for AWS products and services, this is a great place to ask questions, request features and network with other community members.
https://aws-otel.github.io/
Apache License 2.0

Help confirming otlp receiver configuration #178

Closed sbowers-gbx closed 1 year ago

sbowers-gbx commented 1 year ago

Hello,

I'm exploring the otel golang trace sdk and have had trouble getting both the otlptracegrpc and otlptracehttp exporters to communicate with the otlp receiver configuration of my adot-otel sidecar.

I've implemented a proof of concept otelgin endpoint in my testing app. I've got an adot-otel-collector running with this summarized config:

ADOT Collector Config

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

processors:
  batch:
    timeout: 10s  

exporters:
  logging:
    loglevel: info
    sampling_initial: 5
    sampling_thereafter: 200
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/datadogexporter/example/config.yaml
  datadog:
    api:
      key: "nope"
    metrics:
      resource_attributes_as_tags: true
    traces:
      ignore_resources: ["(GET|POST|HEAD) /servicehealth"]

service:
  telemetry:
    logs:
      level: "warn"
  extensions: [health_check,zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, datadog]

ADOT docker-compose

  otel:
    image: public.ecr.aws/aws-observability/aws-otel-collector:v0.24.1
    ports:
      - "8125:8125"
      - "4317:4317"
      - "4318:4318"
    command: "--config=/etc/otel/config.yaml"
    volumes:
      - ./otel-config-local.yml:/etc/otel/config.yaml
    networks:
      - development

Golang Code

For the otlptracehttp connection I've pretty much copy/pasted from examples, and the resulting logs suggest that this copied golang code is working as expected:

func InitTracer(ctx context.Context) *trace.TracerProvider {
    // ATTEMPT at HTTP (uses default endpoint Post "http://localhost:4318/v1/traces")
    client := otlptracehttp.NewClient(
        otlptracehttp.WithInsecure(),
    )
    traceExporter, err := otlptrace.New(ctx, client)
    if err != nil {
        logging.Fatal(ctx, "idk some http client issue %s", err)
    }

    // Register the trace exporter with a TracerProvider, using a batch
    // span processor to aggregate spans before export.
    bsp := trace.NewBatchSpanProcessor(traceExporter)

    // create a resource with our preferred attributes
    //    https://pkg.go.dev/go.opentelemetry.io/otel/sdk@v1.11.2/resource
    r, err := resource.New(ctx, resource.WithAttributes(semconv.ServiceNameKey.String("sbowers-api")))
    if err != nil {
        // log.Fatal exits the process, so no return is needed after it
        log.Fatal(err)
    }

    // merge that custom resource with the default resource (letting our custom values override in any collision scenario)
    // named res so the variable does not shadow the resource package
    res, err := resource.Merge(resource.Default(), r)
    if err != nil {
        log.Fatal(err)
    }

    // create the TracerProvider
    tp := trace.NewTracerProvider(
        trace.WithSampler(trace.AlwaysSample()),
        // trace.WithBatcher(exporter), // comment out if using the OTLP exporter, only for the stdout exporter.
        trace.WithResource(res),
        trace.WithSpanProcessor(bsp), // comment out if using the stdout exporter, only for the OTLP exporter.
    )

    // set that tp as our global Otel TracerProvider
    otel.SetTracerProvider(tp)

    // take example propagation settings for now
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

    // return our TracerProvider object fully configured
    return tp
}

Golang Log lines

From my golang service logs:

2022/12/16 16:46:03 Post "http://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused

From a local terminal just testing that endpoint's existence:

curl -d '{"key1":"value1", "key2":"value2"}' -H "Content-Type: application/json" -X POST http://localhost:4318/v1/traces
curl: (52) Empty reply from server

Any suggestions for fixing my ADOT OTLPRECEIVER configuration to accept insecure connections via http and/or grpc would be appreciated. If I can provide better or further information I'll be glad to.

Thanks in advance, Scott Bowers

bryan-aguilar commented 1 year ago

Is TLS Insecure what you are looking for?

sbowers-gbx commented 1 year ago

Maybe, but I think I'm having trouble getting the syntax right.

Is this a valid receivers section?

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318
        tls:
          insecure: false
          insecure_skip_verify: true

With that specified (and my SDK implementation updated to use https) I still get a connection refused error:

2022/12/16 17:44:43 Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused

bryan-aguilar commented 1 year ago

Can you try using 0.0.0.0 instead of localhost?
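
For context on this suggestion: inside a container, a receiver bound to localhost listens only on the container's own loopback interface, so traffic arriving through a published port or from another container cannot reach it. A small sketch of that distinction follows. `loopbackOnly` is a hypothetical helper and not part of the collector:

```go
package main

import (
	"fmt"
	"net"
)

// loopbackOnly reports whether a receiver endpoint such as "localhost:4317"
// would bind only to the loopback interface.
func loopbackOnly(endpoint string) (bool, error) {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false, err
	}
	if host == "localhost" {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback(), nil
}

func main() {
	for _, ep := range []string{"localhost:4317", "0.0.0.0:4317"} {
		lo, err := loopbackOnly(ep)
		if err != nil {
			fmt.Println(ep, "invalid:", err)
			continue
		}
		fmt.Printf("%s loopback-only=%v\n", ep, lo)
	}
}
```

An endpoint flagged as loopback-only is fine when the application and collector share a network namespace, but not when the two are separate containers that only share a compose network.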

sbowers-gbx commented 1 year ago

Definitely, that was where I started before seeing the warning here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

So ignoring that for now since I'm trying to be insecure anyways:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        tls:
          insecure: false
          insecure_skip_verify: true

Note: the below defaults to localhost.

2022/12/16 18:00:22 Post "https://localhost:4318/v1/traces": dial tcp 127.0.0.1:4318: connect: connection refused

I went and specifically rewrote the default to be sure:

    client := otlptracehttp.NewClient(
        otlptracehttp.WithEndpoint("0.0.0.0:4318"),
        // otlptracehttp.WithInsecure(),
    )

2022/12/16 18:04:18 Post "https://0.0.0.0:4318/v1/traces": dial tcp 0.0.0.0:4318: connect: connection refused
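
The log lines in this thread show the request URL changing with the client options: http with WithInsecure, https without it, and the host taken from WithEndpoint. The sketch below reproduces only that URL construction. `tracesURL` is a hypothetical helper and not part of the SDK. The `otel:4318` endpoint is an assumption that applies only if the Go service runs in another container on the same compose network, where the collector would be reachable by its service name:

```go
package main

import (
	"fmt"
	"net/url"
)

// tracesURL mimics how the request URL seen in the logs is formed:
// scheme from the insecure flag, host from the endpoint, fixed traces path.
func tracesURL(endpoint string, insecure bool) string {
	scheme := "https"
	if insecure {
		scheme = "http"
	}
	u := url.URL{Scheme: scheme, Host: endpoint, Path: "/v1/traces"}
	return u.String()
}

func main() {
	fmt.Println(tracesURL("localhost:4318", true)) // default endpoint with WithInsecure
	fmt.Println(tracesURL("0.0.0.0:4318", false))  // the attempt logged above
	fmt.Println(tracesURL("otel:4318", true))      // assumed compose service name
}
```

Note that 0.0.0.0 is a listen address for the receiver side. A client normally dials a concrete host instead.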

bryan-aguilar commented 1 year ago

Sorry about the shotgun suggestions, but I'm trying to work toward identifying the fix here, which would help with root cause analysis.

Have you tried 0.0.0.0 together with WithInsecure?

sbowers-gbx commented 1 year ago

Oops, our last few tests have been invalid:

2022/12/16 18:27:41 ADOT Collector version: v0.24.1
2022/12/16 18:27:41 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'receivers': error reading receivers configuration for "otlp": 1 error(s) decoding:

* 'protocols.http.tls' has invalid keys: insecure, insecure_skip_verify
2022/12/16 18:27:41 application run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'receivers': error reading receivers configuration for "otlp": 1 error(s) decoding:

* 'protocols.http.tls' has invalid keys: insecure, insecure_skip_verify

sbowers-gbx commented 1 year ago

I'll take some time to make sure that my testing process is sound, but maybe some better understanding of the "notls" mode described here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configtls/README.md#server-configuration is what I need.

bryan-aguilar commented 1 year ago

Does it not like that you defined insecure and insecure_skip_verify even though you used the default value for insecure?

sbowers-gbx commented 1 year ago

@bryan-aguilar are insecure and insecure_skip_verify valid keys to provide for the receivers config section? Together or separate they continue to produce "invalid keys" errors.

First with only this key provided

2022-12-19 08:28:37 * 'protocols.http.tls' has invalid keys: insecure_skip_verify

Then with only this key provided

2022-12-19 08:30:26 * 'protocols.http.tls' has invalid keys: insecure

bryan-aguilar commented 1 year ago

Sorry, @sbowers-gbx, I missed the notification for your comment. Those are not valid keys for receivers. They are valid for exporters. See the docs here for receiver config settings.

Might I suggest taking a peek at the aws-otel-go repository? It has a very small, and somewhat outdated (we are moving to new sample apps), sample app. I would compare its provider setup with yours, then look at the config and docker compose files used to set up app -> collector communication.
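
To make the receiver/exporter distinction concrete, here is a rough sketch that flags TLS keys which belong to client (exporter) settings and are therefore rejected under a receiver's tls block. The key list is an assumption based on the client-side settings described in the collector's configtls README, and `rejectedByReceiver` is a hypothetical helper, not collector code:

```go
package main

import (
	"fmt"
	"sort"
)

// clientOnlyTLSKeys lists TLS settings that apply to exporters (clients).
// Assumption: taken from the configtls client configuration docs.
var clientOnlyTLSKeys = map[string]bool{
	"insecure":             true,
	"insecure_skip_verify": true,
	"server_name_override": true,
}

// rejectedByReceiver returns, in sorted order, the keys of a receiver tls
// block that are client-only and would trigger an "invalid keys" error.
func rejectedByReceiver(tls map[string]interface{}) []string {
	var bad []string
	for k := range tls {
		if clientOnlyTLSKeys[k] {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// The tls block tried earlier in this thread.
	tried := map[string]interface{}{"insecure": false, "insecure_skip_verify": true}
	fmt.Println(rejectedByReceiver(tried))
}
```

For a receiver, leaving the tls block out entirely means the endpoint serves plaintext, so no extra key is needed to accept insecure connections.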

sbowers-gbx commented 1 year ago

Hey @bryan-aguilar, no worries! I obviously wasn't dying for progress on this. I'll close this for now to avoid leaving cruft in the repo, but I'll definitely pick it back up and reopen or link to this if I feel it's worthwhile once I get restarted on our otel-sdk adoption.