StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Integrate OpenTelemetry Data with Elastic #1842

Open Jose-Matsuda opened 1 year ago

Jose-Matsuda commented 1 year ago

Part of Epic. Logical next step after creating instrumentation for images.

To be considered complete this ticket must


First thing to do: verify where Elastic APM is.


Integrating Steps

Jose-Matsuda commented 1 year ago

Copying over a comment from the initial analysis: if the above is good, then integrate with Elastic by sending data from that OTel agent directly into Elastic. This avoids needing a collector to send data to the backend (Elastic/Jaeger); going through a collector doesn't seem to differ significantly from sending the data immediately.

Jose-Matsuda commented 1 year ago

Configuration Notes

Send data directly to Elastic via Collector

Configuration suggestions taken directly from Elastic APM:

image

I am not sure if I need all the configs past the second option, as the examples don't use them either.

Also note that the documentation for exporters is slightly out of date: the way to use environment variables is now ${env:VARIABLE}, as of this commit and in the current docs. My question is how that is used and where it is set.
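On the question of how ${env:VARIABLE} is used: as far as I understand it, the collector resolves each reference from its own process environment when it loads the config, so the variable has to be set on the collector container itself. A minimal sketch of a collector config using that syntax follows. The exporter name otlp/elastic and the variable ELASTIC_APM_SERVER_ENDPOINT are assumptions for illustration, not values confirmed in this thread.

```yaml
# Sketch of a collector config using the ${env:...} syntax.
# Assumption: both variables are set in the collector container's environment.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp/elastic:                    # name after the slash is arbitrary
    endpoint: "${env:ELASTIC_APM_SERVER_ENDPOINT}"   # assumed variable name
    headers:
      Authorization: "Bearer ${env:ELASTIC_APM_SECRET_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/elastic]
```

If a referenced variable is unset, the collector fails at startup instead of substituting an empty string.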

So the value of OTEL_EXPORTER_OTLP_ENDPOINT should be the URL of Elastic's APM Server.
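For the case where an instrumented application exports straight to Elastic without a collector, the standard OpenTelemetry SDK environment variables can be set on the application container. This is only a sketch: the URL, the Secret name, and the key are hypothetical placeholders.

```yaml
# Fragment of a container spec for an instrumented application.
# All values are placeholders, not taken from this cluster.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "https://apm-server.example.svc.cluster.local:8200"   # placeholder URL
  - name: OTEL_EXPORTER_OTLP_HEADERS
    valueFrom:
      secretKeyRef:
        name: apm-otlp-headers     # hypothetical Secret
        key: headers               # holds "Authorization=Bearer <secret token>"
```

Keeping the header in a Secret avoids putting the token in the manifest in plaintext.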

Jose-Matsuda commented 1 year ago

Errors

When applying, as somewhat expected, we got:

2023/09/26 17:32:30 collector server run finished with error: failed to get config: cannot resolve the configuration: expanding ${env:ELASTIC_APM_SECRET_TOKEN}, expected convertable to string value type, got %!q(<nil>)(<nil>)

I probably still have to set this variable.

I am going to edit the Jaeger APM, since we don't need that anyway.

image

For this to be useful, I need to enable TLS, but as just a proof of concept I can perhaps neglect this, as I'm not too sure what is involved in turning it on both ways. I will just use a secret token that's not that secret for testing, and may use it in plaintext in ArgoCD as well just to test.

OK, fine: after having to resort to kubectl explain OpenTelemetryCollector.spec, I can see that it's similar to a pod spec, with envFrom, volumeMounts, and so on (why is it so hard to find the spec anywhere?).
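A rough sketch of what setting the variable through the custom resource could look like is below. It assumes the v1alpha1 API, where config is a YAML string. The resource name, the Secret name, and the ELASTIC_APM_SERVER_ENDPOINT variable are hypothetical.

```yaml
# Sketch: inject the token into the collector via envFrom.
# Assumption: a Secret named elastic-apm exists in the same namespace with
# keys ELASTIC_APM_SECRET_TOKEN and ELASTIC_APM_SERVER_ENDPOINT.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: elastic-export             # hypothetical name
spec:
  envFrom:
    - secretRef:
        name: elastic-apm          # hypothetical Secret
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      otlp/elastic:
        endpoint: "${env:ELASTIC_APM_SERVER_ENDPOINT}"
        headers:
          Authorization: "Bearer ${env:ELASTIC_APM_SECRET_TOKEN}"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/elastic]
```

With envFrom, every key in the Secret becomes an environment variable in the collector container, which should be what the ${env:...} references need to resolve.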


image

Well, in hindsight I suppose it's obvious that localhost:8200 wouldn't work; it has to be the service URL. (I first thought it gave me localhost because I port-forwarded, but actually it comes from the configuration here, so it could be localhost, though the port may need to be correct.) So the endpoint is probably close to, but not exactly, http://kibana-monitoring-kb-http.monitoring-system.svc.cluster.local:5601 (switch the port to 8200?), which is what caused the following error: "transport: authentication handshake failed: tls: first record does not look like a TLS handshake". That error doesn't really indicate a wrong port, though, I think.

I switched the port to 8200 and still got the TLS handshake error. I probably do need to enable TLS, though I'm not sure how much added effort that will be.
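For context on that error: "first record does not look like a TLS handshake" generally means the client attempted TLS and the peer answered with non-TLS bytes. The collector's OTLP gRPC exporter uses TLS unless told otherwise, so a plaintext APM endpoint would produce exactly this. The exporter's tls block offers a few options, sketched below. The exporter name, the endpoint variable, and the CA path are assumptions, and which option applies depends on how the APM server is actually served.

```yaml
# Sketch of TLS options on the exporter; pick one, do not combine blindly.
exporters:
  otlp/elastic:
    endpoint: "${env:ELASTIC_APM_SERVER_ENDPOINT}"   # assumed variable name
    headers:
      Authorization: "Bearer ${env:ELASTIC_APM_SECRET_TOKEN}"
    tls:
      # Option A (proof of concept only): server speaks plaintext, so disable TLS.
      insecure: true
      # Option B: server uses TLS with a private CA; mount the CA and point at it.
      # ca_file: /etc/ssl/apm/ca.crt        # hypothetical mount path
      # Option C (test only): keep TLS but skip certificate verification.
      # insecure_skip_verify: true
```

Option A sends the secret token unencrypted, so it only makes sense for a throwaway test.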

Jose-Matsuda commented 1 year ago

Backlogged due to https://github.com/StatCan/aaw-private/issues/143, as well as us missing a crucial step in having an APM service, which probably needs to be tackled in https://github.com/StatCan/aaw/issues/1858 (basically doing this at the same time).

Jose-Matsuda commented 12 months ago

I am moving this monster of a task (for me) back to the backlog. I want to focus on setting up APM first, and this should just flow naturally once it is in. The problem seems to be the APM server: I have example apps supposedly generating data, but I'm getting cert errors. https://github.com/StatCan/aaw-argocd-manifests/pull/354#issuecomment-1804152879 is what I am trying to achieve.