StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

ApmServer set up #1892

Open Jose-Matsuda opened 11 months ago

Jose-Matsuda commented 11 months ago

Follow-up to https://github.com/StatCan/aaw/issues/1858. Continues the work done in PR https://github.com/StatCan/aaw-argocd-manifests/pull/354.

I must determine WHY the ApmServer is erroring out with messages like:

"http: TLS handshake error from some IP: remote error: tls: bad certificate","service.name":"apm-server","ecs.version":"1.6.0"
http: TLS handshake error from some IP: tls: first record does not look like a TLS handshake","service.name":"apm-server","ecs.version":"1.6.0"}
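For context on the second message: a Go TLS listener reports "first record does not look like a TLS handshake" when the first bytes it reads from a connection are not a TLS record, which usually means the client spoke plaintext (HTTP or gRPC) to a TLS port. The sketch below is a simplified illustration of that idea, not the exact check in Go's crypto/tls; the function name and sample payloads are made up for this example.

```python
# Simplified sketch of the idea behind "first record does not look like a
# TLS handshake". This is NOT the exact logic in Go's crypto/tls; the
# function name and the samples are hypothetical.

TLS_HANDSHAKE_RECORD = 0x16  # TLS record content type for handshake messages
TLS_MAJOR_VERSION = 0x03     # major version byte in SSL 3.0 through TLS 1.3 record headers


def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """Return True if the bytes start like a TLS handshake record header."""
    if len(first_bytes) < 2:
        return False
    return (
        first_bytes[0] == TLS_HANDSHAKE_RECORD
        and first_bytes[1] == TLS_MAJOR_VERSION
    )


if __name__ == "__main__":
    samples = {
        "TLS ClientHello header": bytes([0x16, 0x03, 0x01, 0x02, 0x00]),
        "plaintext HTTP/1.1 request": b"GET / HTTP/1.1\r\n",
        "plaintext HTTP/2 (gRPC) preface": b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n",
    }
    for name, data in samples.items():
        print(f"{name}: {looks_like_tls_handshake(data)}")
```

The first message ("remote error: tls: bad certificate") is different: there the handshake did start over TLS, and the remote side sent an alert rejecting the certificate it was shown.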

I think I need to verify the contents of /usr/share/apm-server/apm-server.yml.

What we want to see: with our elastic demo deployed in the monitoring system via

helm install -n monitoring-system -f values.yaml my-otel-demo open-telemetry/opentelemetry-demo

data should eventually show up in APM, at which point we can remove the demo with

helm uninstall -n monitoring-system my-otel-demo

Try to compare and contrast what we have in apm.yaml vs kind-apm.yaml; the apm-server.yml config is set more fully in apm.yaml, the manual deployment.
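One way to make that comparison less error-prone is to flatten both configs into dotted key paths and list only the paths whose values differ. The sketch below assumes the two apm-server.yml configs have already been parsed into Python dicts (for example with a YAML library). The helper names and the sample values are invented for illustration and are not the real contents of either file.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key present on only one side


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(
    left: Dict[str, Any], right: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (left_value, right_value)} for every differing path."""
    flat_left, flat_right = flatten(left), flatten(right)
    diff: Dict[str, Tuple[Any, Any]] = {}
    for path in sorted(set(flat_left) | set(flat_right)):
        lv = flat_left.get(path, MISSING)
        rv = flat_right.get(path, MISSING)
        if lv != rv:
            diff[path] = (lv, rv)
    return diff


if __name__ == "__main__":
    # Hypothetical values only; NOT the actual apm.yaml / kind-apm.yaml content.
    manual = {"apm-server": {"host": "0.0.0.0:8200", "ssl": {"enabled": False}}}
    operator = {"apm-server": {"host": "0.0.0.0:8200", "ssl": {"enabled": True}}}
    for path, (lv, rv) in diff_configs(manual, operator).items():
        print(f"{path}: manual={lv!r} operator={rv!r}")
```

Running the same diff against the file actually mounted at /usr/share/apm-server/apm-server.yml in each pod would show which settings the operator rendered differently from the manual deployment.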

List of resources in monitoring-system that this applies to:

- apmserver named apm-server-op
- pods with my-otel-demo, used to generate logs and data in APM
- pod of otelcollector-collector, for the collector

Jose-Matsuda commented 11 months ago

Just by bringing over and using the raw YAML ConfigMap that I used for the standalone APM deployment (not using the operator), I was able to see it no longer blocking, but before that I get: [image]

The odd part here, which I also hadn't considered, is that my standalone deployment was not on the service mesh, which I think it should have been. The one deployed by the operator is on the mesh, which is correct, but one of its containers doesn't come up correctly (probably due to that error above).

Actually, now it doesn't stay in one spot after changing to use the correct config, I guess via the base YAML config, instead of trying to do it via the container spec (copying the config file like I did for the manual deployment).

Jose-Matsuda commented 11 months ago

Status 05/12/2023

apm-server logs:

remote error: tls: bad certificate

collector logs:

tls: failed to connect to "kibana-monitoring-kb-http.monitoring-system.svc.cluster.local:5601" first record does not look like a TLS handshake

--> adding the tls: insecure option now gives:

failed to connect to apm-server-op-apm-http.monitoring-system.svc.cluster.local:8200 Err: connection error: desc = "error reading server preface: EOF
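The three messages seen so far are each commonly associated with a different TLS mismatch between client and server. The sketch below lists those usual associations as a small lookup. It is a rule of thumb, not a diagnosis of this cluster, and a service mesh sidecar sitting between the two pods can change which side sees which error. The function name is hypothetical. As far as I know, in the OpenTelemetry Collector the tls insecure setting turns TLS off for the exporter entirely rather than only skipping certificate verification, which would fit the EOF appearing after that change.

```python
def likely_symptom(
    client_uses_tls: bool,
    server_uses_tls: bool,
    client_trusts_cert: bool = True,
) -> str:
    """Map a client/server TLS combination to the error usually seen.

    Rule-of-thumb associations only; a mesh sidecar can change the picture.
    """
    if client_uses_tls and not server_uses_tls:
        # The client expects a TLS record but receives plaintext bytes.
        return "client: first record does not look like a TLS handshake"
    if not client_uses_tls and server_uses_tls:
        # The server expects a TLS record but receives plaintext; it drops
        # the connection, which a gRPC client can report as an EOF.
        return (
            "server: first record does not look like a TLS handshake; "
            "gRPC client: error reading server preface: EOF"
        )
    if client_uses_tls and server_uses_tls and not client_trusts_cert:
        # The client rejects the server certificate and sends a TLS alert.
        return "server: remote error: tls: bad certificate"
    return "no TLS mismatch expected"


if __name__ == "__main__":
    for combo in [(True, False, True), (False, True, True), (True, True, False)]:
        print(combo, "->", likely_symptom(*combo))
```

Read against the status above, the apm-server message matches the untrusted-certificate case, and the EOF after switching the collector to insecure matches a plaintext client talking to a TLS listener on port 8200.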