ci: isolate apm-server output from benchmarks cluster

axw commented 4 years ago

In CI we are currently configuring both hey-apm and apm-server to send everything to a single Elasticsearch cluster (observability-benchmarks). The apm-server is also configured to enable "self-instrumentation" (i.e. tracing of requests made by hey-apm to apm-server), sending the data to itself and again indexing into the single Elasticsearch cluster.

I recently enabled continuous profiling in apm-server, in order to analyse resource usage. I had to revert this as it was interfering with the benchmarking/load-testing: https://github.com/elastic/hey-apm/pull/166

To minimise interference we should create a separate cluster for indexing the events sent by hey-apm to apm-server. We would continue to index reports, monitoring data, and self-instrumentation data into the observability-benchmarks cluster. We should set up a long-running apm-server for receiving the self-instrumentation data from apm-servers under test.

Something like this:

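```yaml
apm-server:
  instrumentation:
    # Send self-instrumentation (traces and profiles) to a long-running APM Server
    enabled: true
    hosts: ["http://observability-benchmarks-apm-server.com:8200"]
    profiling:
      cpu.enabled: true
      heap.enabled: true
# Events sent by hey-apm are indexed into a dedicated cluster
output.elasticsearch:
  hosts: ["http://hey-apm-elasticsearch:9200"]
# Monitoring data still goes to the benchmarks cluster
monitoring:
  enabled: true
  elasticsearch:
    hosts: ["http://observability-benchmarks-elasticsearch.com:9200"]
```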

axw commented 4 years ago

@elastic/observablt-robots does this sound feasible?

kuisathaverat commented 4 years ago

What about using different indices? I mean, we can deploy a new cluster, but it seems a little excessive to have an Elastic Cloud cluster for only one thing.

axw commented 4 years ago

> What about using different indices?

I don't think that would help much, if at all, in terms of having a controlled environment.

> I mean, we can deploy a new cluster, but it seems a little excessive to have an Elastic Cloud cluster for only one thing.

One of the things we're measuring is how fast apm-server can index into Elasticsearch. If we're indexing into an Elasticsearch cluster that's used for other purposes, then we're not controlling that variable. That is, Elasticsearch performance may be unknown, so we can't tell whether a change in indexing rate is due to a change in apm-server.

Having said that, I think the more important thing right now would be to send self-instrumentation to a separate APM Server. That would at least enable us to unblock continuous profiling.

Does the observability-benchmarks deployment already have an APM Server? If not, can we add one please? We would also need to upgrade the deployment to 7.6.0+ to enable profiling. Then we can modify the apm-server config in hey-apm benchmarks to send its own performance data there.

kuisathaverat commented 4 years ago

> One of the things we're measuring is how fast apm-server can index into Elasticsearch. If we're indexing into an Elasticsearch cluster that's used for other purposes, then we're not controlling that variable. That is, Elasticsearch performance may be unknown, so we can't tell whether a change in indexing rate is due to a change in apm-server.

Can we use an ephemeral Elastic Cloud cluster for that? I mean, a cluster that we provision for the test and destroy after the test.

> Does the observability-benchmarks deployment already have an APM Server?

Yes, it has APM deployed.

> We would also need to upgrade the deployment to 7.6.0+ to enable profiling. Then we can modify the apm-server config in hey-apm benchmarks to send its own performance data there.

I have upgraded the cluster to 7.6.0. As for profiling, I am not sure how to enable it.

axw commented 4 years ago

> Can we use an ephemeral Elastic Cloud cluster for that? I mean, a cluster that we provision for the test and destroy after the test.

I think that might work. I suppose we might want to warm it up a bit more in that case, which would slow down hey-apm benchmarks running on PRs, but I think that's an acceptable tradeoff. @elastic/apm-server WDYT?

> I have upgraded the cluster to 7.6.0. As for profiling, I am not sure how to enable it.

:heart: I can take care of that part.

jalvz commented 4 years ago

> To minimise interference we should create a separate cluster for indexing the events sent by hey-apm to apm-server.

We had discussed this before but never got around to it. The primary reason, however, was to protect the benchmarks cluster from possible downtime caused by apm-server load (e.g. a full disk).

> Can we use an ephemeral Elastic Cloud cluster for that?

SGTM

axw commented 4 years ago

@kuisathaverat I underestimated the amount of work involved in getting the observability-benchmarks APM Server URL and API Key/secret token. I'll need some help with that.

We need to update docker-compose.yml to configure apm-server.instrumentation as in the issue description:

```
-E apm-server.instrumentation.enabled=true
-E apm-server.instrumentation.hosts=["$APM_SERVER_URL"]
-E apm-server.instrumentation.secret_token="$APM_SERVER_SECRET_TOKEN"
-E apm-server.instrumentation.profiling.cpu.enabled=true
-E apm-server.instrumentation.profiling.heap.enabled=true
```

In case it's not clear, $APM_SERVER_URL and $APM_SERVER_SECRET_TOKEN are for the observability-benchmarks deployment's APM Server.
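One way these flags could be wired into docker-compose.yml is sketched below. This is a hypothetical excerpt, not the actual hey-apm file: the service name, the image tag, and the use of `command:` in list form are assumptions. The list form avoids shell word-splitting, so the quotes in the `hosts` array reach apm-server intact, and docker-compose substitutes `${APM_SERVER_URL}` and `${APM_SERVER_SECRET_TOKEN}` from the environment it runs in.

```yaml
# Hypothetical docker-compose.yml excerpt: the service name and image tag are
# placeholders; only the -E settings come from this discussion.
services:
  apm-server:
    image: docker.elastic.co/apm/apm-server:7.6.0
    command:
      - apm-server
      - -e
      # Send self-instrumentation (traces and profiles) to the
      # observability-benchmarks deployment's APM Server.
      - -E
      - apm-server.instrumentation.enabled=true
      - -E
      - 'apm-server.instrumentation.hosts=["${APM_SERVER_URL}"]'
      - -E
      - 'apm-server.instrumentation.secret_token="${APM_SERVER_SECRET_TOKEN}"'
      - -E
      - apm-server.instrumentation.profiling.cpu.enabled=true
      - -E
      - apm-server.instrumentation.profiling.heap.enabled=true
```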

kuisathaverat commented 4 years ago

Let's confirm the process and the infrastructure we need for the test:

1. Check out the code.
2. Deploy an ephemeral Elasticsearch cluster on Elastic Cloud for the APM Server data. Do we need Kibana? Do we need APM?
3. Launch the docker-compose test:
   - APM Server is configured to send data to the ephemeral Elasticsearch cluster.
   - APM Server is configured to send instrumentation to the benchmarks APM Server.
   - hey-apm sends data to the benchmarks Elasticsearch cluster.

Is that what we want?

axw commented 4 years ago

> Deploy an ephemeral Elasticsearch cluster on Elastic Cloud for the APM Server data. Do we need Kibana? Do we need APM?

Only Elasticsearch. Kibana is not required, and we'll run apm-server on the bare-metal CI machine.

> APM Server is configured to send data to the ephemeral Elasticsearch cluster.

Yes, this is where output.elasticsearch should point.

> APM Server is configured to send instrumentation to the benchmarks APM Server.

Yes.

Also, we should configure monitoring.elasticsearch to point to the benchmarks Elasticsearch. Otherwise monitoring will inherit the value set for output.elasticsearch, which we don't want.

> hey-apm sends data to the benchmarks Elasticsearch cluster.

Yes.
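Taken together, these answers suggest apm-server settings for a CI run along the following lines. This is a sketch, not the final hey-apm configuration. `EPHEMERAL_ES_URL` and `BENCHMARKS_ES_URL` are hypothetical environment variable names, expanded with the Beats `${VAR}` syntax, and credentials for both clusters are omitted. hey-apm's own reports are not configured here, because they are sent by hey-apm rather than by apm-server.

```yaml
# Sketch of apm-server.yml for a CI run; EPHEMERAL_ES_URL and
# BENCHMARKS_ES_URL are hypothetical variable names.
apm-server:
  instrumentation:
    # Traces and profiles of apm-server itself go to the long-running
    # observability-benchmarks APM Server.
    enabled: true
    hosts: ["${APM_SERVER_URL}"]
    secret_token: "${APM_SERVER_SECRET_TOKEN}"
    profiling:
      cpu.enabled: true
      heap.enabled: true

# Events sent by hey-apm are indexed into the ephemeral cluster, which is
# provisioned for the test and destroyed afterwards.
output.elasticsearch:
  hosts: ["${EPHEMERAL_ES_URL}"]

# Set explicitly; otherwise monitoring inherits output.elasticsearch and
# would end up in the ephemeral cluster.
monitoring:
  enabled: true
  elasticsearch:
    hosts: ["${BENCHMARKS_ES_URL}"]
```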

elastic/hey-apm#167