elastic / elastic-package

elastic-package - Command line tool for developing Elastic Integrations

[System test runner] Add more service deployers #89

Closed ycombinator closed 3 years ago

ycombinator commented 4 years ago

Follow up to #64.

Currently the system test runner only supports the Docker Compose service deployer. That is, it can only test packages whose services can be spun up using Docker Compose. We should add more service deployers to enable system testing of packages such as system (probably a no-op or minimal service deployer), aws (probably some way to pass connection parameters and credentials via environment variables and/or something that understands terraform files), kubernetes.
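To make the idea concrete, a service deployer can be thought of as a small setup/teardown abstraction. The sketch below is illustrative only (the names `ServiceDeployer` and `NoOpServiceDeployer` are assumptions, not the actual elastic-package API); a no-op implementation is roughly what a package like `system` would need, since the host itself is the "service":

```go
package main

import "fmt"

// ServiceDeployer abstracts how a service under test is brought up and torn
// down. Names here are illustrative, not the real elastic-package API.
type ServiceDeployer interface {
	SetUp() error
	TearDown() error
}

// NoOpServiceDeployer would cover packages like "system", where the host
// itself is the service and nothing needs to be deployed.
type NoOpServiceDeployer struct{}

func (d NoOpServiceDeployer) SetUp() error    { return nil }
func (d NoOpServiceDeployer) TearDown() error { return nil }

func main() {
	var d ServiceDeployer = NoOpServiceDeployer{}
	if err := d.SetUp(); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer d.TearDown()
	fmt.Println("service ready") // prints "service ready"
}
```

A Docker Compose deployer, an environment-variable-based AWS deployer, or a Terraform-aware deployer would then be further implementations of the same interface.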

mtojek commented 3 years ago

For reference:

Testing on Kubernetes written by @ChrsMark : https://github.com/elastic/integrations/blob/master/testing/environments/kubernetes/README.md (probably outdated now as there were many changes introduced to Fleet)

mtojek commented 3 years ago

We need to cover the following providers:

(correct me if I missed any of them)

Notes:

Other use cases:

Technical observations:

Questions:

  1. Should the provider's stack be spawned like the service's Docker-based stack, only for the duration of test execution, or kept long-running like the Elastic stack?
  2. Should we provide an option for acquiring authorization data for the internal team?
mtojek commented 3 years ago

@kaiyan-sheng @narph @ChrsMark

Would you mind describing here use cases for AWS, Azure and Kubernetes? I'm looking forward to seeing how these cloud/infra providers can be used for testing integrations.

ChrsMark commented 3 years ago

Thanks for the ping @mtojek. I will try to describe a scenario, with inline comments/thoughts, that would cover our k8s needs.

Vanilla Kubernetes

  1. Run elastic-package k8s up to bring up a k8s cluster. I don't think we should care where it runs, on GKE or locally on minikube or kind; maybe it would be better to have it running on GKE for now to avoid the extra step of installing minikube/kind (?). In this step all the required prerequisites should be handled, like installing kube_state_metrics, from which the state_* metricsets will collect metrics.
  2. Run elastic-package test k8s (the syntax is abstract here for the sake of the example) so as to deploy the Agent on the running k8s cluster and enrol it in the Elastic Stack. The Elastic Stack should maybe run on the same k8s cluster, to my mind, so as to keep the networking configuration simpler (similar to the approach mentioned at testing on k8s). After the test completes, the cluster is still up and the agents are still shipping metrics; to clean this up we need to run the next command to bring the whole cluster down.
  3. Run elastic-package k8s down, which will destroy the cluster, including the Elastic stack and Agents.

Note: I think this scenario can be expanded to test other packages like istio and ingress-controller by adding them as extra flags in step 1.

OCP

  1. This scenario will mostly be the same as for vanilla k8s; the only difference is the installation step, where we need an OpenShift installation. If we want a from-scratch installation, we need to run the GCP installer, which takes ~40 mins to bring the cluster up. Not sure if this can be part of a CI job; maybe it can be a nightly job. Related to https://github.com/elastic/beats/issues/17962. Ping me directly for more info ;).
  2. Same as vanilla k8s, but we will most probably need slightly different manifests because of OCP restrictions.
  3. Same as vanilla k8s, but use the GCP installer script to bring the cluster down.

Note 1: This is only for testing the k8s module, but it should be quite similar for testing Autodiscover.

Note 2: Running the Agent on k8s is not yet completely decided; progress/discussion happens around this at the k8s-agent WP. cc: @blakerouse

kaiyan-sheng commented 3 years ago

For AWS testing, we can use a Terraform script (or anything similar) per dataset/package to create AWS services for testing and clean them up afterwards. I think we have an AWS account for testing in the Beats Jenkins (@jsoriano knows more about this) and we can leverage it here.

For metrics: for example, we could run elastic-package test ec2-metrics locally to apply the Terraform script that creates an EC2 instance in AWS, wait a while until EC2 metrics are sent to CloudWatch, check the events collected by the ec2-metrics package, and delete the EC2 instance at the end.
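A minimal Terraform definition for such a test could look roughly like this. This is only a sketch of the idea; the AMI filter, instance type, and resource names are illustrative assumptions, not the actual templates used by the AWS package:

```hcl
provider "aws" {}

# Illustrative only: a single EC2 instance whose CloudWatch metrics the
# ec2-metrics data stream could then collect during the system test.
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "test" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"

  tags = {
    Name = "elastic-package-test-ec2"
  }
}
```

`terraform apply` would create the instance before the test and `terraform destroy` would clean it up afterwards, matching the create/wait/check/delete flow described above.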

For logs: we already have sample files to test the pipelines, but it would be good to have Terraform set up S3-SQS to test the inputs.

There are two use cases here: one is to run this in CI and the other is for package developers to test locally. Because creating services can be costly, we should consider how frequently we should run elastic-package test ec2-metrics in CI.

mtojek commented 3 years ago

There are two use cases here: one is to run this in CI and the other is for package developers to test locally. Because creating services can be costly, we should consider how frequently we should run elastic-package test ec2-metrics in CI.

With this PR https://github.com/elastic/integrations/pull/474, tests will be executed only if the relevant packages have changed (in this case the AWS integration) or we are on the master branch.

Regarding elastic-package test k8s and elastic-package test ec2-metrics, I think we need to come up with an open, flexible API, so that we don't have to modify the CLI every time we introduce a new platform, but this is something we'll research :) I admit I hadn't looked at k8s as a separate stack, but rather as a service under test that is alive for the duration of a test. Keeping it as a separate stack (like the Elastic stack) might actually simplify things.

narph commented 3 years ago

For Azure we can look at something similar to the use case above. I previously worked on a POC using Pulumi, which authenticates the user, creates a storage account, fetches metrics, validates them, and then removes the entire deployment. I hope it is of interest here: https://github.com/elastic/beats/pull/21850. Maybe something like elastic-package test azure storage could replace the entire process. For Azure logs, more steps are required; for example, after creating the event hub we will have to populate it with some valid/invalid messages. Not sure how much detail we should go into in this issue.

mtojek commented 3 years ago

I'm picking up this issue.

mtojek commented 3 years ago

Thank you for all the feedback, folks! We had a sync-up with @ycombinator to discuss possible options. Technically, we'll try to implement a generic Terraform-based test runner. We would rather not include AWS/Azure/K8s references in the CLI; let's try to make it as generic as possible. The approach will be truly declarative, which is in line with the original principle (no programming language required).

Here is a list of action items to help us solve this issue.

Dev changes in package-spec:

@ycombinator, I still have doubts about which path we should follow. If you have any preferences or see benefits in either of them, please feel free to share.

Changes in elastic-package:

Changes in integrations:

ChrsMark commented 3 years ago

Thanks for the heads-up @mtojek! Feel free to reach out to me if you guys have any questions about the k8s specifics since it can be tricky with different components we collect from unlike other clouds where we define a single exposed endpoint.

kaiyan-sheng commented 3 years ago

With this PR elastic/integrations#474 tests will be executed only if the relevant packages are changed (in this case AWS integration) or this is the master branch.

Great, thank you!

ycombinator commented 3 years ago

Thanks for the write up and breakdown of tasks, @mtojek. Very helpful!

Dev changes in package-spec:

  • [ ] Allow for data-stream level _dev/deploy definitions, or
  • [ ] Mount extra files for the data stream at runtime (it may avoid building the image multiple times)

@ycombinator, I still have doubts about which path we should follow. If you have any preferences or see benefits in either of them, please feel free to share.

I recall discussing the first option (allowing for data-stream level _dev/deploy definitions) in our meeting today, but not the second one (mounting extra files for the data stream at runtime). Would you mind explaining some details about the second option? Thanks.

mtojek commented 3 years ago

(I came to this point from observing the Zeek integration.)

I can elaborate on this. Imagine we have an integration XYZ with data streams A, B, C, ..., Z. Every data stream is basically the same Docker image with a Terraform executor and its own set of static .tf templates. The improvement is to use a single Docker image and simply mount (switch) the templates for each data stream's test scenario. This way it will be faster than building a new Docker image per data stream.

ycombinator commented 3 years ago

I always assumed (but probably didn't make it explicit, sorry!) that there would be one shared/common TF executor Docker image used by the TF service deployer. The definition and maintenance of this image is the responsibility of elastic-package developers, as opposed to package developers.

The part that varies is the TF templates, whether they come from the package level ({package}/_dev/deploy/tf/...) or the data stream level ({package}/data_stream/{data stream}/_dev/deploy/tf/...). The definition and maintenance of these templates is the responsibility of package developers.

So I think we're on the same page?
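Under that scheme, a package's layout would look roughly like this (the tree is a sketch of the paths described above, not a spec):

```
{package}/
├── _dev/
│   └── deploy/
│       └── tf/              # package-level TF templates
└── data_stream/
    └── {data stream}/
        └── _dev/
            └── deploy/
                └── tf/      # data stream-level TF templates
```

The shared executor image stays the same across packages; only these template directories vary.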

mtojek commented 3 years ago

The part that varies is the TF templates, whether they come from the package level ({package}/_dev/deploy/tf/...) or the data stream level ({package}/data_stream/{data stream}/_dev/deploy/tf/...). The definition and maintenance of these templates is the responsibility of package developers.

I agree with the rest of your comment. Regarding the quoted paragraph: what is the best way of processing these TF templates (belonging to particular data streams)? Load them at runtime? Include them at build time (one image build per data stream)?

(I think we're on the same page, just confirming the implementation details. :)

ycombinator commented 3 years ago

Load them at runtime? Include them at build time (one image build per data stream)?

There is also a third option: include all of them at image build time (so you are not building one image per data stream) and then select the right data stream's templates at runtime.

At any rate, I don't know if there's an obvious answer to this one. I would suggest trying one of the options, probably the one you think is simplest to implement, see how well it performs and then iterate from there as necessary.
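That third option could be sketched as a Dockerfile like the one below. Everything here is hypothetical: the base image, the `/templates` layout, and the `DATA_STREAM` environment variable are assumptions for illustration, not the actual elastic-package implementation:

```dockerfile
# Hypothetical shared TF executor image: all data streams' templates are
# baked in at build time, so there is a single image build per package.
FROM hashicorp/terraform:light

# Copy every data stream's templates into the one shared image.
COPY data_stream/ /templates/

# Select the right data stream's templates at runtime via an environment
# variable instead of rebuilding the image per data stream.
ENV DATA_STREAM=""
WORKDIR /workspace
ENTRYPOINT ["sh", "-c", \
  "cp /templates/${DATA_STREAM}/_dev/deploy/tf/* . && terraform init && terraform apply -auto-approve"]
```

The runner would then start the same image with a different `DATA_STREAM` value for each data stream's test scenario.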

jsoriano commented 3 years ago

+1 to implement this as a generic declarative Terraform-based runner :+1:

Some comments in case they are helpful:

mtojek commented 3 years ago

Thank you for sharing your thoughts, lots of tricky ideas ;) I like the idea of kops.

In elastic/beats#17656 Blake extended mage goIntegTest for Metricbeat to be able to run tests in Kubernetes (with kind) apart from the usual Docker Compose. There it was also done in a generic way: one provider or the other was used depending on the available files. A similar approach could be followed here to continue supporting docker-compose, or if we want to support other providers in the future.

Honestly, I think we're not there yet. First, the Elastic Agent needs to support autodiscovery and the Kubernetes runtime. Then we can think about potential integrations. Keep in mind that we'd like to examine integrations, not the entire end-to-end flow. I would leave the verification of Elastic Agent functionality in different runtimes to the Agent or the e2e-tests.

ChrsMark commented 3 years ago

@mtojek @ycombinator FYI, for k8s package testing I'm using some mock APIs so as to make progress until we reach a more permanent solution. You can find more at https://github.com/elastic/integrations/pull/569.

While working with these mocks I realize even more the need for running against an actual k8s cluster, and more specifically having the Agent deployed on the cluster natively. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.

ycombinator commented 3 years ago

While working with these mocks I realize even more the need for running against an actual k8s cluster, and more specifically having the Agent deployed on the cluster natively. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.

This is super valuable information. @mtojek and I have informally discussed the idea that for some service deployers it might make sense to deploy the Agent alongside the service; your findings seem to point in the same direction. Thank you!

mtojek commented 3 years ago

@kaiyan-sheng AWS integration can be tested now using the Terraform executor (sample here: https://github.com/elastic/integrations/tree/master/packages/aws/data_stream/ec2_metrics).

@narph this feature is written in a generic way. If you pass secrets for Azure and write some TF code, it's expected to work.

EDIT:

We just need to enable secrets on the Jenkins side, but that shouldn't be a big issue (unless we don't have them generated at all).

mtojek commented 3 years ago

Let me summarize this issue.

We've delivered (and applied in Integrations):