For reference:
Testing on Kubernetes written by @ChrsMark: https://github.com/elastic/integrations/blob/master/testing/environments/kubernetes/README.md (probably outdated now as there were many changes introduced to Fleet)

We need to cover the following providers (correct me if I missed any of them):

Notes:

Other use cases:

Technical observations:
- `terraform` tool to be installed locally.

Questions:
@kaiyan-sheng @narph @ChrsMark
Would you mind describing here the use cases for AWS, Azure and Kubernetes? I'm looking forward to seeing how these cloud/infra providers can be used for testing integrations.
Thanks for the ping @mtojek, I will try to provide a scenario, with inline comments/thoughts that would cover our k8s needs.
1. `elastic-package k8s up` to bring up a k8s cluster. I don't think we should care where it is, on GKE or locally on minikube or kind. Maybe it will be better to have it running on GKE for now to avoid an extra step of minikube/kind installation (?). In this step all the required prerequisites should happen, like installing `kube_state_metrics`, from which the `state_*` metricsets will collect metrics (see the Terraform sketch after the notes below).
2. `elastic-package test k8s` (the syntax is abstract here for the sake of the example) so as to deploy the Agent on the running k8s cluster and enroll it with the Elastic Stack. The Elastic Stack should maybe be running on the same k8s cluster so as to have easier networking configuration, to my mind (similar to the approach mentioned in testing on k8s). After the test is completed the cluster is still up and the agents are still shipping metrics. To clean this up we need to run the next command to bring the whole cluster down.
3. `elastic-package k8s down`, which will destroy the cluster, including the Elastic Stack and Agents.

Note: I think this scenario can be expanded to test other packages like `istio` and `ingress-controller` by adding them as extra flags in step 1.
Note 1: This is only for testing the k8s module, but it should be quite similar for testing Autodiscover.
Note 2: The "Running Agent on k8s" thing is not yet completely decided. Progress/discussion happens around this at the k8s-agent WP, cc: @blakerouse.
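For illustration of the prerequisite installation in step 1, here is a minimal sketch assuming the runner ends up being Terraform-based (which is where this thread converges) and that a kubeconfig for the freshly created cluster is already available. The chart name and repository are the public prometheus-community ones and nothing here is provided by elastic-package today:

```hcl
# Hypothetical prerequisite step for the scenario above: install
# kube-state-metrics on an existing cluster so that the state_* metricsets
# have something to collect from. Assumes a kubeconfig at ~/.kube/config.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "kube_state_metrics" {
  name       = "kube-state-metrics"
  namespace  = "kube-system"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-state-metrics"
}
```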
For AWS testing, we can use a terraform script (or anything similar) per dataset/package to create AWS services for testing and clean them up after testing. I think we have an AWS account for testing in Beats Jenkins (@jsoriano knows more about this) and we can leverage it here.
For metrics: an example can be that we run `elastic-package test ec2-metrics` locally to apply the terraform script to create an EC2 instance in AWS, wait for a while until EC2 metrics are sent to CloudWatch, check the events collected by the ec2-metrics package, and delete the EC2 instance at the end (a sketch of such a template follows below).
For logs: We have sample files to test the pipelines already, but it would be good to have terraform set up S3-SQS to test the inputs.
There are two use cases here: one is to run this in CI and the other one is for package developers to test locally. Because creating services can be cost-inefficient, we should consider how frequently we should run `elastic-package test ec2-metrics` in CI.
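As an illustration of the per-data-stream Terraform definition described above, here is a minimal sketch; the resource names and the AMI filter are hypothetical placeholders, not the actual aws package template, and credentials/region are expected to come from the environment:

```hcl
# Hypothetical template: create one EC2 instance whose CloudWatch metrics
# the ec2-metrics data stream can then pick up; destroyed after the test.
provider "aws" {}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "test" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"

  tags = {
    Name = "elastic-package-test-ec2-metrics"
  }
}
```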
> There are two use cases here: one is to run this in CI and the other one is for package developers to test locally. Because creating services can be cost-inefficient, we should consider how frequently we should run `elastic-package test ec2-metrics` in CI.
With this PR https://github.com/elastic/integrations/pull/474 tests will be executed only if the relevant packages are changed (in this case AWS integration) or this is the master branch.
Regarding `elastic-package test k8s` and `elastic-package test ec2-metrics`: I think we need to come up with an open, flexible API, so that we don't have to modify the CLI every time we introduce a new platform, but this is something we'll research :) I admit I haven't looked at k8s as a separate stack, rather as a service under test that is alive for the duration of a test. Keeping it as a separate stack (like the Elastic Stack) might actually simplify things.
For Azure, we can look at something similar to the use case above. I previously worked on a POC using Pulumi which will authenticate the user, create a storage account, fetch metrics, validate them, and then remove the entire deployment. I hope it is of interest here: https://github.com/elastic/beats/pull/21850. Maybe something like `elastic-package test azure storage` could replace the entire process.
For Azure logs, more steps are required; for example, after creating the event hub we will have to populate it with some valid/invalid messages.
Not sure how much detail we should go into in this issue.
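Since the thread converges on Terraform rather than Pulumi, here is roughly what the storage-account part of that flow could look like as a Terraform template; all names are hypothetical and the storage account name would have to be globally unique:

```hcl
# Hypothetical sketch of the Azure storage scenario: create a resource group
# and a storage account, let the azure package fetch metrics for it, then
# destroy everything once the test completes.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "test" {
  name     = "elastic-package-test"
  location = "West Europe"
}

resource "azurerm_storage_account" "test" {
  name                     = "elasticpkgtest01" # must be globally unique
  resource_group_name      = azurerm_resource_group.test.name
  location                 = azurerm_resource_group.test.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```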
I'm going with this issue.
Thank you for all the feedback, folks! We had a sync-up with @ycombinator to discuss possible options. Technically, we'll try to implement a generic Terraform-based test runner. We wouldn't like to include AWS/Azure/K8s references in the CLI; let's try to make it as generic as possible. The approach will be truly declarative, which is in line with the original principle (no programming language is required).
Here is a list of action items to help us solve this issue.
Dev changes in package-spec:

- [ ] Allow for data-stream level `_dev/deploy` definitions - https://github.com/elastic/package-spec/pull/111
  or
- [ ] ~~Mount extra files for data-stream in runtime (it may prevent from building the image multiple times)~~

@ycombinator, I still have doubts which path we should follow. If you have any preferences or see benefits of either of them, please feel free to share.

Changes in elastic-package:

- [ ] Use `_dev/deploy` for the data stream first (if available) - https://github.com/elastic/elastic-package/pull/228
- [ ] ~~Consider shortening the total build time of Docker services (build them at most once)~~
- [ ] `tf` service deployer - a Docker image which can execute provided terraform templates or proxy traffic for the Elastic-Agent. The Docker container will manage the lifecycle of created cloud components (machines, buckets, databases) - https://github.com/elastic/elastic-package/pull/227

Changes in integrations:
Thanks for the heads-up @mtojek! Feel free to reach out to me if you guys have any questions about the k8s specifics, since it can be tricky with the different components we collect from, unlike other clouds where we define a single exposed endpoint.
> With this PR elastic/integrations#474 tests will be executed only if the relevant packages are changed (in this case AWS integration) or this is the master branch.
Great, thank you!
Thanks for the write-up and breakdown of tasks, @mtojek. Very helpful!
> Dev changes in package-spec:
>
> - [ ] Allow for data-stream level `_dev/deploy` definitions
>   or
> - [ ] Mount extra files for data-stream in runtime (it may prevent from building the image multiple times)
>
> @ycombinator, I still have doubts which path we should follow. If you have any preferences or see benefits of either of them, please feel free to share.
I recall discussing the first option (Allow for data-stream level `_dev/deploy` definitions) in our meeting today but not the second one (Mount extra files for data-stream in runtime (it may prevent from building the image multiple times)). Would you mind explaining some details about the second option? Thanks.
(I came to this point based on observing the Zeek integration.)
I can elaborate on this. Imagine we have an integration XYZ with data streams A, B, C, ... Z. Every data stream is basically the same Docker image with a terraform executor and its own set of static tf templates. The improvement is to use a single Docker image and simply mount (switch) templates for the data stream test scenario. This way it will be faster than building a new Docker image for each data stream.
I always assumed (but probably didn't make it explicit, sorry!) that there would be one shared/common TF executor Docker image that is used by the TF service deployer. The definition and maintenance of this image is the responsibility of elastic-package developers, as opposed to that of package developers.
The part that varies is the TF templates, whether those come from the package level (`{package}/_dev/deploy/tf/...`) or the data stream level (`{package}/data_stream/{data stream}/_dev/deploy/tf/...`). The definition and maintenance of these templates is the responsibility of package developers.
So I think we're on the same page?
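For illustration, using the aws package and its ec2_metrics data stream mentioned later in this thread, the two possible template locations would be (hypothetical layout):

```
packages/aws/_dev/deploy/tf/main.tf                          # package-level TF templates
packages/aws/data_stream/ec2_metrics/_dev/deploy/tf/main.tf  # data-stream-level TF templates
```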
> The part that varies is the TF templates, whether those come from the package level (`{package}/_dev/deploy/tf/...`) or the data stream level (`{package}/data_stream/{data stream}/_dev/deploy/tf/...`). The definition and maintenance of these templates is the responsibility of package developers.

I agree with the rest of your comment. Regarding the quoted paragraph: what is the best way of processing these TF templates (belonging to particular data streams)? Load them at runtime? Include them at build time (one image build per data stream)?
(I think we're on the same page, just confirming the implementation details :)
> Load them at runtime? Include them at build time (one image build per data stream)?
There is also a third option: include all of them at image build time (so you are not building one image per data stream) and then select the right data stream's templates at runtime.
At any rate, I don't know if there's an obvious answer to this one. I would suggest trying one of the options, probably the one you think is simplest to implement, see how well it performs and then iterate from there as necessary.
+1 to implement this as a generic declarative Terraform-based runner :+1:
Some comments in case they are helpful:
- In elastic/beats#17656 Blake extended `mage goIntegTest` for Metricbeat to be able to run tests in Kubernetes (with kind) apart from the usual docker compose. There it was also done in a generic way, one provider or the other was used depending on the available files. A similar approach could be followed here to continue supporting `docker-compose`, or if we want to support other providers in the future.
- `elastic-package` should always provide some base resources when some specific providers are used, so scenarios can be simpler. Same thing with kubernetes: a scenario could define some kubernetes resources, but `elastic-package` would provide the cluster and the credentials.

Thank you for sharing your thoughts, lots of tricky ideas ;) I like the idea of kops.
> In elastic/beats#17656 Blake extended `mage goIntegTest` for Metricbeat to be able to run tests in Kubernetes (with kind) apart from the usual docker compose. There it was also done in a generic way, one provider or the other was used depending on the available files. A similar approach could be followed here to continue supporting `docker-compose`, or if we want to support other providers in the future.
Honestly, I think we're not there yet. First, the Elastic-Agent needs to support autodiscovery and the Kubernetes runtime. Then we can think about potential integrations. Keep in mind that we'd like to examine integrations, not the entire end-to-end flow. I would leave the verification of the Elastic-Agent functionality in different runtimes to the Agent or e2e-tests.
@mtojek @ycombinator FYI, for k8s package testing I'm using some mock APIs so as to proceed until we reach a more permanent solution. You can find more at https://github.com/elastic/integrations/pull/569.
While working with these mocks I realise even more the need for running against an actual k8s cluster and, more specifically, having the Agent deployed on the cluster natively. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.
> While working with these mocks I realise even more the need for running against an actual k8s cluster and, more specifically, having the Agent deployed on the cluster natively. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.
This is super valuable information. @mtojek and I have informally discussed the idea that for some service deployers it might make sense to deploy the Agent "alongside" the service; your findings seem to be along these lines, so this is very valuable feedback. Thank you!
@kaiyan-sheng AWS integration can be tested now using the Terraform executor (sample here: https://github.com/elastic/integrations/tree/master/packages/aws/data_stream/ec2_metrics).
@narph this feature is written in a generic way. If you pass secrets for Azure and write some TF code, it's expected to work.
EDIT: we just need to enable secrets on the Jenkins side, but it shouldn't be a big issue (unless we don't have them generated at all).
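For context on the "pass secrets for Azure" part: the azurerm Terraform provider can read its credentials from the standard ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID environment variables, so a template only needs an empty provider block plus the resources under test; how exactly those secrets get exposed in Jenkins is the open part. A minimal, hypothetical sketch:

```hcl
# Hypothetical: no credentials in the template itself; the provider picks up
# ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID
# from the environment prepared by the test runner / CI.
provider "azurerm" {
  features {}
}
```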
Let me summarize it. We've delivered (and applied in Integrations):

- `kind` and potentially additional resources (e.g. custom application deployment).
Follow up to #64.
Currently the system test runner only supports the Docker Compose service deployer. That is, it can only test packages whose services can be spun up using Docker Compose. We should add more service deployers to enable system testing of packages such as:

- `system` (probably a no-op or minimal service deployer),
- `aws` (probably some way to pass connection parameters and credentials via environment variables and/or something that understands terraform files; a sketch follows below),
- `kubernetes`.
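To make the `aws` bullet above a bit more concrete (the sketch referenced in that item): one way a Terraform-aware service deployer could pass connection parameters is via TF_VAR_* environment variables, which Terraform maps onto declared variables. The variable name TEST_RUN_ID and the SQS queue below are purely illustrative assumptions, not part of any existing package:

```hcl
# Hypothetical sketch: the service deployer exports credentials
# (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION) plus
# parameters such as TF_VAR_TEST_RUN_ID, then runs terraform apply/destroy
# on the package-provided templates.
variable "TEST_RUN_ID" {
  type    = string
  default = "detached"
}

provider "aws" {}

resource "aws_sqs_queue" "test" {
  name = "elastic-package-test-${var.TEST_RUN_ID}"
}
```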