willfindlay opened this issue 2 years ago (Open)
Hello @willfindlay !
I am new to writing e2e tests in Go, and I would like to work on this issue on a "learn on the Go" basis. It would be great if you could point me in some directions so that I can get started with the issue.
In the meantime I am going through the checker package present at tests/e2e/checker/rpcchecker.go
I have a few questions to get started:
1. What are the specific Prometheus metrics we need to verify during e2e tests? A simple example would help.
2. What are the requirements for the metrics checker?
3. How will we integrate it with the existing test framework?
@kkourt
Hey @prateek041, thanks for your interest in the project! Let me take a little time to come up with a more concrete list of requirements and I'll follow up here shortly.
Thank you @willfindlay! Really looking forward to contributing to the project.
@prateek041 Here's a rough answer for the above questions to get you started.
Bonus points if we can identify which pod(s) fail the checks in multi-node clusters. Could be useful for debugging a failed test.
Thanks for sharing! I will be working on the issue now.
After going through the code base, here is the list of metrics I found.
present here in the metrics package
Out of these metrics, which ones do we intend to write tests for? According to @willfindlay, most of these need to be covered except the probes.
I don't think we want to write new tests (yet). Rather I want to add these metrics checks to the existing tests.
I am trying to understand how I should filter out the metrics the user wants from all the metrics being exposed at the /metrics path. One way to do it is to loop over the output and match against selectedMetrics[]; there is also this v1 package, which I haven't fully tested yet, and there may be other options I haven't come across. What is the recommended way here?
I tried to go through the rpc checker to understand how it does this, but since I don't know much about protobuf files, I couldn't understand much. Here is the function.
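As a starting point for the filtering question above, here is a minimal sketch of parsing the Prometheus text exposition format with only the standard library. The metric names and the `filterMetrics`/`selected` names are illustrative, not from the codebase; a real checker would more likely use the official parser in `github.com/prometheus/common/expfmt`, which also handles labels, timestamps, and edge cases this sketch skips.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// filterMetrics parses Prometheus text exposition output and keeps only
// the samples whose metric name appears in selected. Labels, timestamps,
// and spaces inside quoted label values are not handled here; this is a
// sketch of the idea, not a full parser.
func filterMetrics(body string, selected map[string]bool) map[string]float64 {
	out := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Skip comments (# HELP / # TYPE) and blank lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A sample line looks like: name{label="x"} 42
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		name := fields[0]
		// Strip the label set from the metric name, if present.
		if i := strings.IndexByte(name, '{'); i >= 0 {
			name = name[:i]
		}
		if !selected[name] {
			continue
		}
		val, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		out[name] += val // sum across label sets
	}
	return out
}

func main() {
	body := `# HELP tetragon_msg_op_total (illustrative metric name)
tetragon_msg_op_total{msg_op="5"} 120
tetragon_msg_op_total{msg_op="7"} 30
some_other_metric 1
`
	selected := map[string]bool{"tetragon_msg_op_total": true}
	fmt.Println(filterMetrics(body, selected)) // map[tetragon_msg_op_total:150]
}
```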
Here are some additional questions:
I am trying to understand how should I filter out the metrics that the user wants from all the metrics being exposed at /metrics path.
There are a couple of approaches that could work here. Figuring out which one to use is part of the exercise. The API client package you linked sounds promising.
There is a multiplexer for gRPC; should something similar be implemented for metrics too?
We will need the metricschecker to validate metrics for multi-node clusters. So we will need something like the multiplexer that should abstract over multiple metrics connections (one per pod).
You mentioned "expose a builder that lets you write queries"; could you elaborate on that a bit more?
Just an API similar to the eventchecker. Something like `NewMetricsChecker().LessThanOrEqual("ringbuf_dropped_count", 0)` or similar.
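The builder idea above can be sketched roughly like this. Everything beyond the `NewMetricsChecker().LessThanOrEqual(...)` call mentioned in the thread (the `GreaterThan` method, the `Verify` step, the internal `metricCheck` type) is my assumption about how such an API might be fleshed out, not the project's actual design.

```go
package main

import "fmt"

// metricCheck is one queued assertion: metric name, comparison, bound.
type metricCheck struct {
	name  string
	op    string
	bound float64
}

// MetricsChecker accumulates assertions via a fluent builder, in the
// spirit of the eventchecker API. (Sketch only; names are illustrative.)
type MetricsChecker struct {
	checks []metricCheck
}

func NewMetricsChecker() *MetricsChecker {
	return &MetricsChecker{}
}

// Each builder method queues a check and returns the checker for chaining.
func (c *MetricsChecker) LessThanOrEqual(name string, bound float64) *MetricsChecker {
	c.checks = append(c.checks, metricCheck{name, "<=", bound})
	return c
}

func (c *MetricsChecker) GreaterThan(name string, bound float64) *MetricsChecker {
	c.checks = append(c.checks, metricCheck{name, ">", bound})
	return c
}

// Verify evaluates the queued checks against a scraped metric set.
func (c *MetricsChecker) Verify(metrics map[string]float64) error {
	for _, chk := range c.checks {
		val := metrics[chk.name]
		ok := false
		switch chk.op {
		case "<=":
			ok = val <= chk.bound
		case ">":
			ok = val > chk.bound
		}
		if !ok {
			return fmt.Errorf("%s = %v, want %s %v", chk.name, val, chk.op, chk.bound)
		}
	}
	return nil
}

func main() {
	metrics := map[string]float64{"ringbuf_dropped_count": 0, "events_total": 12}
	err := NewMetricsChecker().
		LessThanOrEqual("ringbuf_dropped_count", 0).
		GreaterThan("events_total", 0).
		Verify(metrics)
	fmt.Println(err) // <nil> when all checks pass
}
```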
Hey @prateek041, just checking in. Any progress updates or questions from your side?
Hello @willfindlay !
I couldn't work on the issue for two days due to bad weather here. I have a few questions, but I try hard to find as many answers as I can on my own and then ask three or four together, so I don't take up too much of your time.
Updates:
I can successfully filter out the metrics based on what is asked for; here is the sample code. So I have an idea of how to implement the checker now.
Great news!
I believe runners.go is the file responsible for the flow of these tests? I am going through it right now.
Yes that's correct. The Runner struct in that file essentially manages the flow of the tests and takes care of installing cilium/tetragon and forwarding whatever ports we need to forward to get the gRPC checkers etc. working correctly.
Should I raise a draft PR so that every small piece can be reviewed and discussed? @willfindlay
@prateek041 If you think you have enough concrete pieces, I'd be happy to take a look. Otherwise it's also fine to wait until you have a little more.
Are you still working on this @prateek041?
We already have an rpcchecker package to verify Tetragon events from our gRPC API. The next item on my wishlist is a metricschecker we could use to verify specific prometheus metrics during end-to-end tests. For example, a test could assert that we have a specific event count for a given pod or that we have no occurrences of a specific error.
To do this, we would need to add a new metricschecker package to tests/e2e and write some logic to parse and compare Prometheus metrics against expected values. Then we expose this as a features.Func, just like we do for the rpcchecker.