elastic / oblt-playwright


Re-using the existing Performance Journeys #7

Open achyutjhunjhunwala opened 9 months ago

achyutjhunjhunwala commented 9 months ago

We currently have a package called kbn-journeys here which does something similar. It's currently maintained by the Appex team.

We already have quite some user journeys defined here - https://github.com/elastic/kibana/tree/main/x-pack/performance/journeys

The QA team is currently running these journeys on bare-metal machines using FTR, three times a day, reporting telemetry to the Overview cluster since APM instrumentation is enabled.

Since it uses FTR, it spins up an ES and Kibana instance and ingests data using ES archives or synthtrace.

I believe what we are doing in this project is also defining user journeys, which instead run on a beefy setup of Kibana and ES with much more data ingested via various solutions.

The idea of this ticket is to open a discussion so that we avoid re-inventing the wheel.

ablnk commented 9 months ago

@achyutjhunjhunwala thanks for bringing that up. I'd kick off by describing the goal of this project. What we are trying to achieve is to emulate the end-user experience in different areas of Observability and measure user journey performance in Serverless versus Stateful deployments. What we are not aiming to do here is test Kibana performance in isolation (we rather evaluate the system as a whole) or test whether a certain UI element displays correctly.

This project is not tied to any tooling that generates data; as long as your project/deployment has observability data, whether real or synthetic, you can run this test suite against it by simply passing Kibana credentials to the .env file. User journeys are designed to put together as many "heavy" areas (such as pages with multiple visualizations) as possible and navigate the user through several pages of a particular Kibana section. Where possible, we log the Elasticsearch queries that aggregate data for the visualizations. We also run tests at a scale of multiple users (1-100) to put the system under some load.

I've checked some user journeys here: https://github.com/elastic/kibana/tree/main/x-pack/performance/journeys. It looks like we approach creating them differently. For example, apm_service_inventory.ts: what it actually does is check whether the Service Inventory / Transactions pages loaded and wait for the Trace Waterfall on the page to load, while user journeys in this project go beyond that. Here is a scenario for the same area:

  1. Navigates to Observability > APM > Services.
  2. Clicks on the service name with the highest error rate from the Inventory, then clicks "Service Details".
  3. Filters data by selected date picker option. Waits for the content to load.
  4. Opens the "Transactions" tab. Clicks on the most impactful transaction. Waits for the content to load.
  5. Clicks on the "Failed transaction correlations" tab. Waits for the content to load.
  6. Sorts the result by field value. Filters the result by a particular field value by clicking on the "+".
  7. [Only in Stateful] Clicks on "Investigate", selects "Host logs".
  8. [Only in Stateful] Filters logs by last 24 hours, then filters by error messages.
  9. [Only in Stateful] Expands certain document. Waits for the content to load.
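Step 2 above ("the service name with the highest error rate") implies a small piece of selection logic inside the journey. A minimal sketch of that logic in TypeScript, using a hypothetical `ServiceStats` shape (not taken from either repo):

```typescript
// Hypothetical shape for rows scraped from the APM Service Inventory table.
interface ServiceStats {
  name: string;
  errorRate: number; // failed transaction rate, 0..1
}

// Pick the service the journey should drill into: the one with the highest
// error rate, with ties broken by name so runs stay deterministic.
function serviceWithHighestErrorRate(rows: ServiceStats[]): ServiceStats {
  if (rows.length === 0) {
    throw new Error("service inventory is empty");
  }
  return [...rows].sort(
    (a, b) => b.errorRate - a.errorRate || a.name.localeCompare(b.name)
  )[0];
}
```

In a real Playwright journey this would feed the locator for the service link to click; the point is only that journey steps like these carry logic worth sharing across projects.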

@cachedout @adam-stokes FYI

achyutjhunjhunwala commented 9 months ago

The way I look at both these projects is that they complement each other. kbn-journeys, as you rightly pointed out, tests Kibana/ES itself in isolation using the FTR runner by spinning it up on a bare-metal machine, whereas this project can point to any instance and run tests.

> What we are trying to achieve is to emulate the end-user experience in different areas of Observability and measure user journey performance in Serverless versus Stateful deployments.

I believe both projects actually have the same goal. The purpose for which kbn-journeys was built was to identify regressions between releases and, once Serverless was released, to do the same on Serverless. @dmlemeshko can add more here.

Regarding the journey comparison, I agree the journey I wrote for apm_service_inventory was very basic. In fact, I want to copy the journey you created for APM and use it instead, as it's even better.

But I am not looking at this solely from a journey perspective. The question I am asking myself (and others will too) is:

cc @ruflin

ablnk commented 9 months ago

> As a developer from the solutions team, if I need to measure the performance of my application and identify regressions, which approach should I take: kbn/journeys or oblt/playwright?

I see it this way: if you are interested in evaluating solely Kibana performance, proceed with kbn/journeys. If you are interested in evaluating all the stack components, with real-time data ingestion and querying of big data from Elasticsearch, use oblt/playwright.

> Can we somehow merge the 2 solutions and find a way to use a common journey approach, where developers can write journeys for their solutions that could be run either with FTR or with oblt/playwright? It could be that this is not possible, but I opened this ticket with the idea to discuss possibilities.

Playwright test scripts can in theory be reused, but it would require bringing the coding style of both projects into line. If you want to reuse some stuff from this repo to test Kibana performance and catch regression bugs, you'll need to redo many assertions, get rid of parts that aren't stressing Kibana specifically, redo the authentication approach, recreate all the env variables, etc. Basically, it means recreating everything from scratch.

Let's wait to see what others think about that.

cachedout commented 8 months ago

I'd like to add some comments here as well. First, I think these are excellent points raised by @achyutjhunjhunwala. I'm especially compelled by the argument that we are setting ourselves up for a world in which it is hard for developers to reason about where they should put new journeys, and that duplicating them across various systems may be undesirable.

That said, I think there are a few cases covered by oblt-playwright (and the associated tooling) that fill gaps currently present in FTR. @ablnk has already touched on these, but I'll restate them.

Right now, FTR only understands how to generate data using load that originates from the Kibana codebase itself; this is synthtrace in your example. This is problematic in two ways. The first is that synthtrace is not designed to put the cluster itself under any significant load, nor does it generate extremely large data sets. As such, we believe it may be very difficult to detect regressions, because with small data sets the response time might not vary enough to separate it from normal run-to-run variation, even on bare-metal hardware. The second is that we can't currently generate data for use outside of APM.
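To illustrate the data-volume point: generating load outside the Kibana codebase does not require any Kibana tooling at all, since Elasticsearch accepts NDJSON via its `_bulk` API. A rough TypeScript sketch that builds such a payload (the index name and document fields are made up for illustration):

```typescript
// Build an Elasticsearch _bulk request body (NDJSON): one action line plus
// one document line per event. Index name and fields here are illustrative.
function buildBulkBody(index: string, count: number, start: Date): string {
  const lines: string[] = [];
  for (let i = 0; i < count; i++) {
    lines.push(JSON.stringify({ index: { _index: index } }));
    lines.push(
      JSON.stringify({
        "@timestamp": new Date(start.getTime() + i * 1000).toISOString(),
        message: `synthetic log line ${i}`,
        "log.level": i % 10 === 0 ? "error" : "info",
      })
    );
  }
  // _bulk bodies must terminate with a trailing newline.
  return lines.join("\n") + "\n";
}
```

A generator like this can be scaled up to arbitrary data volumes and index patterns, which is exactly the gap described above; the sketch omits the HTTP call itself.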

I'd restate Andrei's conclusions as follows:

I hold the view that both test approaches are complementary to each other and that we would be at a significant disadvantage if we were not to have both.

That said, I do think that a write-once-run-anywhere world is a desirable one, and it's worth seeing what it would take to get oblt-playwright to consume journeys from FTR. Perhaps there might be a way in FTR to annotate tests which oblt-playwright could consume.
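One possible shape for that annotation idea, sketched in TypeScript: FTR journeys could carry tags, and the external runner would pick up only the ones marked for it. The tag name and registry shape below are invented for illustration, not an existing API in either project:

```typescript
// Hypothetical journey registry entry; a tag such as "oblt-playwright"
// would mark journeys the external runner is allowed to consume.
interface JourneyMeta {
  id: string;
  tags: string[];
}

// Select the journeys annotated for a given runner.
function journeysForRunner(all: JourneyMeta[], tag: string): JourneyMeta[] {
  return all.filter((j) => j.tags.includes(tag));
}
```

Playwright itself supports filtering tests by tag via `--grep`, so an export step along these lines would let both runners share one journey catalog without either consuming the other's unrelated tests.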