KnapsackPro / knapsack-pro-js

Run your 1-hour test suite in 2 minutes with optimal parallelisation on your existing CI infrastructure
https://knapsackpro.com
MIT License
3 stars 0 forks source link

Batches not optimized after using KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE #24

Open pbomb opened 1 year ago

pbomb commented 1 year ago

We've noticed a slowdown in our Jest tests after upgrading to @knapsack-pro/jest v5.4.1 and using the KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE option. What we're seeing is that the execution behavior of batches is very different and not optimized, leading to our Jest steps taking about 9 minutes (up from about 6 minutes). I've tried downgrading just the @knapsack-pro/core library from v4.0.0 to both v3.3.1 and v3.3.0, but did not see any change in behavior. However, reverting back from using KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE to KNAPSACK_PRO_TEST_FILE_PATTERN is what resolves the problem.

Here's a comparison of the two:

KNAPSACK_PRO_TEST_FILE_PATTERN With this approach, you can see that the batches start with the slowest ones and get faster, leading to better durations when parallelized.

image

KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE With this approach, you can see that the batches are not organized and as a result lead to many more batches being executed and durations taking much longer.

image

I tried to look through the changes, but didn't find an explanation for why the batching behavior would be different between the two approaches so I'm creating this issue in the hopes that the context you have can provide an answer.

ArturT commented 1 year ago

If you provide a list of test files to KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE that was never recorded, then the test files' execution time needs to be recorded for the first time.

For instance, when you provide a full path for the test files instead of a relative path. New paths have to be recorded for the first time (which means a lot of small batches).

You can run a few CI builds, and once the new paths are recorded, they should be grouped in smaller batches in future CI builds.

If the problem persists, please share the name of the branch you are testing it and share the link to CI builds in the user dashboard with a support: https://knapsackpro.com/contact I can review it then.

pbomb commented 1 year ago

Hi @ArturT. Thanks for the response. I did some more investigation and found some interesting results. First, I did notice that our use of jest --listTests to get the file list does end up using the absolute file paths and not relative ones. However, even after running hundreds of builds like this over 2 weeks, we were still seeing unbalanced runs like the screenshot I posted above. I updated our script to transform the absolute file names to relative paths and that immediately resulted in correctly-balanced batches. I am filing a report with support now.

ArturT commented 1 year ago

First, I did notice that our use of jest --listTests to get the file list does end up using the absolute file paths and not relative ones. However, even after running hundreds of builds like this over 2 weeks, we were still seeing unbalanced runs like the screenshot I posted above.

It could happen when you did not record all test files with absolute file paths. You could be running only a subset of the test suite for 2 weeks. For example, you generate a small list of test files (KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE) to run in a given CI build so that you run only tests that are related to the code changes.

Not all work done by developers over 2 weeks touched the whole code base, so not all test files with absolute paths have been recorded yet. As a result of that, your CI build would be unbalanced because you often provide a new unknown absolute file path to KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE that needs to be recorded for the very first time.

I updated our script to transform the absolute file names to relative paths and that immediately resulted in correctly-balanced batches. I am filing a report with support now.

Great.

I guess you were running all test files on a main branch (without KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE), so that could result in relative paths being recorded. And absolute paths being only recorded partially - which caused the problem.

I recommend using relative paths always.

pbomb commented 1 year ago

Let me give you some more details about our history as some of what you're saying doesn't make sense to me. We were originally using the KNAPSACK_PRO_TEST_FILE_PATTERN option which I believe ends up using relative file paths and when KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE became available, we switched to it using jest --listTests which generates absolute file paths. I would expect the first couple of runs with this new method to not be balanced since the file paths look new, but I don't understand why it never balanced itself over the hundreds of builds over the next 2 weeks unless there is a bug in Knapsack Pro that does not handle being sent absolute file names.

Is there a separate process for "recording" or does that automatically happen as a given test file is executed?

ArturT commented 1 year ago

We were originally using the KNAPSACK_PRO_TEST_FILE_PATTERN option which I believe ends up using relative file paths

That's correct. The KNAPSACK_PRO_TEST_FILE_PATTERN uses relative test file paths.

KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE became available, we switched to it using jest --listTests which generates absolute file paths.

Ok. So you were still running all test files.

I incorrectly assumed you needed KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE to run only a subset of tests (which was often the reason people asked for this feature in the past).

I would expect the first couple of runs with this new method to not be balanced since the file paths look new, but I don't understand why it never balanced itself over the hundreds of builds over the next 2 weeks unless there is a bug in Knapsack Pro that does not handle being sent absolute file names.

Good point. Maybe the absolute paths were not sent to API. I suspect the Queue API returned absolute paths without execution time attribute for the problematic CI builds.

Is there a separate process for "recording" or does that automatically happen as a given test file is executed?

"recording" is a mental shortcut for tests' execution time being saved to API. This happens after all tests are executed on the CI node.

All ever-recorded tests can be seen in the Statistics of test files history tab in the user dashboard. But I don't see absolute paths there or for CI builds you shared over the email for problematic CI builds.


I managed to reproduce the bug. @knapsack-pro/jest always sends relative paths to API after tests are executed (https://github.com/KnapsackPro/knapsack-pro-jest/blob/a4328d2b56582dca7722b8649c1a44a0bf80728f/src/knapsack-pro-jest.ts#L69,L72. This was done on purpose because when relative paths are provided to Jest it returns absolute paths).

When you started a new CI build you provided absolute test file paths to initialize a new Queue on the API side. API does not know the absolute paths' execution time, which caused unbalanced tests.

We would have to convert absolute paths to relative paths as done here but on @knapsack-pro/core level where is KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE or explicitly say in docs that you need to provide relative paths to KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE.

ArturT commented 1 year ago

We do not allow saving absolute test file paths to API because this could lead to problems when tests are executed on different machines.

For example, you could have a path like /home/user123/project/__tests__/a.test.js but some CI providers may generate a unique user ID for a CI node so you would end up with a different path for each CI build.

Also, if you would like to reproduce something locally and run tests on your local machine, than the absolute test file path will be different than on CI.

That is why it's safe always to use relative paths to your project directory.

pbomb commented 1 year ago

Thanks for the explanation @ArturT. We're now transforming the file paths from jest --listTests from absolute to relative paths and everything looks good!

ArturT commented 1 year ago

Great.

I added this issue to our backlog.


EDITED:

Docs have been updated for KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE https://docs.knapsackpro.com/jest/reference/#knapsack_pro_test_file_list_source_file

ArturT commented 1 year ago

story

https://trello.com/c/TqkNjpa6

Here is an idea on what we could do to make @knapsack-pro/core more bulletproof.

We already say in docs that you should use only relative paths for test files defined with KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE https://docs.knapsackpro.com/jest/reference/#knapsack_pro_test_file_list_source_file

If someone provides absolute paths in the file defined with KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE then we could remove the project path prefix from the test file paths before they are sent to Knapsack Pro API.

The project path can be determined with

const projectPath = process.cwd();

https://github.com/KnapsackPro/knapsack-pro-jest/blob/a4328d2b56582dca7722b8649c1a44a0bf80728f/src/knapsack-pro-jest.ts#LL31C29-L31C29

We already do removing of project path prefix from test files executed by Jest in @knapsack-pro/jest https://github.com/KnapsackPro/knapsack-pro-jest/blob/a4328d2b56582dca7722b8649c1a44a0bf80728f/src/knapsack-pro-jest.ts#L70 but we need it on @knapsack-pro/core level where the KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE feature is. @knapsack-pro/core is responsible for initializing a queue on the API side with test file paths defined with KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE.