Open alexanderjeurissen opened 3 years ago
> it takes on average ~20-30 seconds to complete tagging
That doesn't sound overly onerous unless you're running the tagger all the time. Which you aren't meant to be doing.
> Ideally generating a unique identifier would not require processing all files, but rather be derived from a given scenario itself. For instance by creating a hex digest from a unique combination of scenario attributes.
With incremental IDs:
Me: "Hey, co-worker, which test was it that failed?"
Co-worker: "Test '3752'."
Me: "Thanks!"

With hex IDs:
Me: "Hey, co-worker, which test was it that failed?"
Co-worker: "Test 'C8EDEA22AF2F8BA51438221440D47DFA'."
Me: "..."
Me: "You know what, just email it to me."
So, while I wouldn't object to offering an additional type of tagging scheme or allowing for entirely arbitrary ones, the basic numbering option is probably going to stick around as the default. Sure, a faster tagging solution would save time, but human time is more valuable than computer time, and numbers are friendlier to humans.
Regarding your merge conflict problem, do the tests really need cataloging immediately when they are written or even while they are still on whatever temporary side branch a developer is working on? If cataloging can wait until the branch is merged into the main development branch then cataloging could be an automatic post-merge action in Git or your CI.
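A post-merge hook along those lines might look like this (a minimal sketch; the branch check and the hook placement are assumptions about your workflow, not anything cuke_cataloger ships with):

```ruby
#!/usr/bin/env ruby
# Minimal sketch of a Git post-merge hook (install as .git/hooks/post-merge).
# Assumption: cataloging should only run once work lands on the main branch.

# Decide whether the merge that just happened warrants re-cataloging.
def should_catalog?(current_branch, main_branch: 'main')
  current_branch == main_branch
end

# Only act when actually invoked by Git as the post-merge hook.
if File.basename($PROGRAM_NAME) == 'post-merge'
  branch = `git rev-parse --abbrev-ref HEAD`.strip
  if should_catalog?(branch)
    # All merged scenarios are visible at once here, so new IDs cannot collide.
    system('bundle exec rake cuke_cataloger:tag_tests') or abort('cataloging failed')
  end
end
```

The same check works as a CI step: run the rake task only on builds of the main branch, after the merge commit exists.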
Alternatively, you could store the current maximum ID number (in your database or whatever part of your development infrastructure is convenient) and then have your cataloging script grab that stored value and give it to the cataloger as an index starting point (via the explicit_indexes parameter). The script would also update the saved index to the new highest ID once the cataloging was done. This approach would work both as an 'on demand' script that a developer could run locally or as a CI job or Git hook. Personally, I recommend having it be a CI job/hook because there is still the chance of ID overlap if more than one instance of the script were running at the same time.
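The bookkeeping half of that could be as small as this (a sketch only — a plain file stands in for whatever store you use, and the actual cataloger invocation via explicit_indexes is left out):

```ruby
# Sketch of the stored high-water-mark bookkeeping. A plain file stands in for
# "your database or whatever part of your development infrastructure".

# Read the highest ID handed out so far; 0 if nothing has been cataloged yet.
def current_max_id(path)
  File.exist?(path) ? File.read(path).to_i : 0
end

# Persist the new high-water mark once cataloging has finished.
def record_max_id(path, new_max)
  File.write(path, new_max.to_s)
end

# The cataloging script would start the cataloger at current_max_id(path) + 1
# (e.g. via the explicit_indexes parameter) and then call record_max_id with
# the highest ID it assigned. Running it as a single CI job/hook avoids two
# instances reading the same starting value at once.
```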
cuke_cataloger works great, but there are a couple of shortcomings that make it hard to use in bigger repositories or in repositories with a lot of contributors / feature branches.

Problem 1: Incremental IDs require processing all existing tags
Currently the cuke_cataloger:tag_tests rake task is very slow: in a codebase that I maintain with ~3000 tagged scenarios, it takes on average ~20-30 seconds to complete tagging.

One of the causes of this slow execution time is the need to scan through all feature_files / scenarios to:

- determine which scenarios are still untagged
- determine the current max test_case id

Ideally, generating a unique identifier would not require processing all files, but rather be derived from a given scenario itself, for instance by creating a hex digest from a unique combination of scenario attributes.
This would allow for processing or tagging only specific files (for instance, a git hook could tag just the changed feature files), which would significantly improve performance.

As I want to ensure engineers tag their new tests, the ability to execute this without much overhead as part of a git hook would be a big gain.
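A digest-derived tag along those lines might look like this (a minimal sketch; which scenario attributes to combine, and the digest algorithm, are open design choices, not anything cuke_cataloger provides today):

```ruby
require 'digest'

# Sketch: derive a stable ID from the scenario itself rather than from a
# global counter. File path + scenario name is an assumed attribute choice.
def scenario_digest(feature_path, scenario_name)
  Digest::MD5.hexdigest("#{feature_path}##{scenario_name}")
end

# The tag only changes if the path or name changes, so a git hook can tag
# just the changed feature files instead of scanning the whole suite.
tag = "@test_case_#{scenario_digest('features/login.feature', 'Successful login')}"
```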
Problem 2: Incremental IDs are very prone to duplicates on merge
Consider the following branches:
Both branches contain a new feature file with a single scenario. Running bundle exec rake cuke_cataloger:tag_tests in each respective feature branch will tag the new scenario with test_case_1, given that cuke_cataloger is not aware of tags in other branches. As a result, when merging both feature branches to main, we will end up with two scenarios with the same tag.

In a project where I recently introduced cuke_cataloger this poses a real problem. We run cuke_cataloger:validate_tests as a build step on our CI to ensure the uniqueness of the test_case_x tags, as we rely on that uniqueness for structured logging. In the scenario above, this will result in build failures on the main branch, which is undesirable as it blocks other engineers from kicking off a code deploy.

Solution
Given the above two problems, I currently can't rely on cuke_cataloger, so I wrote a custom script that incorporates parts of cuke_cataloger combined with cuke_slicer:

It does not support all use-cases that cuke_cataloger supports (for instance, it does not include sub-id), but I think it gets the idea across. The above script only takes ~6 seconds on ~3000 scenarios instead of the aforementioned ~20-30 seconds.

It would be great if we could incorporate parts of this script upstream, as I'd rather use cuke_cataloger than have yet another custom script to maintain ;-)