confluentinc / ducktape

System integration and performance tests
16 stars 98 forks source link

Fail tests without non-empty @cluster annotation #336

Closed stan-is-hate closed 2 years ago

stan-is-hate commented 2 years ago

This might be breaking change if you rely on min_cluster_spec() method and also two changes in one.

1. Deprecation of min_cluster_spec() method Turns out we have two different mechanisms to state expected number of nodes for the test - there's the @cluster annotation, which we know and love/hate, and min_cluster_spec method on the test object itself. They are not related, and we mostly use @cluster annotation, except for a single place where we check min_cluster_spec and possibly fail - however, this check makes no sense, because it has already been checked against @cluster annotation, plus the test would fail if it tries to allocated too many nodes anyway.

So I'm deprecating min_cluster_spec method, removing all its two usages (the second one is for reporting, and it's also inaccurate). The method is still there, so if clients override it and refer to super() they should still be ok. They would probably be ok even if I remove it, since python is interpreted and if nothing calls that method, it's probably fine, but it's cleaner and more obvious this way.

2. Option to fail tests that don't explicitly specify their cluster requirements The real functionality of this PR is disallowing the tests that don't have @cluster annotation (or have them empty) and failing them immediately - since @cluster annotation is our preferred way of doing things right now, and since this protects us against tests that automatically use the whole cluster. This behavior is controlled by a new flag - --fail-greedy-tests, which enables this behavior when provided. We've discussed making this behavior automatic when using --parallel but ultimately felt that such behavior will be surprising for users, and not in a good way. We're still open to do this in the future, just not in this PR.

This PR also explicitly allows (and tests for) empty cluster spec in @cluster() annotation (e.g. @cluster(num_nodes=0)), thus allowing tests that don't use any cluster resources. This functionality was working fine before this PR, but didn't have unit tests, so we've made sure it still works after this PR.

imcdo commented 2 years ago

Important to note here that this WILL break people who do not use the cluster annotations tests. I would argue that this should be a flag and the default behavior should still be the same (as cluster annotation should really only be enforced when running with parallelism which isn't the default, w/o parallelism theres no harm no foul really)

lbradstreet commented 2 years ago

Important to note here that this WILL break people who do not use the cluster annotations tests. I would argue that this should be a flag and the default behavior should still be the same (as cluster annotation should really only be enforced when running with parallelism which isn't the default, w/o parallelism theres no harm no foul really)

It'll take some time to sort these issues out so I think having a flag makes sense.

stan-is-hate commented 2 years ago

I agree with not wanting to break everything, but I don't want to introduce a flag, purely because implementing and unit testing the flag will be a relative chore. I'd rather use semver power instead of the flag really - I'll first update existing kafka and CP tests to make sure they're fine, and then do another version bump (0.11.0) so that people can roll back to 0.10.0 if this breaks them - a single failed test run probably won't make or break things. This makes me think that the error message should be more explicit though - rather than 'cluster spec missing', I'll update it to say smth like 'cluster spec is missing. Please make sure all of your tests have @cluster annotation specified - it is now a requirement as of ducktape 0.11.0.'

stan-is-hate commented 2 years ago

@imcdo It's a different question about not enforcing this without parallelism though - I'll look into whether the behavior can be adjustable based on that. However, I think it's still a good idea for the test to declare it's node usage, since it enforces the test authors to be more diligent, and also I'm not sure it's a good idea to have the annotation required or not based on different execution modes.

stan-is-hate commented 2 years ago

@lbradstreet @imcdo even though this seems to be ready, I'm not going to merge it just yet - a quick search across kafka and confluent tests reveals that there are some tests that still use min_cluster_spec() method (which is removed in this version). I'll either update those tests, or lock ducktape minor version requirement for those repos before merging this.

stan-is-hate commented 2 years ago

Actually, I think a better thing to do would be to keep min_cluster_size method for compatibility, just hardcode it to return 0 and mark it as deprecated - this way we won't have to push a ton of PRs to multiple AK branches, they'll just continue to work. The only reason to even worry about having those methods in subclasses is because they usually call super(), which will fail if we remove it from superclass, so I'm adding it back.