ansible / ansible-lint

ansible-lint checks playbooks for practices and behavior that could potentially be improved and can fix some of the most common ones for you
https://ansible.readthedocs.io/projects/lint/
GNU General Public License v3.0
3.44k stars 654 forks source link

Missing Rule breaks due to hash collisions for dataclass MatchError #4297

Open marcandre-larochelle-bell opened 3 weeks ago

marcandre-larochelle-bell commented 3 weeks ago
Summary

Rule breaks with the same exact message and line number, but different filename are missing due to a hash collision.

Issue Type
OS / ENVIRONMENT

ansible-lint version 24.7.0

STEPS TO REPRODUCE

The MatchError data class has the unsafe_hash=True, which generates automatically a hash function based on the members of the class, however the field filename is non-existent (not defined) in the dataclass, it is added at a later stage (see: var_naming.py#L204) and then all the errors are added to a set (see: runner.py#L682). Due to the filename member being non-existent at the class definition, the hash function does not take into account the filename and cause collisions if you have the same exact rule break in another file at the same line number.

  1. Create 2 different files with var naming issues on the same lines number (but in the 2 sperate file)
  2. Run ansible-lint
  3. There should be only 1 issue raised due to the bug.
Desired Behavior

The same rule break happening in a different file on the same line number should be recorded properly as different rule break.

Actual Behavior

Simplified code example of the reproducible issue we are experiencing:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class MatchError(ValueError):
    message: str
    tag: str
    lineno: int = 1

    @property
    def _hash_key(self) -> any:
        # line attr is knowingly excluded, as dict is not hashable
        return (
            self.filename,
            self.lineno,
            self.message,
            self.tag
        )

    def __eq__(self, other):
        return self.__hash__() == other.__hash__()

error1 = CustomError(message="An error occurred", tag="ERROR_TAG", lineno=42)
error1.filename = "test.py"
error2 = CustomError(message="An error occurred", tag="ERROR_TAG", lineno=42)
error2.filename = "test2.py"
error3 = CustomError(message="An error occurred", tag="ERROR_TAG", lineno=43)
error3.filename = "test2.py"

print(error1.filename)
print(error2.filename)
print(error3.filename)
print(hash(error1))
print(hash(error2))
print(hash(error3))

Potential fix:

cavcrosby commented 3 weeks ago

filename was a field for the MatchError dataclass at one point and only was recently removed from it when looking at https://github.com/ansible/ansible-lint/pull/4202. I was also able to reproduce this on the main branch by running the following below.

mkdir -p "./vars/test"

cat << _EOF_ > "./vars/test/foo.yml"
---
CamelCaseIsBad: foo
ALL_CAPS_ARE_BAD_TOO: foo
_EOF_

cat << _EOF_ > "./vars/test/bar.yml"
---
CamelCaseIsBad: bar
ALL_CAPS_ARE_BAD_TOO: bar
_EOF_

ansible-lint --offline "./vars/test"

That said, when running the following in addition, this issue seems to disappear.

cat << _EOF_ > "./vars/test/bar.yml"
---
AnotherCamelCaseIsBad: bar
ANOTHER_ALL_CAPS_ARE_BAD_TOO: bar
_EOF_

ansible-lint --offline "./vars/test"
marcandre-larochelle-bell commented 3 weeks ago

@cavcrosby for the 2nd case you highlighted, it has to be exactly the same rule "description" being broken (which in the case of var naming requires to have the same variable name), otherwise the hash calculated will be different (the only thing not taking into account for the hash right now that we noticed was the filename, which causes the conflict between hashes if there are 2 rules broken with the same description, at the same line number, but in different files).

cavcrosby commented 2 weeks ago

Ahh, I see what you mean. I'll look into opening a PR for this sometime soon.