bridgecrewio / checkov

Prevent cloud misconfigurations and find vulnerabilities during build-time in infrastructure as code, container images and open source packages with Checkov by Bridgecrew.
https://www.checkov.io/
Apache License 2.0
7.04k stars 1.11k forks source link

Some inconsistencies between `CKV_IGNORED_DIRECTORIES` and `--skip-path` #6737

Open tpvasconcelos opened 1 week ago

tpvasconcelos commented 1 week ago

Hey! New checkov user here 👋

How I noticed the problem

While trying to run checkov's test suit on my local machine I noticed that some files were being (unexpectedly) ignored. After a little digging, I realised that this is because I cloned the repo under a path that looks something like: ~/my_git_forks/checkov.

The reason this is a problem is because the following patterns are treated as regular expressions by checkov:

https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/secrets/utils.py#L10

and the .git pattern obviously excludes any paths that include the word 'git' (incl. at least one preceding character)

The more fundamental issues

I think the more fundamental problem is that checkov documents --skip-path as including regexes, documents CKV_IGNORED_DIRECTORIES to include non-regex patterns (default values arenode_modules,.terraform,.serverless instead of node_modules,\\.terraform,\\.serverless), but then treats them equally in the code by merging them in some places, and explicitly treats them as both regexes and non-regex patterns in the same line of code.

I think someone has noticed something similar in the past because I see a few p.replace(".terraform", r"\.terraform") sprinkled around the code base.

However this just seems like another inconsistency because if users set --skip-path="^.*\.terraform.*$", it will get converted internally to ^.*\\.terraform.*$, which would not ignore paths like path/to/.terraform.

How to fix it?

Since checkov allows users to pass regexes to --skip-path and regards CKV_IGNORED_DIRECTORIES as legacy, it makes sense to consistently treat all patterns as regular expressions. However, since I just started looking at this project a couple of days ago, I have no idea what the consequences of doing this are...

Some (naive?) suggestions

  1. Escape the default values for CKV_IGNORED_DIRECTORIES:

https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/common/runners/base_runner.py#L46

i.e.,:

-IGNORED_DIRECTORIES_ENV = os.getenv("CKV_IGNORED_DIRECTORIES", "node_modules,.terraform,.serverless")
+IGNORED_DIRECTORIES_ENV = os.getenv("CKV_IGNORED_DIRECTORIES", "node_modules,\.terraform,\.serverless")

Same needs to be done here:

https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/common/util/env_vars_config.py#L44

  1. Same here:

https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/secrets/utils.py#L10

-EXCLUDED_PATHS = [*ignored_directories, DEFAULT_EXTERNAL_MODULES_DIR, ".idea", ".git", "venv"]
+EXCLUDED_PATHS = [*ignored_directories, re.escape(DEFAULT_EXTERNAL_MODULES_DIR), "\.idea", "\.git", "venv"]
  1. Remove any hard-coded references to p.replace(".terraform", r"\.terraform")
  2. Consistently threat all references to ignored_directories, excluded_paths, and skip_path as regex patterns. This means removing all ~path in skip_path invocations since this is not compatible with regex (e.g. "\.hidden" not in ".hidden/secret.txt")
    • As a safeguard and to avoid ~path in skip_path code being committed in the first place, the excluded_paths variable should probably be a set of re.patterns instead of strings. e.g.:
      excluded_paths = {re.compile(p) for p in self.config.skip_path}`

      Other things to consider

Currently, patterns set in the form of .hidden will always exclude directories of the form .hidden-not. However, this is a problem both in the non-regex ~path in skip_path way of doing things and naive regex definitions. However, at least with regex the user can be explicit and say something like (^|.*\/)\.hidden\/?

Screenshot 2024-09-28 at 03 19 08 Screenshot 2024-09-28 at 03 20 02 Screenshot 2024-09-28 at 03 20 57

However, this is not very user friendly and maybe there should be a way to allow users to define excluded directories in a .hidden,also_hidden flavour rather than the verbose (and non-Windown-compatible) --skip-path='(^|.*\/)\.hidden\/?' --skip-path='(^|.*\/)also_hidden\/?'

tpvasconcelos commented 1 week ago

I opened a PR for reference (https://github.com/bridgecrewio/checkov/pull/6738) that fixes the issue I had running the tests locally.

I doesn't address all the issues discussed above but I wanted to get started with a possible incremental solution.