Open tpvasconcelos opened 1 week ago
I opened a PR for reference (https://github.com/bridgecrewio/checkov/pull/6738) that fixes the issue I had running the tests locally.
I doesn't address all the issues discussed above but I wanted to get started with a possible incremental solution.
Hey! New checkov user here 👋
How I noticed the problem
While trying to run checkov's test suit on my local machine I noticed that some files were being (unexpectedly) ignored. After a little digging, I realised that this is because I cloned the repo under a path that looks something like:
~/my_git_forks/checkov
.The reason this is a problem is because the following patterns are treated as regular expressions by checkov:
https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/secrets/utils.py#L10
and the
.git
pattern obviously excludes any paths that include the word 'git' (incl. at least one preceding character)The more fundamental issues
I think the more fundamental problem is that checkov documents
--skip-path
as including regexes, documentsCKV_IGNORED_DIRECTORIES
to include non-regex patterns (default values arenode_modules,.terraform,.serverless
instead ofnode_modules,\\.terraform,\\.serverless
), but then treats them equally in the code by merging them in some places, and explicitly treats them as both regexes and non-regex patterns in the same line of code.I think someone has noticed something similar in the past because I see a few
p.replace(".terraform", r"\.terraform")
sprinkled around the code base.However this just seems like another inconsistency because if users set
--skip-path="^.*\.terraform.*$"
, it will get converted internally to^.*\\.terraform.*$
, which would not ignore paths likepath/to/.terraform
.How to fix it?
Since checkov allows users to pass regexes to
--skip-path
and regardsCKV_IGNORED_DIRECTORIES
as legacy, it makes sense to consistently treat all patterns as regular expressions. However, since I just started looking at this project a couple of days ago, I have no idea what the consequences of doing this are...Some (naive?) suggestions
CKV_IGNORED_DIRECTORIES
:https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/common/runners/base_runner.py#L46
i.e.,:
Same needs to be done here:
https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/common/util/env_vars_config.py#L44
https://github.com/bridgecrewio/checkov/blob/42d3178c40cea47a9fc359e0edc4620153fbb446/checkov/secrets/utils.py#L10
p.replace(".terraform", r"\.terraform")
ignored_directories
,excluded_paths
, andskip_path
as regex patterns. This means removing all ~path in skip_path
invocations since this is not compatible with regex (e.g."\.hidden" not in ".hidden/secret.txt"
)path in skip_path
code being committed in the first place, theexcluded_paths
variable should probably be a set ofre.pattern
s instead of strings. e.g.:Other things to consider
Currently, patterns set in the form of
.hidden
will always exclude directories of the form.hidden-not
. However, this is a problem both in the non-regex ~path in skip_path
way of doing things and naive regex definitions. However, at least with regex the user can be explicit and say something like(^|.*\/)\.hidden\/?
However, this is not very user friendly and maybe there should be a way to allow users to define excluded directories in a
.hidden,also_hidden
flavour rather than the verbose (and non-Windown-compatible)--skip-path='(^|.*\/)\.hidden\/?' --skip-path='(^|.*\/)also_hidden\/?'