chainguard-dev / malcontent

#supply #chain #attack #detection
Apache License 2.0

Display scan results as soon as results are generated #617

Closed egibs closed 5 days ago

egibs commented 6 days ago

Closes: #489

Up until now (post-concurrency changes), we've been rendering results in a deterministic way by storing all of the file paths in a slice, sorting the slice, and then iterating through a map using the sorted keys. This prevents any output feedback while a scan is actually occurring and is especially apparent when scanning directories with many files.
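For reference, the deterministic approach described above looks roughly like the sketch below. It is a minimal illustration, not the project's actual code: the `sortedPaths` helper, the `map[string]string` report shape, and the sample paths are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedPaths returns the keys of reports in lexical order.
// (Hypothetical helper; the real report type is richer than a string.)
func sortedPaths(reports map[string]string) []string {
	paths := make([]string, 0, len(reports))
	for p := range reports {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	// Hypothetical scan results keyed by file path.
	reports := map[string]string{
		"/usr/bin/zip":  "LOW",
		"/usr/bin/awk":  "MEDIUM",
		"/usr/bin/curl": "LOW",
	}

	// Nothing can be printed until every file has been scanned,
	// because the full key set is needed before sorting.
	for _, p := range sortedPaths(reports) {
		fmt.Printf("%s: %s\n", p, reports[p])
	}
}
```

The output order is stable across runs, but the first line cannot appear until the last file has been processed, which is the feedback delay this PR addresses.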

This PR re-adds real-time streaming of results. The output order is no longer deterministic (which we probably shouldn't care about in general), so I added explicit sorting to two of our tests to ensure that the result data is what we expect. An unintended improvement of this PR is that --err-first-hit and --err-first-miss now work like they're supposed to.
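A minimal sketch of the streaming idea, assuming a channel-based design (the PR text does not say which mechanism is used): each result is sent as soon as its scan finishes, and a test that needs stable output sorts after collecting. The `result` type, `scanAll`, and `collectSorted` are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// result is a hypothetical stand-in for a per-file report.
type result struct {
	Path string
	Risk string
}

// scanAll scans every path concurrently and sends each result on the
// returned channel as soon as it is ready. Arrival order is not deterministic.
func scanAll(paths []string, scan func(string) string) <-chan result {
	out := make(chan result)
	var wg sync.WaitGroup
	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			out <- result{Path: p, Risk: scan(p)}
		}(p)
	}
	// Close the channel once all scans have reported.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// collectSorted drains the channel and sorts by path, which is what a
// test can do to compare against expected data.
func collectSorted(in <-chan result) []result {
	var rs []result
	for r := range in {
		rs = append(rs, r)
	}
	sort.Slice(rs, func(i, j int) bool { return rs[i].Path < rs[j].Path })
	return rs
}

func main() {
	paths := []string{"/usr/bin/zip", "/usr/bin/awk", "/usr/bin/curl"}
	// Results are printed as they arrive, in whatever order scans finish.
	for r := range scanAll(paths, func(string) string { return "LOW" }) {
		fmt.Printf("%s: %s\n", r.Path, r.Risk)
	}
}
```

Because the consumer sees each result immediately, a first-hit or first-miss check can act on the first matching result instead of waiting for the whole scan.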

I verified that --err-first-hit and --err-first-miss work with this implementation:

$ for i in (seq 1 3); go run cmd/mal/mal.go --err-first-hit analyze /usr/bin/; end
🔎 Scanning "/usr/bin/"
👋 "/usr/bin/SafeEjectGPU": matched requested condition
🔎 Scanning "/usr/bin/"
👋 "/usr/bin/SafeEjectGPU": matched requested condition
🔎 Scanning "/usr/bin/"
👋 "/usr/bin/SafeEjectGPU": matched requested condition
$ for i in (seq 1 3); go run cmd/mal/mal.go --err-first-miss analyze /Library/Application\ Support/BTServer/; end
🔎 Scanning "/Library/Application Support/BTServer/"
👋 "/Library/Application Support/BTServer/pincode_defaults.db": matched requested condition
🔎 Scanning "/Library/Application Support/BTServer/"
👋 "/Library/Application Support/BTServer/pincode_defaults.db": matched requested condition
🔎 Scanning "/Library/Application Support/BTServer/"
👋 "/Library/Application Support/BTServer/pincode_defaults.db": matched requested condition

I also fixed a data race, simplified how file reports are stored, and improved the output of --stats so that it makes more sense. As part of the files map simplification, I added code to the JSON and YAML renderers to convert the sync.Map data to a format that works with the respective Marshal functions.
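The conversion is needed because a sync.Map keeps its entries in unexported fields, so encoding/json has nothing to serialize when handed one directly. A minimal sketch of one way to do the conversion for JSON follows; the `fileReport` type, `toPlainMap`, and `sampleFiles` are hypothetical and not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// fileReport is a hypothetical, simplified per-file report.
type fileReport struct {
	Path string `json:"path"`
	Risk string `json:"risk"`
}

// toPlainMap copies the entries of a sync.Map into an ordinary map that
// the Marshal functions can handle. Entries of an unexpected type are skipped.
func toPlainMap(m *sync.Map) map[string]*fileReport {
	out := make(map[string]*fileReport)
	m.Range(func(k, v any) bool {
		key, okKey := k.(string)
		fr, okVal := v.(*fileReport)
		if okKey && okVal {
			out[key] = fr
		}
		return true // keep iterating
	})
	return out
}

// sampleFiles builds a sync.Map with made-up data for the example.
func sampleFiles() *sync.Map {
	var files sync.Map
	files.Store("/usr/bin/awk", &fileReport{Path: "/usr/bin/awk", Risk: "LOW"})
	files.Store("/usr/bin/zip", &fileReport{Path: "/usr/bin/zip", Risk: "MEDIUM"})
	return &files
}

func main() {
	b, err := json.Marshal(toPlainMap(sampleFiles()))
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(b))
}
```

The same plain map could be passed to a YAML marshaller; which YAML library the project uses is not stated here, so that step is left out.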

egibs commented 6 days ago

Converting this to a draft to refactor the refactor to use standard Go constructs, since we aren't concerned with output determinism. I also found a data race in diff.go which needs to be fixed.