DataDog / datadog-static-analyzer

Datadog Static Analyzer
https://docs.datadoghq.com/static_analysis/
Apache License 2.0

[STAL-2472] feat: avoid processing minified JavaScript files #503

Closed amaanq closed 1 month ago

amaanq commented 1 month ago

What problem are you trying to solve?

Currently, we process minified JavaScript files during analysis. These files are very expensive to parse and query, and they can produce false positives. Ideally, we should not process them at all, since they are generated artifacts rather than code hand-written by programmers.

What is your solution?

When we iterate over each language and fetch the files to analyze for that language, we now add a step that filters out minified files. The filtering is done in the function filter_out_minified_files. The heuristic treats a file as minified if its average line length is over 110, which is the same check GitHub currently uses in linguist. A test has been added that writes a few hundred characters on a single line to a JavaScript file and verifies that the file is removed by the filter.
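For readers who want to see the shape of the heuristic, here is a minimal sketch of an average-line-length check. It is not the code from this PR. The helper name is_probably_minified, the constant name, the function signature, the choice to count characters rather than bytes, and the choice to keep unreadable files are all assumptions made for illustration. Only the function name filter_out_minified_files and the threshold of 110 come from the description above.

```rust
use std::fs;
use std::path::PathBuf;

// Threshold from the PR description; the constant name is hypothetical.
const MAX_AVG_LINE_LENGTH: usize = 110;

/// Returns true when the average line length exceeds the threshold.
/// Assumption: length is counted in characters, not bytes.
fn is_probably_minified(contents: &str) -> bool {
    let mut line_count = 0usize;
    let mut total_len = 0usize;
    for line in contents.lines() {
        line_count += 1;
        total_len += line.chars().count();
    }
    if line_count == 0 {
        // An empty file has no lines, so it cannot be minified.
        return false;
    }
    // Equivalent to total_len / line_count > threshold, without the
    // rounding that integer division would introduce.
    total_len > MAX_AVG_LINE_LENGTH * line_count
}

/// Keeps only the files that do not look minified.
/// Assumption: the signature is illustrative, and files that cannot be
/// read as UTF-8 are kept so a later stage can report the error.
fn filter_out_minified_files(files: Vec<PathBuf>) -> Vec<PathBuf> {
    files
        .into_iter()
        .filter(|path| match fs::read_to_string(path) {
            Ok(contents) => !is_probably_minified(&contents),
            Err(_) => true,
        })
        .collect()
}
```

In this sketch, a file holding a single 300-character line is dropped, while a file of ordinary short statements is kept. The sketch applies the check to every path it is given, so restricting it to JavaScript files would be up to the caller.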

Alternatives considered

What the reviewer should know

amaanq commented 1 month ago

The CI failures are expected: we no longer process minified JS files in the repos we use for regression testing.

juli1 commented 1 month ago

Please fix the CI pipeline + ship