[STAL-2472] feat: avoid processing minified JavaScript files

amaanq commented 1 month ago

What problem are you trying to solve?

Currently, we process minified JavaScript files during analysis. These files are very expensive to parse and query, and they can produce false positives. Ideally, we should not process them at all, since they are not hand-written code.

What is your solution?

When we iterate over each language to process and fetch the files to analyze for that language, we now add a step that filters out minified files. The filtering is done in the function filter_out_minified_files, which treats a file as minified if its average line length is over 110 characters. This is the same heuristic GitHub currently uses in linguist. A test has been added that writes a few hundred characters on a single line to a JavaScript file and checks that the file is removed after filtering out minified files.
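A minimal sketch of the average-line-length heuristic, assuming the filter receives a list of file paths and returns the ones to keep. Only the name filter_out_minified_files and the 110-character threshold come from the PR; the signatures, the is_minified helper, the MAX_AVERAGE_LINE_LENGTH constant, and the choice to keep unreadable files are hypothetical.

```rust
use std::fs;
use std::path::PathBuf;

/// Threshold from the PR description (the same value linguist uses).
/// Hypothetical constant name.
const MAX_AVERAGE_LINE_LENGTH: usize = 110;

/// Hypothetical helper: returns true if the average line length of
/// `content` is over the threshold. Uses integer division, like linguist.
fn is_minified(content: &str) -> bool {
    let mut line_count = 0usize;
    let mut total_len = 0usize;
    for line in content.lines() {
        line_count += 1;
        total_len += line.len(); // byte length of the line, newline excluded
    }
    // An empty file has no lines, so it cannot be minified.
    if line_count == 0 {
        return false;
    }
    total_len / line_count > MAX_AVERAGE_LINE_LENGTH
}

/// Sketch of the filtering step: keeps only files that do not look minified.
/// The signature is an assumption, not the analyzer's actual API.
fn filter_out_minified_files(files: Vec<PathBuf>) -> Vec<PathBuf> {
    files
        .into_iter()
        .filter(|path| match fs::read_to_string(path) {
            Ok(content) => !is_minified(&content),
            // Assumption: unreadable or non-UTF-8 files are kept so that the
            // later analysis stage can report the error itself.
            Err(_) => true,
        })
        .collect()
}
```

Mirroring the test described above, a file containing a few hundred characters on one line has an average line length far above 110 and is dropped, while ordinary source files with short lines pass through unchanged.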

Alternatives considered

What the reviewer should know

amaanq commented 1 month ago

The CI failures are expected: we no longer process minified JS files in the repositories we use for regression testing.

juli1 commented 1 month ago

Please fix the CI pipeline + ship

DataDog / datadog-static-analyzer