What problem are you trying to solve?
Currently, we process minified JavaScript files during analysis. These files are very expensive to parse and query, and can raise false positives. Ideally, we should not process them at all, since they are not hand-written by programmers.
What is your solution?
When we iterate over each language and fetch the files to be analyzed for it, we now add a step that filters out minified files. The filtering is done in the function filter_out_minified_files, and the heuristic treats a file as minified if its average line length is over 110. This is currently what GitHub does with linguist. A test has been added that writes a few hundred characters on a single line to a JavaScript file and checks that the file is removed after filtering out minified files.
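As a rough illustration of the heuristic described above, here is a minimal sketch in Python. It is not the code from this PR: the language, the signature of filter_out_minified_files, the helper is_minified, the constant name, and the handling of empty or unreadable files are all assumptions made for the example. Only the threshold of 110 and the function name come from the description.

```python
from pathlib import Path
from typing import Iterable, List

# Threshold taken from the description above; the constant name is hypothetical.
MAX_AVERAGE_LINE_LENGTH = 110


def is_minified(content: str) -> bool:
    """Return True if the average line length exceeds the threshold."""
    lines = content.splitlines()
    if not lines:
        # Assumption: an empty file is not treated as minified.
        return False
    average = sum(len(line) for line in lines) / len(lines)
    return average > MAX_AVERAGE_LINE_LENGTH


def filter_out_minified_files(paths: Iterable[Path]) -> List[Path]:
    """Keep only the files that do not look minified (hypothetical signature)."""
    kept = []
    for path in paths:
        try:
            content = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Assumption: unreadable files are kept and left to later stages.
            kept.append(path)
            continue
        if not is_minified(content):
            kept.append(path)
    return kept
```

With this sketch, a file holding a few hundred characters on one line is dropped, while a file with ordinary short lines is kept, which mirrors the test described above.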
Alternatives considered
What the reviewer should know