bazingagin / npc_gzip

Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
MIT License

Enable CodeQL #41

Closed. EliahKagan closed this 1 year ago

EliahKagan commented 1 year ago

This adds a CodeQL CI workflow, to complement the automated tests and style checkers. It runs the default/recommended CodeQL queries for Python code.

It uses the default CodeQL triggers: pushes to the main branch, PRs targeting the main branch, and a weekly schedule. It could be broadened to run on every branch, including branches that are neither main nor the source of a PR targeting main, but CodeQL tends to be slower than most workflows, so I think the existing defaults make sense. If it turns out to be fast, we may want to change it later to run on all branches. It also runs on a weekly schedule because new CodeQL queries are added regularly (which is not the case for most other tools).
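For readers unfamiliar with how those three triggers look in a workflow file, here is a minimal sketch of the `on:` block. It is not copied from this PR's workflow; the cron expression in particular is an assumed placeholder, since the actual day and time are not stated in this thread.

```yaml
# Sketch of the three triggers described above (not the PR's actual file).
on:
  push:
    branches: [main]          # pushes to the main branch
  pull_request:
    branches: [main]          # PRs targeting the main branch
  schedule:
    - cron: "27 4 * * 1"      # weekly, Mondays at 04:27 UTC (assumed time)
```

Broadening the workflow to all branches would amount to dropping the `branches` filter under `push`.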

The CodeQL developers believe they have improved CodeQL to the point that it does not need to be able to examine the code of Python dependencies to reliably detect bugs. I have customized this workflow not to install dependencies. This saves the time of installing them and also of CodeQL traversing into and examining them. That is the only way I have customized it.

This customization might not be needed. If you, as the repository owner, did not use CodeQL on any of your repositories before that change was announced, I believe it will not install dependencies unless explicitly configured to do so. That is, I believe the default behavior is controlled by factors outside the workflow file. However, the explicit setup-python-dependencies: false may still be valuable, because it may make CodeQL faster when it runs in forks (whose owners may have been using CodeQL in other repositories before the change).
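As a rough illustration of where that setting lives, the sketch below shows a CodeQL job with `setup-python-dependencies: false` passed to the `init` step. The job name, runner, permissions, and action version tags are assumptions chosen for illustration, not taken from this PR, and newer releases of the CodeQL action may ignore or drop this input.

```yaml
# Sketch of a CodeQL job with dependency installation disabled.
# Job name, runner, and version tags are assumptions.
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # lets the job upload code scanning results
      contents: read
      actions: read
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v2
        with:
          languages: python
          setup-python-dependencies: false   # the one customization discussed above
      - uses: github/codeql-action/analyze@v2
```

Setting the input explicitly keeps the behavior the same regardless of the account-level default that applies to whoever runs the workflow.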

Even if you decide to enable CodeQL, you don't have to accept this pull request! There are other ways to set it up that you might prefer:

I will not mind at all if you do one of those things and close this pull request instead of accepting it--the main goal of this PR is to propose the use of CodeQL. I also understand you may decide not to use CodeQL at all.

github-advanced-security[bot] commented 1 year ago

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

EliahKagan commented 1 year ago

Another recent improvement to CodeQL tips the balance toward using a "Default" CodeQL configuration, so I'm closing this. I still recommend enabling CodeQL for this repository, but I recommend using the "Default" configuration, rather than adding a workflow file.

Specifically, the "Default" configuration was improved yesterday so that it also performs a weekly scheduled scan, as detailed in "Code scanning default setup now analyzes on a weekly schedule".

That was not my main original motivation for proposing a workflow file. However, the output of the first commit, d1144f5, which predates my manually disabling the installation of Python dependencies, shows that CodeQL was already skipping their installation. I am inclined to think that the "Default" configuration will follow the same rule, since you probably have not enabled CodeQL in any GitHub repositories before the default was changed.

Although I now recommend enabling CodeQL with the "Default" configuration, I can reopen this PR if requested.