Infleqtion / client-superstaq

https://superstaq.readthedocs.io
Apache License 2.0

License header formatting #983

Open natibek opened 2 months ago

natibek commented 2 months ago

License headers are added to source code inconsistently, and the headers that are added are not always the same. This PR addresses this with a checker that verifies whether each source file has a license header and, if it does, whether it is the correct one. The checker accounts for shebang lines, comments at the top of files that are unrelated to the license header, and pylint and mypy disable lines. The check has been added to all_.py.
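A minimal sketch of how such a checker might locate the header, assuming the header is a contiguous block of `#` comment lines containing the word "Copyright", and that directives look like `# pylint: ...` or `# mypy: ...`. The function name and patterns are hypothetical, not the PR's actual code:

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns for lines that may precede a license header.
SHEBANG = re.compile(r"^#!")
LINT_DIRECTIVE = re.compile(r"^#\s*(pylint|mypy)\s*:")


def find_license_header(
    lines: List[str], license_keyword: str = "Copyright"
) -> Optional[Tuple[int, int]]:
    """Return (start, end) line indices of the first comment block that looks
    like a license header, skipping blank, shebang, and pylint/mypy directive
    lines. Returns None if code is reached before any such block."""
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line or SHEBANG.match(line) or LINT_DIRECTIVE.match(line):
            i += 1
            continue
        if not line.startswith("#"):
            return None  # reached code without finding a header
        # Collect one contiguous block of comment lines, stopping at directives.
        start = i
        while (
            i < len(lines)
            and lines[i].strip().startswith("#")
            and not LINT_DIRECTIVE.match(lines[i].strip())
        ):
            i += 1
        if license_keyword in "\n".join(lines[start:i]):
            return start, i
        # Otherwise it was an unrelated leading comment: keep scanning.
    return None
```

For example, a file starting with a shebang, a pylint directive, and a short module note yields the indices of the later copyright block.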

natibek commented 1 month ago
  • once we've corrected the outdated headers, it seems like we shouldn't need to keep checking them? in which case maybe the "outdated" functionality doesn't need to live in this script - we can save the code you use to make these initial corrections somewhere internally, and then use this script to check headers from here on out

That makes sense. We could also keep it but change the logic a bit. After the initial fix, instead of checking for "ColdQuanta" in the license header, we can check whether the header names the licensee but contains a different license. That would catch cases where the license itself has been changed.
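The proposed "licensee present, but different license" check could look roughly like this. It is a sketch only, assuming the header text has already been extracted and that a whitespace-normalized substring comparison against the configured license body is good enough; the function names are hypothetical:

```python
def _normalize(text: str) -> str:
    """Strip leading '#' comment markers and collapse all whitespace."""
    body = "\n".join(line.lstrip("#").strip() for line in text.splitlines())
    return " ".join(body.split())


def has_licensee_but_different_license(
    header: str, licensee: str, expected_body: str
) -> bool:
    """True if a copyright line names `licensee` but the header does not
    contain the expected license text (e.g. the license was swapped)."""
    copyright_lines = [
        line for line in header.splitlines() if "copyright" in line.lower()
    ]
    names_licensee = any(licensee in line for line in copyright_lines)
    return names_licensee and _normalize(expected_body) not in _normalize(header)
```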

  • similarly, i can maybe see why it's unavoidable but i feel like the hard-coded "apache" checks somewhat defeat the purpose of saving the header in the config. do you think there's an easy way to check if the headers are ~the same, up to licensee/year? maybe we could allow the header in pyproject.toml to include {YEAR} and {LICENSEE} tags, which we could convert to wildcards when comparing against existing licenses

I added a few more config fields to replace the hard-coded values; Cirq's pylint plugin for checking license headers does something similar. However, from what I have seen, Apache 2.0 license headers come in two different formattings, which would break the matching if we used the wildcard approach.
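To make the trade-off concrete, here is a minimal sketch of the wildcard idea from the review: escape the configured template, then turn the `{YEAR}` and `{LICENSEE}` tags into regex wildcards. The template text and the year/licensee patterns are assumptions for illustration. The second header has the same Apache wording but wraps the line differently, and the regex rejects it, which is the formatting problem described above:

```python
import re

# Hypothetical template as it might be stored in pyproject.toml.
TEMPLATE = """\
# Copyright {YEAR} {LICENSEE}
#
# Licensed under the Apache License, Version 2.0 (the "License");"""


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Escape the template literally, then replace the tags with wildcards."""
    pattern = re.escape(template)
    # A single year or a range such as 2021-2024 (assumed convention).
    pattern = pattern.replace(re.escape("{YEAR}"), r"\d{4}(?:-\d{4})?")
    # Any licensee name on the rest of the copyright line.
    pattern = pattern.replace(re.escape("{LICENSEE}"), r".+")
    return re.compile(pattern)


same_layout = (
    '# Copyright 2024 Example Corp\n#\n'
    '# Licensed under the Apache License, Version 2.0 (the "License");'
)
rewrapped = (
    '# Copyright 2024 Example Corp\n#\n'
    '# Licensed under the Apache License,\n# Version 2.0 (the "License");'
)
```

Matching is strict about line breaks, so `same_layout` matches while `rewrapped` does not, even though a human would consider them the same license.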

natibek commented 1 month ago

@richrines1 can you please take a look? The biggest change is that I am now using difflib.SequenceMatcher to check whether the body of the license header matches the header specified in pyproject.toml. Instead of checking whether the license name (e.g. Apache) appears in the header, the checker compares the header body against the configured header body. If the similarity is above a threshold, the licensee is not included in the copyright line, and the header is editable, the licensee is appended to the header.
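A rough sketch of that logic, for readers following along. It assumes the first header line is the copyright line and the remaining lines form the body, and that the licensee is appended as ", <licensee>"; the threshold value, function names, and append format are all hypothetical, since the PR's actual values are not given here:

```python
import difflib
from typing import List

SIMILARITY_THRESHOLD = 0.9  # hypothetical cutoff, not the PR's actual value


def body_similarity(found_body: str, expected_body: str) -> float:
    """Similarity ratio in [0, 1] from difflib.SequenceMatcher (1.0 = identical)."""
    return difflib.SequenceMatcher(None, found_body, expected_body).ratio()


def maybe_append_licensee(
    header: List[str], expected_body: str, licensee: str, editable: bool
) -> List[str]:
    """Return the header with `licensee` appended to its copyright line if the
    body matches closely enough, the licensee is missing, and the header may be
    edited; otherwise return it unchanged. Assumes header[0] is the copyright line."""
    copyright_line, body = header[0], "\n".join(header[1:])
    if (
        body_similarity(body, expected_body) >= SIMILARITY_THRESHOLD
        and licensee not in copyright_line
        and editable
    ):
        return [f"{copyright_line}, {licensee}"] + header[1:]
    return header
```

Using a similarity ratio instead of an exact or substring match tolerates small differences such as rewrapped lines or punctuation, at the cost of having to pick a threshold.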