abhinavsingh / proxy.py

💫 Ngrok FRP Alternative • ⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework
https://abhinavsingh.com/proxy-py-a-lightweight-single-file-http-proxy-server-in-python/
BSD 3-Clause "New" or "Revised" License

Upgrade the packaging setup #642

Open webknjaz opened 3 years ago

webknjaz commented 3 years ago

I see that currently setup.py is being used and called directly. This is highly discouraged and is deprecated. See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for more details.

It's quite easy to upgrade it to use PEP 517 with invocations via pypa/pip and pypa/build.
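
For reference, the core of such an upgrade is just declaring the build backend in pyproject.toml; the snippet below is an illustrative sketch, not the exact config proposed for this repo:

```toml
# pyproject.toml: minimal PEP 517/518 build-system declaration (illustrative)
[build-system]
requires = ["setuptools >= 40.8.0", "wheel"]
build-backend = "setuptools.build_meta"
```

With that in place, python -m build (pypa/build) produces the sdist and wheel, and pip install . goes through the PEP 517 front end instead of executing setup.py directly.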

If you're open to improvements, I could try to find some time to contribute some packaging and CI/CD-related PRs.

abhinavsingh commented 3 years ago

@webknjaz Will highly appreciate PR for these, I could learn new things :). I am stuck in classic vanilla setup.py workflows and seemingly things have evolved a lot in this aspect. Thank you 👍

webknjaz commented 3 years ago

Woah! Among other problems, there have been no wheels uploaded for the recent versions on PyPI (none since 2019!)

webknjaz commented 3 years ago

First improvement done: https://github.com/abhinavsingh/proxy.py/pull/647.

webknjaz commented 3 years ago

@abhinavsingh I've submitted a number of linting-setup maintainability improvements. The last one includes everything, but I thought it would probably be easier/more digestible if the building blocks were their own atomic PRs:

This setup is rather strict but has a lot of excludes in the config, because a lot of linting violations get revealed otherwise. I think it'd make a nice series of "good first issues" for beginner contributors to work through.

webknjaz commented 3 years ago

Plus one more on top: https://github.com/abhinavsingh/proxy.py/pull/658

webknjaz commented 3 years ago

@abhinavsingh I wanted to ask whether you have a strong preference for multiple testing workflows. It feels easier to have the testing jobs in the same workflow, because then it's possible to create useful dependencies between them.

Another suggestion would be to make the workflow and job names shorter. Ever since GHA was introduced, many projects have noticed that long job names don't fit in the UI in multiple places on the website, making them rather useless; from this, the practice of keeping them short emerged. I hope you don't mind if I try to come up with better names. [screenshots: check names getting cut off in the GHA UI and sidebar]

webknjaz commented 3 years ago

Uncommented another fixer on top: https://github.com/abhinavsingh/proxy.py/pull/661

webknjaz commented 3 years ago

And here are two more improvements:

I'm starting to hit merge conflicts so I'll pause for some time to give you a chance to get these merged before I add more.

abhinavsingh commented 3 years ago

@webknjaz Yes, naming shouldn't be an issue. My original intention here was to surface enough context via the job names. Checking on the PRs.

webknjaz commented 3 years ago

Yeah, I see that. But currently you repeat the same context in many places: job names show up right after the workflow names in all of the UIs. Plus, adding the project name is not particularly useful, since when you look at the CI pages, it's already visible which repo they are in.

webknjaz commented 3 years ago

@abhinavsingh here are fixes for the pre-commit.ci env:

abhinavsingh commented 3 years ago

@webknjaz Locally, I ran into PyLint errors related to the socket module. Amazingly, the same check passes on GHA (and, I guess, for you locally too). Am I missing something locally? There are a variety of suggestions on StackOverflow. Any hints?

[screenshot: local PyLint errors related to the socket module]

webknjaz commented 3 years ago

@abhinavsingh this is interesting... are you on macOS? Locally, I run Gentoo Linux, and GHA and pre-commit.ci both use Ubuntu, I think. But yeah, pylint is known to behave inconsistently sometimes, depending on the env it runs under (including the Python version, because some of its checks tend to be semi-dynamic, as opposed to flake8, which is fully static). It can be a little annoying. I'll see what I can do. Worst case, we can disable the specific rule.

webknjaz commented 3 years ago

https://github.com/PyCQA/pylint/issues/4759#issuecomment-890564379 seems to suggest that the Python shipped by Homebrew tends to be problematic. Usually, I use pyenv (for many reasons): it allows me not to mess with the system installation of CPython, which may be patched or influence various components of my system. I specifically use a userspace installation, so it all lives in my ~/.pyenv, and even if I destroy it completely, it won't affect important parts of the system. I can delete and recreate it any time, plus it allows me to have as many interpreter versions as I like.
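
For reference, a typical pyenv flow looks something like this (the version number is arbitrary):

```console
$ pyenv install 3.10.0       # builds a clean CPython under ~/.pyenv
$ pyenv local 3.10.0         # pins this checkout to that interpreter
$ python -m pip install tox  # tooling lives in the pyenv-managed env
```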

abhinavsingh commented 3 years ago

Yep, I am on macOS. Lemme give it a try locally with pyenv. We might need to highlight this fact somewhere for macOS contributors, or maybe simply relax pylint for the socket classes (which is kind of covered via mypy anyway).

webknjaz commented 3 years ago

Well, we could apply inline disables for those cases. Or maybe add a transform plugin: https://pylint.pycqa.org/en/latest/how_tos/transform_plugins.html. But I'd argue that this is a band-aid that would just shadow the real issue of having a bad CPython env; people should be encouraged to fix their dev envs (which is probably what this false positive really reveals).

webknjaz commented 3 years ago

So here are some job/workflow name updates, making them mostly fit in the UI:

abhinavsingh commented 3 years ago

@webknjaz I checked random repositories on GitHub, and it looks like, along with setup.cfg, repos are also adding a minimal setup.py:

```python
from setuptools import setup

setup()
```

See https://github.com/search?q=filename%3A.pyup.yml+setup.cfg&type=code for reference. I searched this to understand why pyup broke after the setup.py removal; this seems the likely reason. Wdyt?

webknjaz commented 3 years ago

setuptools has supported not having setup.py for years now, and I've been deleting it from all my projects since. The only real problem was that pip install -e didn't work without it, which is why many projects adopted a minimal stub file. But that's been fixed in recent releases, so it's not a concern anymore. If anything, the concern is that having setup.py in the project provides a way to bypass PEP 517.

As for pyup, I don't know. You could probably configure it to exclude setup.py from its checks. It's a bad idea to pin direct deps in the package metadata anyway.

abhinavsingh commented 3 years ago

Thank you, makes sense. I haven't kept up with changes of late.

I also observed that the codecov integration is broken. Looking at the GHA logs, the upload-coverage step fails with "no coverage file found", though I see the coverage files being generated just fine.

Any hints on what might have gone wrong here? The root directory context doesn't change, so .coverage should still be auto-discoverable.

webknjaz commented 3 years ago

Here's a doc on the config file: https://pyup.io/docs/bot/config/#specifying. Or you could just migrate to Dependabot like others did.

webknjaz commented 3 years ago

I think that the official codecov GHA looks for an XML report that pytest does not currently produce. I was going to fix it by adding --cov-report=xml but haven't got to it.
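
i.e., something along these lines (the --cov target here is an assumption, not the project's exact invocation):

```console
$ python -m pytest --cov=proxy --cov-report=xml  # writes coverage.xml, which the codecov action can find
```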

abhinavsingh commented 3 years ago

> Here's a doc on the config file: https://pyup.io/docs/bot/config/#specifying. Or you could just migrate to Dependabot like others did.

Will deprecate pyup in favor of Dependabot. I was just curious about what happened that broke it.

> I think that the official codecov GHA looks for an XML report that pytest does not currently produce. I was going to fix it by adding --cov-report=xml but haven't got to it.

Yep, this should probably fix it :). Thank you

abhinavsingh commented 3 years ago

@webknjaz I realized that flake8 and pylint now use 100% of all available cores. This typically lasts for 5-10 seconds and is noticeable due to the increase in fan speed. I am wondering if there is something we can (must) do about it. Thank you.

webknjaz commented 3 years ago

Oh, this reminds me that I forgot to increase the number of processes for pylint. flake8 already defaults to auto, but pylint's config pins it to a single job. It's usually best to use all the available cores for better responsiveness: it means faster CI jobs and quicker results on dev laptops.
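
For reference, the knob in question is pylint's jobs option; a sketch, assuming an INI-style .pylintrc:

```ini
# .pylintrc (sketch)
[MASTER]
# 0 = auto-detect and use all available CPUs; the old config pinned this to 1
jobs = 0
```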

webknjaz commented 3 years ago

A few more forgotten bits:

abhinavsingh commented 3 years ago

Cool, I hope the pylint config update will reduce the time during which the CPUs are in a tight spin loop. I don't see a problem with utilizing all cores, but spinning up all available 8 cores for 10 seconds on a developer box is indeed not necessary; at least a choice must exist :). Currently, my MacBook fan spins up for quite some time after every flake8 and pylint run.

webknjaz commented 3 years ago

Actually, per the comment in the config, 0 means auto, i.e. it'll utilize more cores. I suppose you could run these tools manually and pass a different value to --jobs= on the CLI.
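
E.g., when invoking pylint directly, something like the following caps it at two worker processes (the target path is illustrative):

```console
$ pylint --jobs=2 proxy/
```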

webknjaz commented 3 years ago

Import loop checker:

webknjaz commented 3 years ago

Coverage.py config:

webknjaz commented 3 years ago

A stricter dedicated pytest config:

webknjaz commented 3 years ago

Enhancement to what's tested in the CI vs what makes its way to the PyPI:

webknjaz commented 3 years ago

Pytest config hotfix:

webknjaz commented 3 years ago

I noticed that setuptools emits a barely noticeable warning about multiline package descriptions, so here's the fix:

webknjaz commented 3 years ago

@abhinavsingh how do you feel about replacing the hardcoded package version with one derived from Git, including versioning for the intermediate commits? I normally use setuptools-scm for this. I thought I'd set up GHA-based publishing, and it'd be quite useful to have the version non-hardcoded.
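
A rough sketch of the usual setuptools-scm wiring, with illustrative version pins:

```toml
# pyproject.toml (sketch): derive the package version from Git tags
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
# with this enabled, intermediate commits get versions like 2.4.0.dev5+g1a2b3c4
```

The hardcoded version string then gets removed from the package metadata, and tagging a commit becomes the act of versioning.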

abhinavsingh commented 3 years ago

> @abhinavsingh how do you feel about replacing the hardcoded package version with one derived from Git, including versioning for the intermediate commits? I normally use setuptools-scm for this. I thought I'd set up GHA-based publishing, and it'd be quite useful to have the version non-hardcoded.

I did envision this (https://github.com/abhinavsingh/proxy.py/issues/102) but never got to it. It will be a bomb to have this. I'll be happy to have even more regular releases (ones not blocked on me personally); e.g., the last version was released about 6 months back :(

webknjaz commented 3 years ago

@abhinavsingh alright. What's your release flow? Right now, with https://github.com/abhinavsingh/proxy.py/pull/682 merged, you can download the artifacts (wrapped into one archive by GHA) from the workflow run, unpack them, and publish with twine upload. Do you normally have preparation activities besides re-hardcoding the version, like composing a changelog? If not, this would be quite easy to implement with two PRs. I must warn that for Git-based versioning to work properly, when a tag first appears on another branch, that branch must be merged with a proper merge commit (not squash, not rebase); otherwise, the tag won't get into the other branch (develop/master/main/whatever) and git describe won't be able to discover the proper tag.
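
Concretely, the twine step above is roughly this sequence, assuming the unpacked artifacts land in dist/ (an illustrative sketch):

```console
# after downloading and unpacking the GHA artifact archive:
$ python -m twine check dist/*   # sanity-check that the metadata renders on PyPI
$ python -m twine upload dist/*
```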

abhinavsingh commented 3 years ago

One preparation activity is related to brew, where the stable version is hardcoded within the .rb file. This happens because the advertised brew formula for both the stable and development versions points to the develop branch. One option could be to have one formula per branch: the one in the master branch points to the stable version, and the one in the develop branch always points to the development version.

PS: Last I checked, the brew formulae are not even working correctly; something broke in the last 6 months. I think brew now expects a cask formula, not a package formula class. Of course, this is only intended for macOS users, and I am not even sure how many folks are using proxy.py via brew. Outside of brew, most things are kept in sync during development, i.e. the develop branch is always ready for publishing, a.k.a. merging into the master branch.

Typically, if all looks good locally, I follow these steps:

  1. Make a PR from develop -> master
  2. Use vX.Y.Z as the PR description (we must be using tags, of course)
  3. Ensure the PR is merge-ready (GHA workflows are passing)
  4. Then, locally on my laptop:
    1. make lib-release-test
    2. Install from Test PyPI and do a sanity check
    3. make lib-release
  5. Merge the PR

In unfortunate circumstances, Step 4.2 may fail; a patch PR is then sent to the develop branch to address the issue, and the process is repeated from Step 3.

Currently, no separate CHANGELOG file is shipped; in fact, the README contains a minimal major-version changelog in the footer. The vX.Y.Z PR description works as the CHANGELOG, which is updated manually. These PRs are easily discoverable, though currently not linked anywhere (e.g. tags/releases). The README footer changelog is updated to advertise the reasons for major-version changes (e.g. architecture change, threaded vs. threadless, etc.).

For the proxy.py scenario, the release workflow can be triggered for PRs from develop -> master. The workflow can follow Step 4 and re-use the PR title/description for the tag name/description. Wdyt?
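
A hedged sketch of such a trigger; the file name is hypothetical and this is not an existing workflow in the repo:

```yaml
# .github/workflows/release.yml (hypothetical)
on:
  pull_request:
    branches: [master]  # fires for PRs targeting master, i.e. develop -> master
```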

abhinavsingh commented 3 years ago

> Actually, per the comment in the config, 0 means auto, i.e. it'll utilize more cores. I suppose you could run these tools manually and pass a different value to --jobs= on the CLI.

[screenshot: macOS CPU usage history over the last 2 minutes, showing three peaks]

I missed taking the screenshot at the right time; otherwise, User: 99% was what I saw. This is my MacBook's CPU usage over the last 2 minutes. I ran make lib-lint 3 times within this period, indicated by the peaks.

IMHO, we'll need to address this at some point. I understand utilizing all cores, but I'd call this PyLint behavior spinning all cores. The problem is, it doesn't even finish fast; it takes a minimum of ~10 seconds.

Also, have you seen how weirdly PyLint behaves if a mypy check has failed previously? PyLint just kind of hangs for 10-20 seconds before resuming.

From the process manager, I see PyLint go on to spawn hundreds of processes. Ideally, PyLint should use a strategy similar to proxy.py's: maintain one loop per CPU and do things asynchronously within it, instead of spawning hundreds of processes, each consuming 30-70% of a CPU.

Another side-effect of PyLint is that we are exhausting the GHA quotas more quickly. And I am not surprised, looking at the way PyLint is behaving. My laptop literally becomes unresponsive; I fear running that PyLint workflow locally, haha.

I looked on GitHub and found several pull requests where repos replaced PyLint with flake8 for exactly this reason (high CPU usage).

abhinavsingh commented 3 years ago

> Typically, if all looks good locally, I follow these steps:
>
>   1. Make a PR from develop -> master
>   2. Use vX.Y.Z as the PR description (we must be using tags, of course)
>   3. Ensure the PR is merge-ready (GHA workflows are passing)
>   4. Then, locally on my laptop:
>     1. make lib-release-test
>     2. Install from Test PyPI and do a sanity check
>     3. make lib-release
>   5. Merge the PR

Thinking more on it:

  1. The sanity check in Step 4.2 is simply re-running the e2e workflows against the Test PyPI distribution.
  2. (Optionally) I'll also look into how to put an Approve button on the PR before executing Step 4.3, i.e. the release to PyPI. This can serve as an opportunity for a manual sanity check, at least until this specific workflow matures and we gain trust in it; later on, we can short-circuit the approval step. Just throwing out ideas here; I am unsure how easy it is to add an "Approval" button on a PR via workflows (one possibility is sketched after this list). I do see GitHub prompting me to "Approve and Run" workflows for first-time contributors.
  3. Once in place, we can schedule a daily/weekly release via workflows.
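
One known way to get a manual approval gate in GHA is a deployment environment with required reviewers; a hedged sketch, where the environment name is hypothetical:

```yaml
# hypothetical publish job: GHA pauses here until a configured reviewer approves
jobs:
  publish:
    runs-on: ubuntu-latest
    environment: release  # required reviewers are configured on this environment in repo settings
    steps:
      - run: echo "twine upload would go here"
```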

About version names, I am happy to even get rid of SemVer in favor of a simple datetime-based versioning system. SemVer, while great for conveying major-minor-patch versions, is a PITA to maintain, and the onus lies on the authors/developers/maintainers to keep it aligned. At least datetime-based versioning would automate everything related to versions & tags.

Though, I am unsure how popular datetime-based versioning is in the community, or what the side-effects of migrating to it would be.

abhinavsingh commented 3 years ago

I think codecov.yml is no longer being respected.

https://github.com/abhinavsingh/proxy.py/blob/develop/codecov.yml

We have configured codecov to allow a 1% diff in coverage without breaking the CI, but it looks like codecov is now unhappy even with a 0.06% decrease in coverage.
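
The intent corresponds to a codecov.yml along these lines (paraphrased, not necessarily the literal file contents):

```yaml
# codecov.yml (sketch): tolerate up to a 1% coverage drop without failing the status check
coverage:
  status:
    project:
      default:
        threshold: 1%
```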

IIRC the presence of the file is enough for codecov to pick it up. Or has the codecov v2 action changed something? I'll need to dig into it to confirm anything; unsure why it broke.

webknjaz commented 3 years ago

Re: codecov: AFAIK the action only uploads the reports; their backend reads the config separately from that, via the GitHub App you have installed. I don't see the connection to my PR. Besides, that change was necessary since they are sunsetting their previous uploader soon, and the migration is inevitable.

webknjaz commented 3 years ago

Re: pylint: it is totally expected that it consumes more resources. By nature, it also has checks that evaluate the code by dynamically importing it, which is why it's capable of revealing deeper problems.

As for burning through the GHA minutes, I doubt it's a problem. I use GHA in many places with a lot more jobs running and haven't managed to exceed the free limit. Besides, it's not the bottleneck of the GHA setup anyway: macOS and Windows workers consume the minutes with 2x and 10x multipliers respectively, so they have a far greater influence on the resource quota, especially considering that there's a whole matrix of them vs. one small linting job.

FWIW I've also prepared a PR logging the execution time of each check:

I can also offer to create a fast linting tox env that skips pylint, if that helps; you'd get those results from the CIs anyway. Plus, I haven't enabled caching of pre-commit's envs yet, which would probably cut some seconds from the linting run.

Re: mypy: the pre-commit tool runs the checks in subprocesses, one after another, so I don't see how a mypy run would influence pylint. But since pylint doesn't stream any logs until it finishes its execution, I would imagine that it may subjectively feel like it's taking longer. I think the PR above will make the actual execution time clearer. Maybe mypy itself is slower when it sees errors, I don't know, but we'll be able to see once the PR is merged. If one run indeed influences the other, the only thing I'd suspect is macOS's process prioritization, which I don't know much about; I think it gives more priority to the graphical subsystem and could throttle CPU-intensive processes. Since both of these are subprocesses of pre-commit, maybe the OS sees that one subprocess consumed a lot of resources and, when the next one spawns, decides to limit its CPU time. But it's all speculation at this point, and I don't have the tools to verify whether that's the case.

P.S. You can pass args to tox commands if they have {posargs}. With lint, you could do tox -e lint -- mypy --all-files to run just that one check and then compare the time it takes to complete (alternatively, use pre-commit run mypy --all-files to run it without the tox wrapper).

abhinavsingh commented 3 years ago

Amazingly, tox -e lint -- mypy --all-files finished immediately. That's surprising; I have never seen it finish so fast when running the entire suite.

```console
$ time tox -e lint -- mypy --all-files
lint installed: astroid==2.8.4,attrs==21.2.0,backports.entry-points-selectable==1.1.0,bcrypt==3.2.0,cffi==1.15.0,cfgv==3.3.1,cryptography==35.0.0,distlib==0.3.3,filelock==3.3.2,identify==2.3.3,iniconfig==1.1.1,isort==5.10.0,lazy-object-proxy==1.6.0,mccabe==0.6.1,nodeenv==1.6.0,packaging==21.2,paramiko==2.8.0,platformdirs==2.4.0,pluggy==1.0.0,pre-commit==2.15.0,py==1.11.0,pycparser==2.20,pylint==2.11.1,pylint-pytest==1.0.3,PyNaCl==1.4.0,pyparsing==2.4.7,pytest==6.2.5,PyYAML==6.0,six==1.16.0,toml==0.10.2,types-cryptography==3.3.8,types-enum34==1.1.1,types-ipaddress==1.0.1,types-paramiko==2.7.3,virtualenv==20.10.0,wrapt==1.13.3
lint run-test-pre: PYTHONHASHSEED='905921385'
lint run-test: commands[0] | /Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit run --show-diff-on-failure --hook-stage manual mypy --all-files
mypy.....................................................................Passed
lint run-test: commands[1] | -/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -c 'cmd = "/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit install";  scr_width = len(cmd) + 10;  sep = "=" * scr_width;  cmd_str = "    $ {cmd}";' 'print(f"\n{sep}\nTo install pre-commit hooks into the Git repo, run:\n\n{cmd_str}\n\n{sep}\n")'
______________________________________________ summary _______________________________________________
  lint: commands succeeded
  congratulations :)
tox -e lint -- mypy --all-files  0.97s user 0.25s system 98% cpu 1.242 total
```

abhinavsingh commented 3 years ago

flake8's timing is similar when run in isolation:

```console
$ time tox -e lint -- flake8 --all-files
lint installed: astroid==2.8.4,attrs==21.2.0,backports.entry-points-selectable==1.1.0,bcrypt==3.2.0,cffi==1.15.0,cfgv==3.3.1,cryptography==35.0.0,distlib==0.3.3,filelock==3.3.2,identify==2.3.3,iniconfig==1.1.1,isort==5.10.0,lazy-object-proxy==1.6.0,mccabe==0.6.1,nodeenv==1.6.0,packaging==21.2,paramiko==2.8.0,platformdirs==2.4.0,pluggy==1.0.0,pre-commit==2.15.0,py==1.11.0,pycparser==2.20,pylint==2.11.1,pylint-pytest==1.0.3,PyNaCl==1.4.0,pyparsing==2.4.7,pytest==6.2.5,PyYAML==6.0,six==1.16.0,toml==0.10.2,types-cryptography==3.3.8,types-enum34==1.1.1,types-ipaddress==1.0.1,types-paramiko==2.7.3,virtualenv==20.10.0,wrapt==1.13.3
lint run-test-pre: PYTHONHASHSEED='306389043'
lint run-test: commands[0] | /Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit run --show-diff-on-failure --hook-stage manual flake8 --all-files
flake8...................................................................Passed
lint run-test: commands[1] | -/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -c 'cmd = "/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit install";  scr_width = len(cmd) + 10;  sep = "=" * scr_width;  cmd_str = "    $ {cmd}";' 'print(f"\n{sep}\nTo install pre-commit hooks into the Git repo, run:\n\n{cmd_str}\n\n{sep}\n")'
________________________________________________________ summary _________________________________________________________
  lint: commands succeeded
  congratulations :)
tox -e lint -- flake8 --all-files  18.52s user 0.45s system 99% cpu 19.022 total
```

abhinavsingh commented 3 years ago

pylint is the worst of all, even when run in isolation: look at the 1445% CPU, and the total execution takes 54 seconds (748.66s of user CPU time).

```console
$ time tox -e lint -- pylint --all-files
lint installed: astroid==2.8.4,attrs==21.2.0,backports.entry-points-selectable==1.1.0,bcrypt==3.2.0,cffi==1.15.0,cfgv==3.3.1,cryptography==35.0.0,distlib==0.3.3,filelock==3.3.2,identify==2.3.3,iniconfig==1.1.1,isort==5.10.0,lazy-object-proxy==1.6.0,mccabe==0.6.1,nodeenv==1.6.0,packaging==21.2,paramiko==2.8.0,platformdirs==2.4.0,pluggy==1.0.0,pre-commit==2.15.0,py==1.11.0,pycparser==2.20,pylint==2.11.1,pylint-pytest==1.0.3,PyNaCl==1.4.0,pyparsing==2.4.7,pytest==6.2.5,PyYAML==6.0,six==1.16.0,toml==0.10.2,types-cryptography==3.3.8,types-enum34==1.1.1,types-ipaddress==1.0.1,types-paramiko==2.7.3,virtualenv==20.10.0,wrapt==1.13.3
lint run-test-pre: PYTHONHASHSEED='3638090282'
lint run-test: commands[0] | /Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit run --show-diff-on-failure --hook-stage manual pylint --all-files
PyLint...................................................................Passed
lint run-test: commands[1] | -/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -c 'cmd = "/Users/abhinavsingh/Dev/proxy.py/.tox/lint/bin/python -m pre_commit install";  scr_width = len(cmd) + 10;  sep = "=" * scr_width;  cmd_str = "    $ {cmd}";' 'print(f"\n{sep}\nTo install pre-commit hooks into the Git repo, run:\n\n{cmd_str}\n\n{sep}\n")'
________________________________________________________ summary _________________________________________________________
  lint: commands succeeded
  congratulations :)
tox -e lint -- pylint --all-files  748.66s user 32.19s system 1445% cpu 54.020 total
```

abhinavsingh commented 3 years ago

Lately, I am getting a lot of these coverage errors locally. The behavior is also kind of weird...

[screenshots: repeated local coverage errors; the behavior is consistent]

abhinavsingh commented 3 years ago

Made some changes to the workflow files. We might have to pipeline them into the check step for a green CI.

abhinavsingh commented 3 years ago

> Made some changes to the workflow files. We might have to pipeline them into the check step for a green CI.

At some point, we may even have to abstract out the linter steps so that they run when a matching file changes. After #699, no workflow will run for a README.md-only change, but we might still want to run the markdown linter.