BK: Add tox.ini with typing,lint (codespell); add workflows for codespell; update pyproject.toml

yarikoptic commented 1 year ago

To make template better tested and IMHO we should start aiming for type annotation of code in (new) extensions, although datalad itself is not yet annotated.

@jwodder - is it feasible to aim for downstream libraries (like datalad extensions) to be type annotated although only small portion of datalad itself is type annotated ATM? Could you help finishing up this PR (type annotations for that small datalad_helloworld)?

@mih I also quite like pre-commit with black now as used in dandi-cli and other projects. May be we could/should introduce that to extensions to encourage code/imports consistency etc?

jwodder commented 1 year ago

@yarikoptic

is it feasible to aim for downstream libraries (like datalad extensions) to be type annotated although only small portion of datalad itself is type annotated ATM?

The main problem I see is that, as long as datalad lacks a py.typed file, when mypy type-checks a library that uses datalad, any types & values from datalad will be treated as Any, limiting the amount of checking mypy can do. I'm not sure what would happen if datalad gained a py.typed file before it was completely type-annotated, but I doubt it'd be pretty.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.34% :tada:

Comparison is base (fac0fda) 92.68% compared to head (cd2a860) 93.02%. Report is 19 commits behind head on main.

:exclamation: Current head cd2a860 differs from pull request most recent head 60bfa81. Consider uploading reports for the commit 60bfa81 to get more accurate results

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #45 +/- ## ========================================== + Coverage 92.68% 93.02% +0.34% ========================================== Files 5 5 Lines 41 43 +2 ========================================== + Hits 38 40 +2 Misses 3 3 ``` | [Files Changed](https://app.codecov.io/gh/datalad/datalad-extension-template/pull/45?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad) | Coverage Δ | | |---|---|---| | [datalad\_helloworld/hello\_cmd.py](https://app.codecov.io/gh/datalad/datalad-extension-template/pull/45?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9oZWxsb3dvcmxkL2hlbGxvX2NtZC5weQ==) | `88.88% <100.00%> (+0.88%)` | :arrow_up: | | [datalad\_helloworld/tests/test\_register.py](https://app.codecov.io/gh/datalad/datalad-extension-template/pull/45?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9oZWxsb3dvcmxkL3Rlc3RzL3Rlc3RfcmVnaXN0ZXIucHk=) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.

:loudspeaker: Have feedback on the report? Share it here.

yarikoptic commented 1 year ago

remove typing workflow -- could not figure out how to ignore _version.py . Left possibility to run it though via typing since IMHO we should strive to do typing.

yarikoptic commented 1 year ago

smth likely in the _datalad_buildsupport changes I added which now makes

subcommand --help doc output not building

```shell make: Entering directory '/home/runner/work/datalad-extension-template/datalad-extension-template/docs' python -m sphinx -b html -d build/doctrees -W source build/html Running Sphinx v6.1.3 usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: option --cmdsuite not recognized making output directory... done [autosummary] generating autosummary for: index.rst [autosummary] generating autosummary for: /home/runner/work/datalad-extension-template/datalad-extension-template/docs/source/generated/datalad.api.hello_cmd.rst loading intersphinx inventory from https://docs.python.org/objects.inv... intersphinx inventory has moved: https://docs.python.org/objects.inv -> https://docs.python.org/3/objects.inv building [mo]: targets for 0 po files that are out of date writing output... building [html]: targets for 1 source files that are out of date updating environment: [new config] 2 added, 0 changed, 0 removed reading sources... [ 50%] generated/datalad.api.hello_cmd reading sources... [100%] index Warning, treated as error: /home/runner/work/datalad-extension-template/datalad-extension-template/docs/source/index.rst:[24](https://github.com/datalad/datalad-extension-template/actions/runs/4040767962/jobs/6946719935#step:6:25):toctree contains reference to nonexisting document 'generated/man/datalad-hello-cmd' make: *** [Makefile:55: html] Error 2 make: Leaving directory '/home/runner/work/datalad-extension-template/datalad-extension-template/docs' Error: Process completed with exit code 2. ```

yarikoptic commented 1 year ago

ah -- filed https://github.com/datalad/datalad-extension-template/issues/51, will filter this branch to not bother about build support for now either.

matrss commented 1 year ago

remove typing workflow -- could not figure out how to ignore _version.py .

I ran into this issue as well. Just excluding _version.py is not enough, since the module is imported from other files which are type checked, which in turn means that the _version module will be checked. In addition to the exclude I needed to add an override which allows untyped calls and definitions in this module (and others, like _datalad_buildsupport.*), see: https://github.com/matrss/datalad-getexec/blob/4847a6937f81f461ec4972aff963a7849f667f8b/pyproject.toml#L43-L50

jwodder commented 1 year ago

The way I handled ignoring _version in tinuous was to add the following to the mypy config:

[mypy]
exclude = _version\.py

[mypy-tinuous._version]
follow_imports = skip

matrss commented 1 year ago

Yes, that should also work. I wasn't aware that you can specify follow_imports per module instead of just globally.

yarikoptic commented 1 year ago

ha -- that last setting did the trick seems to me! thank you @jwodder. Pushing now back with typing CI enabled.

yarikoptic commented 1 year ago

if no objections are voiced, I will merge it in 2-3 days. IMHO it would improve extensions

jsheunis commented 1 year ago

I have zero experience with typing, and minor with linting. But FWIW I agree with the sentiment expressed by @mih, especially the ambiguity of

what is a not-a-datalad-extension-without-it and what is more of a here-is-something-running-the-you-could-keep-if-liked.

I am already quite bad at keeping everything green in the extension that I maintain, and having another red cross will be annoying. I don't mean to say that this PR is not useful, rather that the version that gets merged would ideally be the least complex and least restrictive variation from the perspective of extension developers.

mslw commented 1 year ago

On one hand, I like how the extension template encourages extension maintainers to use continuous integration by providing appveyor and github actions configuration. And I like how manpage generation "just works" thanks to the boilerplate code for building docs. On the other hand I also feel that it's helpful to keep the extension template relatively bare-bones, as there is a bit of complexity to be grasped already. Perhaps linting tools are something that can be left up to extension maintainers rather than included in the template (that's not to say they are not useful - I began using black and flake8 in recently started redcap extension, although just through manual triggers python-black-buffer, not precommits or anything).

In summary, I'm on the fence about this one, so just sharing my thoughts.

P.s. I guess left between the lines of my comment and discussion above is who the "extension maintainers" or "authors" really are - I think that the extension template targets both maintainers of extensions under DataLad organization, as well as external DataLad users who want to fine-tune their workflows. In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

bpoldrack commented 1 year ago

In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

Agree. I think the problem here is to mix those two roles of this repo. And I think the template was never meant to become that normative source for the datalad-organization universe. Hence, I lean towards @mih's point of view and to not continue to make this the most expensive hello world imaginable.

Does raise the question, though: Do we have or want a place for such things, where our other repos (in fact not only extensions) could pull such workflows and their updates from?

matrss commented 1 year ago

I think that the extension template targets both maintainers of extensions under DataLad organization, as well as external DataLad users who want to fine-tune their workflows. In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

This might get somewhat off-topic, but since I have recently build a small extension and am not part of the DataLad Team I want to give some input from my point of view re: complexity.

There are some hurdles when setting up a new extension which go beyond just "instantiating" the template:

obviously, updating metadata
setting up readthedocs
setting up appveyor
codeclimate?

I have setup readthedocs, but skipped the last two since I have never used appveyor before and didn't yet find a good reason to investigate further. Instead I have just setup a tox test suite in a GitHub action. Regarding codeclimate, I took a look at their website and couldn't see what it would provide to me, so I didn't bother either.

With that in mind, I think it would be a good idea to make the template work out-of-the-box as much as possible, which in practice would mean to reduce external services as much as possible. For example, the docbuild workflow could publish to a GitHub Page and make readthedocs redundant. The appveyor pipeline could be converted to a GitHub action if feasible and then require no further setup. Some more automated way of updating the metadata could be nice, but there is #42 for that.

Considering this, I do not see much complexity added here. I think it is useful to have some CI workflows that check best-practices via linting and type-checking, as long as they do not introduce even more setup steps. Maybe these could be self-contained GitHub actions instead of introducing another tool (tox) though.

mih commented 1 year ago

@matrss Thank you for expressing your view. I think your comments are also point out the discrepancy between the normative and minimal approach that @mslw

yarikoptic commented 1 year ago

Thank you @matrss @mih @mslw @jsheunis ! @matrss: Indeed this PR does not introduce any need for extra steps on external services to enable any new added here QA feature, so I expect it only to robustify extensions developed with this set of "better practices", even though indeed it might be an additional red cross to address -- usually much easier if not put into the closet of "will do later". Indeed there is a balance to strike between being minimal normative "howto/what an extension is" and being a pragmatic template to achieve the best result with minimal effort. But I think for the minimal normative view - we have stepped away from it long ago, and this template became more of the latter. But as a result of current shape of it I see lots of duplication (which I hate) of effort (as e.g. @mslw mentioned - enabling flake8 etc) since good practices are desired by everyone really, but then done different ways since done separately, and that makes it inefficient, and harder to fix/contribute across extensions. So it then boils down to us deciding on some common best practices and how to code for them. IMHO type setting, guaranteed absence of typos, and consistent testing is something to strive for, and that is what this PR proposes. If we do not want any of those features -- let me know which and I remove its support in this PR.

RE @matrss

Maybe these could be self-contained GitHub actions instead of introducing another tool (tox) though.

Well, they are (lint and typing), they just use tox to streamline dependencies installation and possible invocation centralization, so could be done exactly the same way in CI and locally. Having said that, I do feel odd in me hiding codespell within tox's lint, and often I just add codespell workflow which uses codespell action. I can do it that other way too if we decide to proceed here at all.

yarikoptic commented 1 year ago

re @mih

If I need to maintain a spell checker configuration now, because it simply does not know something that I nevertheless need to write, the convenience is soured.

In my experience with codespell through the past weeks where I submitted PRs to a number of projects, in some touching over a hundred of files (see e.g https://github.com/CDLUC3/dmptool/pull/428 and https://github.com/cvmfs/cvmfs/pull/3183) I am genuinely surprised on how well codespell has performed (kudos to many contributors who developed it)! With added config file it is trivial to ignore that rare to hit "false positive" and hopefully soon inline annotation would make it more specific. To that extent -- we had codespell CI enabled in datalad core since Sun Apr 25 18:12:03 2021 , and I don't remember much complaints about it severing the convenience. But I think we did not hit much of typos recently, did we?

yarikoptic commented 1 year ago

ok, let me ask

do we want type annotations and checking for their correctness in the extensions?
do we want to tolerate typos in our code/docs?

I think knowing answers would help me since I kinda lost track on what you are fighting against here dear @datalad/developers . If not in the template -- everyone would need eventually to introduce proposed checks one way or another into their code.

datalad / datalad-extension-template

BK: Add tox.ini with typing,lint (codespell); add workflows for codespell; update pyproject.toml #45

Codecov Report