christiansandberg / canopen

CANopen for Python
http://canopen.readthedocs.io/
MIT License

Type annotations in canopen #358

Open sveinse opened 1 year ago

sveinse commented 1 year ago

With reference to the discussion in #339 regarding type hints, I have started looking at the work needed to type annotate canopen. What are the overall ambitions regarding the implementation of type hints? What level of hinting and static code analysis are we aiming for? Which tools: mypy, flake8, pyright/pylance?

I'm a big proponent of type-annotated Python. It helps development: navigation in the editor works, and the expected types for calls and operations are shown, reducing bugs and misunderstandings. I think adding typing to canopen is the right thing to do. The benefits outweigh the disadvantages.

Having a fully compliant type-annotated library will reduce the degrees of freedom in the codebase. You can do more "tricks" with Python than a linter is able to understand, so coding for it requires a degree of code conformity. canopen relies on a lot of duck typing and dynamic types (especially variables that conditionally contain None). This results in quite a few warnings and/or code constructs to avoid the warnings - and rightly so, as the code in many cases does not execute any type checks to verify the datatype before invoking an operation. Lastly, many of these linting tools are very opinionated on code structure, so we must expect some functional changes.

I've taken a stab at implementing type annotations in canopen (see link below). I'm using mypy and pyright. After doing this, I think we will have to face a few discussions on the correct and intended behavior of a few details, e.g. the use of None as an optional value.

https://github.com/sveinse/canopen-asyncio/commit/4e8925e1be30fe6c3101e4f050579dbed989ecbb

acolomb commented 1 year ago

I'm fine with and definitely welcome type hinting, especially if it means that structured type info lands in the documentation as well.

Regarding None as "unspecified" or "not supported" etc., I think that is a very valid and concise expression. Even Python itself uses it as such, e.g. by returning None from a function that doesn't return anything explicitly. As for argument default values, I don't care much whether, for example, an empty string "" or None is used, as long as the intention is clear. That probably means an Optional type hint in most cases, but I think reading def frobnicate(foo: str, bar: str = None) is very clear, and adding Optional to that only makes it longer without real benefit. IDE code parsers usually do understand that a keyword argument with a default value can be omitted. In this case, def frobnicate(foo: str, bar: str = "") is slightly less obvious, but also fine with me. Any code relying on None and "" being different things here is ill-designed anyway.
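For reference, a minimal sketch of the two spellings using the hypothetical frobnicate from above. Note that recent mypy versions (0.990 and later) reject the implicit form by default via the no_implicit_optional setting, so the explicit variant is what a checker would demand:

    from typing import Optional

    # Explicit form: accepted by mypy with default settings
    def frobnicate(foo: str, bar: Optional[str] = None) -> str:
        return foo if bar is None else foo + bar

    # Implicit form from above; modern mypy flags it by default:
    # def frobnicate(foo: str, bar: str = None) -> str: ...

    frobnicate("spam")          # the keyword argument can be omitted
    frobnicate("spam", "eggs")  # or given explicitly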

In your fork, why do you have the from __future__ import annotations thing? Isn't that a Python2 only thing, which we're not aiming for in new code anymore?

sveinse commented 1 year ago

None (or any other falsey expression) is a valid way to declare optional values, and it is used quite often in canopen. However, dealing with multiple or optional types also requires the code to handle all variants of the values in all places.

Let me give an example: class NmtBase doesn't initialize self.network with a usable value; it sets it to None. The proper value of self.network is injected later, e.g. from RemoteNode.associate_network(). Functionally this is fine. Calling NmtBase.send_command() while self.network is None will crash the program. But the type checker complains about this construct, since calling self.network.something() when self.network is None is invalid. The principal question becomes: is this how we want it? If it is None, let it crash, and let the type checker complain? Or is the type checker pointing out a valid weakness in the code?

class NmtBase(object):
    def __init__(self, node_id: int):
        self.id = node_id
        self.network: Optional[Network] = None

    def send_command(self, code: int):
        logger.info(
            "Sending NMT command 0x%X to node %d", code, self.id)
        assert self.network  # <-- Needed to convince the type checker that network is not None
        self.network.send_message(0, [code, self.id])   # <-- Can't do this without checking self.network
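To illustrate the two possible answers, here is a self-contained sketch with stand-in classes (not canopen's actual code): keep the assert as a checker aid, or make the failure explicit with a proper exception:

    from typing import List, Optional

    class Network:
        def send_message(self, cob_id: int, data: List[int]) -> None:
            pass  # stand-in for the real CAN transmission

    class Node:
        def __init__(self, node_id: int) -> None:
            self.id = node_id
            self.network: Optional[Network] = None

        def send_command_assert(self, code: int) -> None:
            # Narrowing via assert: satisfies the checker, but the check
            # disappears when Python runs with -O
            assert self.network is not None
            self.network.send_message(0, [code, self.id])

        def send_command_raise(self, code: int) -> None:
            # Narrowing via explicit check: survives -O and raises a clear
            # error instead of crashing with AttributeError
            if self.network is None:
                raise RuntimeError("node is not associated with a network")
            self.network.send_message(0, [code, self.id])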

Using from __future__ import annotations gives postponed evaluation of type annotations. E.g.

class NmtSlave(NmtBase):

    def __init__(self, node_id: int, local_node: LocalNode):  # With __future__ import annotations

    def __init__(self, node_id: int, local_node: "LocalNode"):  # Without

It has been optional since py3.7 and has not yet been made mandatory. The feature is described in PEP 563. See https://docs.python.org/3/library/__future__.html

christiansandberg commented 1 year ago

Go for it! Static analysis will definitely find some issues, but hopefully the fixes are fairly easy like asserting that attributes are not None like in your example.

sveinse commented 1 year ago

What are reasonable acceptance criteria for type annotations in canopen? What is the definition of done? Which tools should we aim for compliance with (and with what settings)?

I usually use mypy and pyright (the latter because I'm using VSCode) for type checking, and pylint and/or flake8 for general linting. As pointed out in #346, pylint is a commonly used tool.

sveinse commented 1 year ago

I've been working a lot on type annotating canopen lately, as can be found in PR #360. It has taken a lot more time than I expected, but I hope it can be of use. canopen uses a lot of dynamic types in variables, and it took much work to figure that out. It is mostly done, but there are a few remaining errors to deal with. Please review the PR and give comments, large or small.

I've collected a series of comments and questions. I'm sorry for the length of this post, but it would help a lot if we could get some insight into this.

1) Use of multiple types

This is general type annotation advice: when using vars that can hold multiple types, e.g. int, str or None, it is important to always expect all variants of the variable in the code. The type checker has (rightfully so) problems understanding which type might be passed, and it complains a lot if assumptions are made.

There is a lot of this in canopen, and it is what has taken the most time to resolve. It's also harder to code with, because one has to test everything all the time. I'm sure there are lots of `Optional` uses in canopen that can be removed, but it's not always clear whether the optional value serves a function or is merely a convenience or habit.
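To illustrate (a generic sketch, not canopen code): once a value may be int, str or None, the checker forces every variant to be handled before the value can be used:

    from typing import Union

    def describe(value: Union[int, str, None]) -> str:
        if value is None:           # the None variant must be handled first
            return "unset"
        if isinstance(value, int):  # value is narrowed to int here
            return hex(value)
        return value.upper()        # and to str here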

2) Type annotating OD variable access is tedious

Due to the dynamic nature of the (canopen OD) objects (variable, array, record), and .raw potentially returning all sorts of object types, it is inherently difficult to properly type annotate SDO requests.

    # This creates errors in type checkers
    var: int = sdo[0x6061].raw

    # This is what's needed to do full type checking
    obj = sdo[0x6061]     # obj might be Variable, Array, Record
    assert isinstance(obj, Variable)    # Narrow the type to Variable
    data = obj.raw    # data might be bool, int, float, str, bytes
    assert isinstance(data, int)    # Narrow the type to int
    var = data   # Now it's guaranteed that var is int

Is the current design too flexible? It is impossible for the type checker to know that OD 0x6061 is a Variable (it might not be, if the object dictionary is misconfigured), nor can the type checker understand that sdo[0x6061].raw returns an int. What about having explicitly typed methods? I know it's not everyone's cup of tea, but it's explicit and it can be type checked. It can also be checked at runtime, so if the OD is wrong, it will clearly give an error. I suppose the fundamental question is: what is expected to happen if the parameter isn't a Variable, or the type returned by .raw is not int?

Some quick proposals as food for thought:

    sdo[0x6061]  # Only returns `Variable`. Fails if used on an `Array` or `Record`
    sdo.get_container(0x1234)   # Returns `Array` or `Record`. Fails on `Variable`

    sdo.get_variable(0x6061)  # Named methods for getting variable only
    sdo.get_array(0x1234)

    var.raw_int   # Only returns int. Fails if any other types are returned
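A self-contained sketch of what such a typed accessor could look like - stand-in classes and a plain dict instead of canopen's real SDO objects, so get_variable here is illustrative, not existing API:

    from typing import Dict, Union

    class Variable:
        raw: Union[bool, int, float, str, bytes]

    class Array:
        pass

    class Record:
        pass

    OdObject = Union[Variable, Array, Record]

    def get_variable(od: Dict[int, OdObject], index: int) -> Variable:
        # The runtime check doubles as type narrowing for the checker,
        # and gives a clear error if the OD is misconfigured
        obj = od[index]
        if not isinstance(obj, Variable):
            raise TypeError("OD 0x%04X is not a Variable" % index)
        return obj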

3) Needed to check variable type when encoding objects

The encoding of a parameter is stored in objectdictionary.Variable.data_type, and the encoding is done in Variable.encode_raw. However, the input type of the value to this function is very broad. I have added a runtime type check that raises TypeError if there is a mismatch between data_type and the expected type.
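As a sketch of the shape of that check (illustrative only; the two data type codes follow CiA 301, and the lookup table is trimmed to integer types):

    import struct
    from typing import Dict

    INTEGER16 = 0x03  # CiA 301 data type codes
    INTEGER32 = 0x04

    STRUCTS: Dict[int, struct.Struct] = {
        INTEGER16: struct.Struct("<h"),
        INTEGER32: struct.Struct("<l"),
    }

    def encode_raw(data_type: int, value: object) -> bytes:
        # The expected Python type is dictated by data_type, not by the
        # value itself, so only a runtime isinstance() check catches a mismatch
        if not isinstance(value, int):
            raise TypeError("data_type 0x%02X expects int, got %s"
                            % (data_type, type(value).__name__))
        return STRUCTS[data_type].pack(value)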

4) Attribute injection

In objectdictionary.import_eds() there is a rather nasty attribute injection hack that inserts __edsFileInfo into the od object for storage. This is considered very smelly, and I think we need to find a better way to store these data.

5) Is a node_id value optional or not?

Are there any use cases where initialization of LocalNode or RemoteNode is not made with a node_id value? It is declared as Optional, which allows None, but a lot of logic in these classes crashes if node_id is None.

6) How should class Bits work?

How does the variable.Bits class work with the corresponding objectdictionary.Variable.decode_bits() and .encode_bits()? I fail to understand the logic behind the item get/set and the way the slicing works (if at all). Nor are there any examples of this in use, so any help would be appreciated.

7) Random comments

friederschueler commented 3 months ago

We should break this up into multiple smaller issues and not forget about it. At least we fixed (5) with #445

sveinse commented 3 months ago

It's not forgotten per se, but more recent activity supersedes much of this. It's an "in-between" task for me in PR #360, and it needs an update.

acolomb commented 3 months ago

Sorry for not considering this more thoroughly in the past year. Other things needed my attention more urgently, but I think we're on a good path to ramp up type annotations in canopen gradually. Some questions from your list above have been answered as part of different discussions, and frankly I had forgotten about them here. If still needed, let me go back and summarize my stance (and current project practice) for each point raised. Which of these do you think still need clarification, @sveinse?

sveinse commented 3 months ago

The hard part is not to type annotate our library: It's deciding how to deal with the type inconsistencies it uncovers. Particularly surrounding implicit use of None.

So, how do we approach type annotation in canopen in general? Adding them via PRs is straightforward. Can we agree on which type checker to use, e.g. mypy? If we do, should we implement a github action step for it? Are typing errors considered PR-blocking, or are they more "nice to have"? How do we validate type annotations?

It is extremely easy to get carried away down a deep, deep rabbit hole when starting on the type annotation path. The type annotation language has limitations, and Python is far more flexible; I know we will hit a few of those limits in canopen. When these "inflexibilities" are encountered, it's easy for the code to get littered with what I've dubbed "type checker aids" (e.g. asserts and/or type ignores) rather than functional code. TL;DR: It's very important that we agree on how deep or complete we're willing to go with type checking.

erlend-aasland commented 3 months ago

FWIW, here are my thoughts:

So, how do we approach type annotation in canopen in general?

This is a broad question, but regarding how to add them, I would definitely suggest using gradual typing over several PRs, and definitely don't fall into the trap of mixing bugfixes (discovered by type annotations) and adding annotations.

Can we agree on which type checker to use, e.g. mypy?

+1 for mypy

If we do, should we implement a github action step for it?

Yes.

Are typing errors considered PR-blocking, or are they more "nice to have"?

Make it a required check and fix things right away. Using mypy's gradual typing feature makes this very convenient. You don't need to enable strict mode until typing is complete.

How do we validate type annotations?

Can you expand on this?

[...] When these "inflexibilities" are encountered, it's easy for the code to get littered with what I've dubbed "type checker aids" (e.g. asserts and/or type ignores) rather than functional code.

IMO, it's ok to add asserts temporarily, but then coupled with an issue and/or a comment that explains how to solve the particular issue properly.

An approach I've found usable is:

  1. establish a lenient mypy CI check (gradual typing)
  2. for each class (or other suitable code "chunk"):
    1. make sure code coverage is acceptable
    2. try adding type annotations to the chunk in question, but don't commit them; if bugs are found, fix the bugs with separate PRs, including regression tests but excluding annotations
    3. add type annotations
  3. when the code base is completely typed up, switch mypy from gradual typing to strict mode

Of course, each iteration of step 2 possibly involves a series of PRs.

sveinse commented 3 months ago

So, how do we approach type annotation in canopen in general?

This is a broad question, but regarding how to add them, I would definitely suggest using gradual typing over several PRs, and definitely don't fall into the trap of mixing bugfixes (discovered by type annotations) and adding annotations.

Agreed.

Can we agree on which type checker to use, e.g. mypy?

+1 for mypy

+1 from me as well.

Are typing errors considered PR-blocking, or are they more "nice to have"?

Make it a required check and fix things right away. Using mypy's gradual typing feature makes this very convenient. You don't need to enable strict mode until typing is complete.

I doubt we will ever get to strict mode in canopen. It deals with flexible data types, and the current design is very hard to describe with type hints. E.g. ODVariable.encode_raw() accepts a value whose expected type is determined not by the argument itself (isinstance()), but by another variable (self.data_type). No current type checker is able to infer the correct type from this kind of logic, and it will produce errors.

How do we validate type annotations?

Can you expand on this?

If we chose mypy, then that's the answer.

erlend-aasland commented 3 months ago

I doubt we will ever get to strict mode in canopen. It deals with flexible data types and the current design is very hard to describe with type hints.

One possible solution for such problems is to introduce new APIs and deprecate the old ones. Another is to make breaking behavioural changes. Yet another is to let the status quo win and never enable strict typing.

acolomb commented 3 months ago

So, how do we approach type annotation in canopen in general? Adding them via PRs is straightforward.

Yes, agreed that adding via PRs in chunks is review-friendly.

Can we agree on which type checker to use, e.g. mypy?

Also +1 for mypy from my side. So we have consensus, and mypy it shall be.

If we do, should we implement a github action step for it?

I don't care much about this point. Sure it's nice to see, but IMO type checkers are most valuable during coding.

Are typing errors considered PR-blocking, or are they more "nice to have"?

Given the nature of this API, I would definitely not try to enforce it. Python is not a type-safe language, and this library embraces that fact by design. Which is a good thing IMHO, as it makes the code easy to use and mirrors the design of CANopen protocols themselves. Making sure the passed / returned objects are of a certain type should be done outside the library, by the application code that knows which objects it tries to access. Any wrapper functions for specific types should be added on top of the current generic API. It should be really easy and not much additional code. But it would enlarge our library API surface quite a bit, which I don't think is worthwhile just because of type checking.

When these "inflexibilities" are encountered, it's easy for the code to get littered with what I've dubbed "type checker aids" (e.g. asserts and/or type ignores) rather than functional code. TL;DR: It's very important that we agree on how deep or complete we're willing to go with type checking.

I'd avoid those "type checker aids" usually, rather treating the type hints as a helpful, best-effort option. But that needs to be decided on a case-by-case basis.

erlend-aasland commented 3 months ago

ISTM a gradual typing CI check would be a good compromise; you'd type up the parts of the code base that make sense, and a lenient mypy check should be able to help you keep those functional without being too much of a nuisance. I believe a CI check is important, as type annotations are susceptible to bit rot, like any other code. Without CI checks, you risk them becoming incorrect, which can end up being a problem for users of this library.
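For illustration, a minimal sketch of what such a lenient, non-blocking check could look like (the file name and all settings here are assumptions, nothing decided; continue-on-error lets the job report failures without failing the run):

    # Hypothetical .github/workflows/mypy.yml
    name: mypy
    on: [push, pull_request]

    jobs:
      mypy:
        runs-on: ubuntu-latest
        continue-on-error: true  # report errors without blocking the run
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.12"
          - run: pip install mypy .
          - run: mypy canopen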

acolomb commented 3 months ago

Accepted, since mypy seems to explicitly support such a use case. Go ahead and PR the check configuration if you like. I've never really touched all the CI and GitHub related parts.

sveinse commented 3 months ago

What I'd like to do next is to add an absolutely minimal mypy configuration to canopen, which I'm working on now. mypy has excellent guides on how to approach existing projects: https://mypy.readthedocs.io/en/stable/existing_code.html#existing-code

mypy throws ~107 errors right off the bat with a minimal configuration. If we were to introduce this as a blocking GH action, we would have to address all of those issues first.

If we choose not to make them blocking in GH actions, I think there is no point in running them in GH at all! Who inspects the action logs unless they fail?

acolomb commented 3 months ago

The only reason I see is if it supports pointing out type errors related only to the current PR, so that we can be sure a PR doesn't introduce new type check errors. I don't know enough about the "gradual typing" feature in mypy, though, to tell if that's possible.

sveinse commented 3 months ago

I don't know if it's possible to set up partial mypy checking of PRs, like codecov does for code coverage. I know commercial tools, such as SonarCloud, do that. To be able to do so, we'd need two sets of configurations: one for the whole project (which needs to be lenient in the beginning) and a stricter one for PRs. I don't know if that's possible with actions or mypy.

acolomb commented 3 months ago

Well, if that gets hard to get right, let's skip the CI integration for now. But if @erlend-aasland sees value in it, I'm not opposed to adding the CI action (without enforcing the check).

erlend-aasland commented 3 months ago

All of this is possible using GH and mypy. I can look into it in a couple of days, unless @sveinse beats me to it.

sveinse commented 3 months ago

@erlend-aasland I'm curious whether you have a setup for running incremental mypy with stricter rules than the overall project.

I've found the mypy settings that give no errors. We can use this and remove error exceptions one by one while fixing the code base. Not sure if that's the best approach to do this in steps, though.

[tool.mypy]
files = "canopen"
strict = false
ignore_missing_imports = true
disable_error_code = [
    "return-value",
    "var-annotated",
    "attr-defined",
    "union-attr",
    "annotation-unchecked",
    "operator",
    "arg-type",
    "assignment",
    "override",
    "index",
    "return",
    "has-type",
    "no-redef",
    "misc",
]

acolomb commented 3 months ago

What good is a checker tool if all its warnings are silenced? I'd rather not silence them, but accept the fact that it shows a red light as long as there is something to fix. Remember, the goal is not the green light, but a working code-base with a decent amount of support for the users, so they can spot errors before the application hits them at runtime.

sveinse commented 3 months ago

A silenced tool has no value, but it works as a strategy for enabling options incrementally, one by one, if we think the PR for getting all 101 errors fixed in one go is a bit too much.

acolomb commented 3 months ago

But I want to see the errors while fixing them. Sorry, but silencing warnings is the opposite of right in my experience. Keep them visible, work on fixing bit by bit. It helps nobody to see a green light when it was reached by looking away.

Overall, I think this is getting too focused on the tools rather than the workpiece. Let's spend time (which isn't abundant) discussing what needs fixing instead of how to configure the tools.

sveinse commented 3 months ago

I was under the assumption that we'd want them plugged into the CI/CD, and we're starting from scratch, not yet using the tools. We either need to put the bar so low that everything passes and tighten the knot PR by PR, or we need to do a rather large PR to get the level up to what we consider the acceptable minimum.

I repeat my opinion that there is no use in running mypy in CI/CD unless it fails the build. Otherwise it'll never get used.

...if we don't agree on that ambition, perhaps we should begin there.

acolomb commented 3 months ago

I'm not pushing for mypy to be included in CI runs. But if we do, then it should show its true result, not just pass because we basically told it to lie to us. What's the benefit of having it if any possible warnings are hidden anyway? Newly introduced errors will not be noticed, and it gives a false sense of compliance seeing these checks pass although nothing is actually fixed yet. There will be no sign of progress either, because we start off with a green light.

I do use mypy locally already, but I find it's okay to see some warnings there while focusing on different work. Sure, they should be fixed at some point, but they shouldn't distract from ongoing work and the fact that we do have a working code-base even without perfect typing. Remember, it's just helpful pointers to avoid pitfalls caused by wrong assumptions.

Summing up, we should enforce CI checks once we have a state that passes without any silencing options. The road to that point either has them enabled but not enforced (accepting the red flag), or only running mypy locally. The former has the advantage that the analysis results are easy to look at without checking out the affected branch and running locally. If someone goes through that effort, they might as well fix what they see, since the IDE will already be open.

sveinse commented 3 months ago

OOI, what settings do you use where you only get warnings from mypy?

acolomb commented 3 months ago

Sorry, I should have been more precise. I didn't mean to differentiate between errors and warnings. To me, both categories are just "warnings", because the code works nonetheless. So whenever I speak about warnings in type checking, I also mean errors.

I don't use any special mypy configuration. Only what we have in pyproject.toml, which does not have anything mypy-related AFAICT. Which is fine as far as I'm concerned, the default checks should be what we target. Maybe ignore_missing_imports for some specific packages would make sense, but I can live with the message here as well when in fact I don't have the required type information available.

sveinse commented 2 months ago

@erlend-aasland do you have any experience with using mypy in GH actions? I know it's straightforward to have it run as an action job that must pass without errors for the CI/CD flow to pass. But do you know of any system where the errors/warnings are fed back to the contributor?

erlend-aasland commented 2 months ago

I use mypy with GitLab at $WORK and we've been using several mypy checks with GHA over at CPython[^1].

I know it's straightforward to have it run as an action job that must pass without errors for the CI/CD flow to pass.

Note that you can configure your GitHub project so it's not a required check. It does not need to pass without errors in order for the CI to be green.

But do you know of any system where the errors/warnings are fed back to the contributor?

IMO, examining the CI logs is sufficient, but I'm not sure if I understood your question correctly.

[^1]: feel free to head over to the CPython repo and take a look

sveinse commented 2 months ago

canopen's type annotation support is very rudimentary. My opinion is that type annotation is as valuable to a project as unit testing: it helps determine the expected behavior and it uncovers errors. Type annotation is not just "sticking some annotations on things"; it imposes a stricter, perhaps more conscious, way of coding. A typical example from this project is the use of None as a default value. Type annotation checks will tell you that you need to check such a value everywhere before using it. See #513

We need to agree on what level of annotation support we aim for. E.g. mypy strict support is a very different ballgame than using mypy for development guidance only and allowing errors to remain. For reference: our current master gives 106 mypy errors in standard rules mode and 630 errors in strict mode. Furthermore, I strongly believe that type annotations and functional changes go hand in hand, not separately, much like how functional changes and unit testing go together. E.g. PR #513

I've been spending a lot of my spare time improving the type annotation support in canopen. I've tried a few different approaches through various PRs already, but none of them seem to resonate. They are too big. While I agree that we need to go through the discussions this unlocks, I'm really afraid that going very granular will slow progress to a crawl. Going for type annotation is a big job. Yet, it'll take a long time to make bread from individual grains. So perhaps full type annotation support is not within our ambition?

I've come to a point where I don't know how to proceed, and there is certainly no point in wasting your time on PRs that won't get accepted. I'm sorry, but I'm strongly considering ceasing my type annotation efforts. Since we want to go very granular on the PRs, I'm genuinely afraid it'll take more effort than I'm able to offer in my spare time.

erlend-aasland commented 2 months ago

Yet, it'll take a long time to make bread from individual grains. So perhaps full type annotation support is not within our ambition?

I OTOH think it is a reasonable goal. With a code base of this size, I'd expect it to take between 5 and 8 months, given the number of currently active contributors.

acolomb commented 2 months ago

Thanks @sveinse for your efforts so far. I think they are definitely valuable and not wasted. That they haven't resonated much so far is, from my perspective, down to a misunderstanding / disagreement on how to approach the thing overall. Yes, #456 by @friederschueler was put down pretty quickly, unfortunately! Thanks as well for that effort; it was just too hard to review. We can still use it to see how @friederschueler solved some problems, without re-inventing the wheel while doing smaller steps.

And for my part, I haven't had enough time to actually look at the recent suggestions. Too much going on in this project with better tests, fixes, etc. that I just deferred further typing efforts a little. But I do intend to get back to them of course. Yes, it will take a lot more time to get typing up to an acceptable level in this library.

Here are the next steps I had in mind:

  1. Add a GHA check (but not required!) that gives us the relevant log output from mypy to actually have some common error and warning messages to discuss over.

  2. We SHOULD NOT add any elaborate mypy configuration to the project, just leave it running at default settings in the beginning. Maybe some settings just so it finds the required modules easily. If we further discover that some things cannot be solved in the code, any needed silencing settings will come in later as part of a PR, after discussion about the other (im)possible ways to solve the issue.

  3. This means ACCEPTING that a red light will be flashing for the foreseeable future in our GHA check. Which is the same that anyone gets by running mypy locally. And it's okay, it just shows publicly how much work is still needed to fix the existing type annotations / code.

  4. My personal first step PR would be to add a -> None to all class constructors which have some member annotations inside (see the sketch after this list). That will get rid of the warnings about non-annotated functions not being checked. And it will instead cause a lot more errors to be reported. Which is not only OK, but a Good Thing, because it raises visibility of what needs to be done.

  5. I'm not sure it makes sense to completely decouple added annotations from actual code fixes. Sure that's a nice approach if your project is only to "add typing information". Whether you first fix the code or first add all annotations doesn't matter much - it always results in a situation where one part has to be essentially completed before meaningfully working on the other. Even if not all parts are committed / upstreamed yet.

  6. Wherever someone starts working on this topic, try to hit the "Create Pull Request" button before getting carried away. I know it's tempting to get rid of "all the warnings" before proposing the work for inclusion. But it's not necessary and it results in endlessly hard to review mega-patches. So please don't try to reach perfection too early.
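For illustration, a tiny sketch of that first step (stand-in classes, not canopen code; mypy's documented special case is that a constructor taking no arguments needs an explicit -> None to count as annotated at all):

    from typing import Optional

    class Network:
        pass

    class Node:
        # Without "-> None" this no-argument constructor counts as
        # untyped, so mypy skips its body and the member annotations
        # below only produce "annotation-unchecked" notes
        def __init__(self) -> None:
            self.id: int = 0
            self.network: Optional[Network] = None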

My vision of this process is to really embrace these facts: Right now we have about 5 or 10 percent "OK". The rest is missing (not annotated) or buggy (needs code adaptations). The more we annotate, the more there will be to fix. But that doesn't mean that any annotation must also immediately result in fixing all newly introduced warnings. They can very well be left piling up for some later PR or even someone else to fix them. But this way we will make progress, in small steps. There is a lot of space between our current state and whatever 100 percent we aim for. Let's reach it incrementally.

That means in essence, I don't want anything silenced, but see the stuff we are talking about. I want discussions about concrete improvements, whether they just add annotations or also add fixes. But not too big a chunk, rather stop at some point early and let that be reviewed and merged. See #451 which went through with little discussion and didn't make anything perfect, but improved a small part in a comprehensible, non-controversial way. That's roughly the size I'd be comfortable reviewing regularly.

Does that sound like a plan? Who's up for bullet number 1 to add the ugly red light?

P.S. All of these changes should be of very low risk, as we will mostly add extra safety checks. As long as the code itself keeps working, just without these added precautions, we're not worse off than right now where stuff will simply explode at runtime when stepping on a type-confusion mine.

P.P.S. And I don't think we will target strict mode anytime soon, if ever. That would be a goal for a different API, not this one with its heritage and exploiting dynamic typing for ease of use as we currently do.

sveinse commented 2 months ago

  1. Add a GHA check (but not required!) that gives us the relevant log output from mypy to actually have some common error and warning messages to discuss over.

I humbly disagree. I think it's a better approach to get the project up to a minimum level and begin tightening the reins from there. PR #512 does precisely that, starting at the mypy default minimum. The objective: get to green, like we do for any pytest checks.

How do we easily detect when new typing errors are introduced? With a test that normally passes, we'd see it immediately. If it's in a list of 100 other errors, it'll drown.

Who's up for bullet number 1 to add the ugly red light?

Just running through it with "not required" doesn't create an ugly red light, does it? AFAIK the only way to get a red light is to fail the job, and a red light will block any merge, won't it? Maybe it's possible to run it in a separate GHA workflow, with the branch protection rules updated NOT to require that job to succeed? Is this possible, @erlend-aasland?

  2. We SHOULD NOT add any elaborate mypy configuration to the project, just leave it running at default settings in the beginning.

Yes, this is the intent of #512.

  4. My personal first step PR would be to add a -> None to all class constructors which have some member annotations inside. That will get rid of the warnings about non-annotated functions not being checked. And it will instead cause a lot more errors to be reported. Which is not only OK, but a Good Thing, because it raises visibility of what needs to be done.

mypy has an excellent guide on how to migrate existing projects from default settings towards stricter ones: https://mypy.readthedocs.io/en/stable/existing_code.html. I'm following the approach it proposes with the start in #512. IMHO, going for untyped -> None's is not the way I'd start, though; it'll generate a massive amount of errors. The basic project should pass first before we tackle that obstacle. ...but again, it boils down to the decision whether it's OK to commit changes that introduce mypy errors, and how to detect them.

  5. I'm not sure it makes sense to completely decouple added annotations from actual code fixes. Sure that's a nice approach if your project is only to "add typing information". Whether you first fix the code or first add all annotations doesn't matter much - it always results in a situation where one part has to be essentially completed before meaningfully working on the other. Even if not all parts are committed / upstreamed yet.

I can see that it would be valuable to do "case"-based improvements, e.g. the Network fix as proposed in PR #513. This is where annotations and functional improvements go hand in glove.

  6. Wherever someone starts working on this topic, try to hit the "Create Pull Request" button before getting carried away. I know it's tempting to get rid of "all the warnings" before proposing the work for inclusion. But it's not necessary and it results in endlessly hard to review mega-patches. So please don't try to reach perfection too early.

Well, I have some experience with the road ahead, and I erroneously thought that #512 was a slow and gentle start, because in the volume of changes to come it is small. Yet, as said, this PR is too broad and needs to be split up into pieces. #512 is definitely not a perfect fix. It's a starting point.

My vision of this process is to really embrace these facts: Right now we have about 5 or 10 percent "OK". The rest is missing (not annotated) or buggy (needs code adaptations). The more we annotate, the more there will be to fix. But that doesn't mean that any annotation must also immediately result in fixing all newly introduced warnings. They can very well be left piling up for some later PR or even someone else to fix them. But this way we will make progress, in small steps. There is a lot of space between our current state and whatever 100 percent we aim for. Let's reach it incrementally.

Agreed. But I hope, not too slow.

P.S. All of these changes should be of very low risk, as we will mostly add extra safety checks. As long as the code itself keeps working, just without these added precautions, we're not worse off than right now where stuff will simply explode at runtime when stepping on a type-confusion mine.

This is why it is so great that the test coverage and unittests are being improved in parallel! Yet, at some point we have to break an egg to make the omelet. If we are to fix and improve the library, we must be able to look forward, not just curate the existing. By this I don't mean that we should disregard stability, but we need some room to be more forward-looking than history-gazing. My 2 cents.

P.P.S. And I don't think we will target strict mode anytime soon, if ever. That would be a goal for a different API, not this one with its heritage and exploiting dynamic typing for ease of use as we currently do.

I agree. It'd be a stretch to get to strict. We'd have to accept far more functional changes than what we've deliberated so far.

I propose we do the following:

  1. Reduce PR #512 to an annotation-only PR. No functional changes.
  2. Set up PR #512 as a master todo-list of other PRs for the necessary functional changes, as proposed here: https://github.com/christiansandberg/canopen/pull/512#issuecomment-2220384828
  3. The acceptance criterion for merging PR #512 is green across the board for mypy with basic, default settings

sveinse commented 2 months ago

We are several people actively working on it, and that's really awesome. It's cool working on a project with many contributors 💪

acolomb commented 2 months ago

Yes, we agree about the disagreement :-)

How do we easily detect when new typing errors are introduced? With a test that normally passes, we'd see it immediately. If it's in a list of 100 other errors, it'll drown.

I see a logic error in your analysis. If the test normally passes because it is muted, then it will NOT raise a flag on new errors. They will simply be muted as well. The warnings are there to be seen and fixed, not to be hidden. Better to concentrate on getting that list of 100 errors shorter. But we need a list for starters.

Just running through it with "not required" doesn't create an ugly red light, does it?

What I mean should look like this: There are required and non-required checks. An optional one does get a red cross, but doesn't fail the CI run completely. (Source: https://github.com/syncthing/syncthing/pull/9175)

...but again, it boils down to the decision whether it's OK to commit changes that introduce mypy errors, and how to detect them.

My stance on that is very clear: It is OK (right now) because it is the status quo (many typing failures) but we do release working code nevertheless. We don't need to raise that bar from many failures to zero in one single step. Let there be errors, as they still do not affect the code functionality. It's just a linter.

As a reference, have you ever checked out what the Debian GNU/Linux distribution does with their lintian tool? The package tracking database shows warnings for many packages. That's good, because it shows people what improvements need to be worked on.

Regarding your proposal #512: sorry, I simply haven't looked at it yet. I need some time, which I didn't find today. That doesn't mean it's not considered at all. But it is more than my outlined first step, i.e. it goes too far for now. I just think this discussion is the bottom line of defining how we want to proceed, therefore I spent my time here instead of looking at the PR - to get the basics squared and agreed upon.

Let's try the American approach in this case: move fast, break stuff. Whoever first finds an opportunity to dive into GitHub checks configuration, please proceed with adding an optional mypy check, as outlined above. If it gets too annoying or problematic, we can always just back out of it again. In the meantime / additionally, everyone is free to run mypy locally as they see fit, with or without any silencing options.

sveinse commented 2 months ago

How do we easily detect when new typing errors are introduced? With a test that normally passes, we'd see it immediately. If it's in a list of 100 other errors, it'll drown.

I see a logic error in your analysis. If the test normally passes because it is muted, then it will NOT raise a flag on new errors. They will simply be muted as well. The warnings are there to be seen and fixed, not to be hidden. Better to concentrate on getting that list of 100 errors shorter. But we need a list for starters.

No worries, I won't hold it against you that you haven't looked at the PR yet 👍. FTR the PR doesn't mute anything. Nor does it suppress any errors; it uses the default settings of mypy. And the proposed PR (albeit too functional-y) fixes all of the open issues mypy addresses. When that PR is done, we will be compliant with and pass the mypy defaults. THEN we can start tightening the rope by enabling new settings that draw us towards strict(er) compliance.

Just running through it with "not required" doesn't create an ugly red light, does it?

What I mean should look like this: There are required and non-required checks.

Ah nice. Perfect.

...but again, it boils down to the decision whether it's OK to commit changes that introduce mypy errors, and how to detect them.

My stance on that is very clear: It is OK (right now) because it is the status quo (many typing failures) but we do release working code nevertheless. We don't need to raise that bar from many failures to zero in one single step. Let there be errors, as they still do not affect the code functionality. It's just a linter.

Type annotation is not just a linter; it's a methodology. It's a more restrictive application of Python. Conceptually it's the same as applying black to Python. Black adds opinions on how things shall be formatted. Type annotations do the same.

Regarding your proposal #512: sorry, I simply haven't looked at it yet. I need some time, which I didn't find today. That doesn't mean it's not considered at all. But it is more than my outlined first step, i.e. it goes too far for now. I just think this discussion is the bottom line of defining how we want to proceed, therefore I spent my time here instead of looking at the PR - to get the basics squared and agreed upon.

PR #512 contains two things: a set of annotations and a set of functional changes. From the discussions, I agree it should be divided into smaller groups of changes. However, I stand by the goal and wanted end result of implementing the essence of PR #512: getting mypy passing into the green with default settings.

Let's try the American approach in this case: move fast, break stuff. Whoever first finds an opportunity to dive into GitHub checks configuration, please proceed with adding an optional mypy check, as outlined above. If it gets too annoying or problematic, we can always just back out of it again. In the meantime / additionally, everyone is free to run mypy locally as they see fit, with or without any silencing options.

Yes, we need the GHA in place here anyways.

acolomb commented 2 months ago

Type annotation is not just a linter; it's a methodology. It's a more restrictive application of Python. Conceptually it's the same as applying black to Python. Black adds opinions on how things shall be formatted. Type annotations do the same.

Precisely; it concerns only the how, not the what of the code. That's what I meant.

However, I stand by the goal and wanted end result of implementing the essence of PR #512: getting mypy passing into the green with default settings.

Of course, I don't think the end result was ever disputed. Just the size of the first step(s).

What I want is to see the progress made, i.e. run mypy in GHA so everyone looks at the same output. After that is public, any improvement can be easily measured against that baseline.

erlend-aasland commented 2 months ago

FTR, the project owners must configure GHA checks as not required in the project settings; I cannot do this with a PR. I can help with other CI related stuff, though.

Regarding pace: I don't see any reason to rush things. We'll get there eventually.

I'll be focusing on test coverage for now; IMO, that's the most important task.

sveinse commented 2 months ago

I'd like to make a progress-tracking issue like #514 for type annotations, and I started outlining one. However, I stopped because I think we do not agree on its definition of done[^1], so stating the issue objective and its tasks is not yet possible. More to the point, if there is little interest and/or capacity to consider type annotation improvements in parallel with the test improvements, there's no point doing this work now.

[^1]: The first goal should, in my opinion, be the state which PR #512 takes us to, but we might follow a different path to achieve it.

erlend-aasland commented 2 months ago

We SHOULD NOT add any elaborate mypy configuration to the project, just leave it running at default settings in the beginning.

As a minimum, you will need to configure the minimum supported Python version for your annotations, so something like this:

python_version = "3.8"

sveinse commented 2 months ago

Good catch. This comment should have been posted in #512. Luckily, adding this to the mypy config didn't change anything; everything in that PR is compatible with 3.8. 😓

friederschueler commented 2 months ago

I closed #456 because it was a mess with too many changes, as we agreed. I am currently very busy in my day job, but I plan to submit some single PRs from this work.

erlend-aasland commented 2 months ago

I'd like to revisit one option, discussed previously in this thread, namely how to deal with all the optional members etc.

Instead of a blanket change sprinkling the code with (potentially) performance-degrading behavioural changes (if ...: raise ...), there is a simpler variant:

Use asserts. I'd say this is no more of a code smell than the blanket approach currently being suggested.

Pro:

Con: code smell (but so is the currently proposed alternative ⚖️)

Using if... raise OTOH, is a lot worse: Con:

Pro: ?

I urge you to reconsider this, as I believe using asserts will make for smaller and more focused PRs, resulting in a faster pace and less review friction.

Since everyone agrees asserts are a code smell, it should be an easy task to address them separately in focused PRs at a later point in time.

What do you think?