Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.73k stars 858 forks source link

Language Server does not find extension attribute: `Cannot access member "Meta" for type "Markdown" Member "Meta" is unknown` #1383

Open PlexSheep opened 11 months ago

PlexSheep commented 11 months ago

When using the meta extension, the Meta field of the Markdown class is not found by Pyright. Perhaps I missed something, can we import a stub file to make this error go away?

For clarity: The code works, but the language server does not find the Meta Member for type Markdown.

This might also happen with the attributes added by other extensions, but I've found it for the meta extension specifically.

Minimal example: test.py:

import markdown
MD = markdown.Markdown(extensions=['meta'])
with open("foo.md", "r") as f:
    content: str = f.read()
    html: str = MD.convert(content)
    meta = MD.Meta    # Cannot access member "Meta" for type "Markdown"    Member "Meta" is unknown
    print(meta)

foo.md:

key: 19

this is the content

executing the script:

$ python test.py           
{'key': ['19']}

Proposed Fix Add a python stub file for the Meta attribute.

waylan commented 11 months ago

I can't replicate this. In fact, notice below that I have not even called convert yet. Simply loading the extension ensures that the Meta attribute exists.

>>> md = markdown.Markdown(extensions=['meta'])
>>> md.Meta
{}
>>> markdown.__version__
'3.4.4'
facelessuser commented 11 months ago

I don't think the argument is that md.Meta doesn't exist, only that LSP (Language Server Provider) in their editor can't automatically recognize md.Meta and its type.

As Python Markdown does not provide type annotations, and any that are available are provided by 3rd parties, I'm not sure if this is an issue for Python Markdown. If we were currently providing type annotations, then maybe it could be argued we should do something, but currently, this is not a typed library, even if someone else has provided typing that you are using in conjunction with your LSP.

waylan commented 11 months ago

Now I have reread your report and you note that things are working. But apparently some tool is inspecting the code and not finding the attribute. That is correct. Many extensions add their own attributes (most of them being third-party) and there is no way that we could possible anticipate or stub every possible attribute added by every extension. So, no we will not be giving this one extension special treatment.

That said, if anyone has suggestions for a general purpose solution, I am willing to consider it.

waylan commented 11 months ago

As Python Markdown does not provide type annotations

That will no longer be true with the next release. See #1379. BTW, @facelessuser I have a question for you in my last comment there. Based on your comments here perhaps you've been ignoring that one and missed it.

facelessuser commented 11 months ago

That will no longer be true with the next release.

Well, you are documenting the API, yes, but there are no type annotations. I'm fairly certain that is what LSP uses, not function doc strings. At least that is my understanding.

BTW, @facelessuser I have a question for you in my last comment there. Based on your comments here perhaps you've been ignoring that one and missed it.

I have been following along at a very high level. I am aware of what is proposed and the improvement to documentation. but that PR was generating a lot of noise, so it is likely I dismissed the notification without looking.

waylan commented 11 months ago

Well, you are documenting the API, yes, but there are no type annotations.

Yes, we are adding type annotionations, which is my point. It will no longer be true that the library does not provide type annotations. However, I'm not really familiar with LSP or what is does or what it expects, so I don't know how this is relevant.

facelessuser commented 11 months ago

Ah, I take that back then. Like I said, I wasn't following that closely. Unless you define a Meta property upfront and type it, it will show up as a "mystery" attribute. But since this is a dynamic attribute that only appears when the extension is loaded, I'm not quite sure. While I've used type annotations in a number of libraries, I cannot argue I'm an expert in all cases.

waylan commented 11 months ago

Unless you define a Meta property upfront and type it, it will show up as a "mystery" attribute.

Ah okay, so this is the complaint then.

It recently occurred to me that we could provide some hook for extensions to use for this. For example, we could provide an attribute which stored key/value pairs where the key would be a name provided by the extension and the value would be whatever data the extension provides. Then it could be type annotated as dict[str, Any] (or whatever format we went with).

The issue of course is that many existing extensions already provide their own custom solutions and this would be a new API. Users would then need to alter their code to check the new location. Maybe we could use a custom getter and setter which would cause the extension to write to our custom location and the user to read from our custom without any changes on their end (extension or user).

Although, now that I think about it, some projects make extensive use of custom getters and setters on classes (see Django for example). They have a whole collections of potential attributes which are unknown. How do they handle this? Maybe that is the approach we should take.

facelessuser commented 11 months ago

Although, now that I think about it, some projects make extensive use of custom getters and setters on classes (see Django for example). They have a whole collections of potential attributes which are unknown. How do they handle this? Maybe that is the approach we should take.

Maybe, there are a lot of complex typing paths that I have yet to explore.

waylan commented 11 months ago

To be clear I'm talking about the __setattr__(self, key, value) and __getattr__(self, key) methods which will accept any non-existent attribute on a class instance and set/get it. Would annotating those methods be sufficient to satisfy type checkers?

PlexSheep commented 11 months ago

If it came to it, you could just generally set a type hinting for the Meta attribute as TypeOfMeta | None. This would indicate to the user that the attribute can indeed be none (if the extension is unloaded).

In this case, I think my LSP picks up on the type if I check before using with if hasattr(MD, "Meta").

waylan commented 11 months ago

If it came to it, you could just generally set a type hinting for the Meta attribute as TypeOfMeta | None. This would indicate to the user that the attribute can indeed be none (if the extension is unloaded).

But where would we do that? Not on the Markdown class. It has no knowledge of any specific extension or what attributes an extension might add. In fact, many different extensions all add attribute and we can't cover them all.

So we have two options:

  1. Do nothing.
  2. Come up with some generic solution that works for any extension with any attribute that it might add to the class.
oprypin commented 10 months ago

Apparently I bumped into this topic here:

oprypin commented 10 months ago

There are no solutions for any typechecker that would allow these attributes to be properly declared with known types, other than actually declaring them on the core object itself.

So these are the options:

  1. Declare the 3 core attributes. Very pragmatic as per my comment. People get the benefit of knowing that they're accessing the correct attributes and the attributes' types are known and the values are further type-checked. 95+% of use cases are served well. Adding third-party extensions there will be summarily rejected.

  2. Trick the typecheckers by defining __getattr__ and letting it return Any. Then all type checking will be shut down for accessing unknown attributes on Markdown and for any values obtained from them.

  3. Define a new "data bag" sub-attribute. Similar outcome to the previous one but is not really backwards compatible.

  4. The current state: Not declare any attributes. You'll get a type error consistently. There's still some benefit to this: at least you can't make a mistake when trying to access attributes that do consistently exist.

oprypin commented 10 months ago

By the way, the type checking of both pyright and mypy are not based on the markdown package, but rather the types-Markdown package developed here: https://github.com/python/typeshed/blob/18cd196109938b690a97dc1dadf1c92b391b9639/stubs/Markdown/markdown/core.pyi#L22 That means that the solution to this particular issue is entirely in the hands of that repository, not this one (unless it adds py.typed, which per the outcome of #1399 it probably won't)

So yeah, I wonder what their philosophy is in this situation, would they consider this a valid addition to Markdown:

    toc_tokens: list[TocToken]
    toc: str
    Meta: dict[str, Any]
waylan commented 10 months ago
  1. Declare the 3 core attributes. ... Adding third-party extensions there will be summarily rejected.

The problem is that this favors the built-in extensions, which I have tried to avoid. I have tried to maintain the idea that anything that a built-in extension can do a third-party extension can do. Yes, I know the built-in extensions are favored just by being built-in, but for many years now I have operated on the assumption that in the future each built-in extension could eventually be broken out into a separate third-party extension. I'm not inclined to change that now.

  1. Trick the typecheckers by defining __getattr__ and letting it return Any. Then all type checking will be shut down for accessing unknown attributes on Markdown and for any values obtained from them.

To me this seems like the least bad option of those listed. As a benefit, it provides an official API for any extension (first or third party) to make data available to the user. Obviously, option 3 would be preferred for that purpose (official API), but the backward incompatibility is a blocker... unless we combined options 2 and 3.

We could go with option 3 and add an attribute which stored anything by key as an official API. Then we could add a getter as glue between the old and new. The getter should be generic to work with any unknown third-party extensions as well. Let's call this option 5.

Regardless of whether we went with options 2, 4, or 5 (the only contenders IMO), there would be no actual type checking. I'm okay with that. However, 2 or 5 would at least give us a way to satisfy the type checkers. Option 4 would not, but maybe that doesn't matter...

By the way, the type checking of both pyright and mypy are not based on the markdown package, but rather the types-Markdown package

Good point. Technically, at this time, this issue should have been raised with @python/typeshed. And yes, it looks like it may remain there for now. @PlexSheep you should probably submit an issue there if you would like an immediate resolution.

Regardless, I am going to leave this open until a decision is made regarding what we are going to do here.

oprypin commented 10 months ago

I ran into a slightly relevant example.

Jinja has extensions that add methods to Environment, including a core extension.

env: jinja2.Environment
env.add_extension('jinja2.ext.i18n')
env.install_gettext_translations(translations)

With typeshed stubs it was "pragmatically" added and used to just work https://github.com/python/typeshed/blob/a5bc1e037fa9fb541d81de92ad27fa8543c65be4/stubs/Jinja2/jinja2/environment.pyi#L140

But the project itself is now py.typed and does not define these attributes and users receive mypy errors.

error: "Environment" has no attribute "install_gettext_translations"  [attr-defined]

-I think there is now no way to use that extension in a type-safe way https://github.com/search?q=repo%3Apallets%2Fjinja+install_gettext_translations&type=code

waylan commented 10 months ago

I think there is now no way to use that extension in a type-safe way

I have no objection to that. In other words, that would not stop me from using the extension. All I ask is that if we define types, we have a way to verify those annotations. However, all "missing" annotations should be ignored. If there is no way to do that. then that is an issue with the tooling.

Of course, I assume that the problem is that the people who are motivated to build the tooling are the same people who want statically typed code so they haven't bothered to build the tool I want.

In any event, there is likely an added complexity here. In most cases, we can simply insert a comment on the "offending" line to tell the tooling to ignore that line. However, in this case, that may be difficult because the attributes don't exist. There is no line to comment. Unless we can include the comment on the line in the extension which defines the attribute. Or will every line which accesses the attribute need to be commented also? Including the users code? If so, that is very annoying.

oprypin commented 10 months ago

Or will every line which accesses the attribute need to be commented also? Including the users code?

Yes, that's what I was saying

The only other option is to drop these and ignore type errors and all type checks whenever these are referenced - this is both in this codebase and in any dependant codebases.

In this codebase: (https://github.com/Python-Markdown/markdown/pull/1402)

       self.md.toc_tokens = toc_tokens  # type: ignore[attr-defined]

At usages:

       self.toc = get_toc(md.toc_tokens)  # type: ignore[attr-defined]

Note that the comment can also be shorter - just # type: ignore

Or actually no, maybe users will be fine? They actually shouldn't unconditionally access that attr because it can be unset. They should be doing getattr anyway which is not type checked.

https://github.com/mkdocs/mkdocs/blob/0bf4963090b07e6e5a7807426ac4723f7bebf7a2/mkdocs/structure/pages.py#L270

      self.toc = get_toc(getattr(md, 'toc_tokens', []))
waylan commented 10 months ago

Or actually no, maybe users will be fine? They actually shouldn't unconditionally access that attr because it can be unset. They should be doing getattr anyway which is not type checked.

Good observation.

It seems, then, that the simplest fix would be for us to ignore (via comment) each internal line in the extension and update our docs to encourage users to use getattr. That seems reasonable to me and results in users using best practice. Of course, if a user isn't using getattr they probably aren't type checking their code so I'm not worried about them.