Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.81k stars 861 forks source link

API pages and inventory #1220

Closed pawamoy closed 1 year ago

pawamoy commented 2 years ago

Hello! Thanks a lot for Python Markdown, we'd do nothing without it :slightly_smiling_face:

Would you be interested in providing API pages in your documentation, as well as a consumable objects inventory? I'm willing to send a PR if that's something you'd like. Personally, I'd love to get automatic cross-refs to your markdown objects in my own API documentation pages.

The idea is to use the mkdocstrings plugin (disclaimer: I'm the author) to inject some code reference into your pages. When using mkdocstrings, every object that is rendered is added to an inventory file, objects.inv, and that file is added to the final site. Users can then load that inventory when rendering their docs, so that annotations like Markdown will automatically link to the Markdown class in your API documentation.

I tried it locally, and since it seems you're using pure Markdown docstring, the result is not so bad:

markdown_apidocs

If you find this useful and would like to proceed, I'll need some information like: what is considered public and therefore should be rendered? Is it just the objects that can be imported from markdown directly? Or more objects in deeper submodules?

If you find no interest in this, or don't want to proceed for other reasons (I see for example that you already provide a manually written reference page), that's totally fine, I won't push for it :slightly_smiling_face:

waylan commented 2 years ago

I am not opposed to having an API document. It would likely be helpful to extension developers. However, it should not replace the existing user documentation. I am a firm believer that user documentation should be write as prose, not fragmented code comments. Of course, developers also need the technical specs, which is where API docs can complement the prose (although, personally, I prefer browsing the source code). Therefore, if it is done correctly, the two can compliment each other. My thinking is that a new page (API.md?) could be added which contained this content.

In theory, any function/method/class which is not marked as private (using Python's convention of beginning the name with an underscore) is public (the one exception being double underscore methods which can be defined in a subclass). However, it is possible that a few "public" objects are not documented because the API for them was never stabilized. Others were never documented because they would only be needed for use cases which we have not documented (such as overriding the build_parser method in a subclass of the markdown.Markdown class to define a parser for a markup language other than Markdown). It is also possible that a few objects are not marked as private, but should be. I'd need to consider each on a case-by-case basis as you find them.

Finally, a minor nitpick. In the screenshot to provided, I can see the Markdown class is documented. But rather than documenting the class, you instead are documenting the __init__ method. Shouldn't that be done the way the Python documentation does it (see an example here)? Something like class markdown.Markdown(**kwargs) rather than __init__(self, **kwargs). Notice that the full import path of the class should be defined as well as how a class instance is created. Although, I suppose that might require rewriting/reformatting the code comments. Specifically, in this case the comments might need to be moved from the __init__ method to the class (or maybe the tool can combine them?). I bring this up now because if you want to do this, I will only accept it if it is done right. I want to make sure you understand what you are undertaking here.

So, yes, if you are willing to do the work, I am willing to review it and accept it when we get something great.

pawamoy commented 2 years ago

Thanks for your detailed answer :slightly_smiling_face:

I'd need to consider each on a case-by-case basis as you find them.

I could then start with a basic, recursive generation of the docs for each module (just as shown in the screenshot above), and you can tell me which member should be excluded.

Shouldn't that be done the way the Python documentation does it (see an example here)? [...] Although, I suppose that might require rewriting/reformatting the code comments. Specifically, in this case the comments might need to be moved from the init method to the class (or maybe the tool can combine them?).

Indeed, this has been in the backlog for quite some time. I'll have to think more about it. There are multiple ways to achieve that and I'd like to find the most elegant one.

Notice that the full import path of the class should be defined as well as how a class instance is created.

This should already be possible :+1: I'll send a new screenshot.

I bring this up now because if you want to do this, I will only accept it if it is done right. I want to make sure you understand what you are undertaking here.

Sure, I understand :slightly_smiling_face:

This is valuable input, thanks! I'll keep you updated!

P.S.: do you want to keep using pure Markdown docstrings, or could we consider other styles? I personally like the Google-style, and the underlying library, Griffe, is able to parse it so that mkdocstrings can output tables or lists. It's also what allows automatic cross-references to documented objects of the standard library or other third-party libs. Numpydoc and Sphinx-styles are also supported, though they are more based on rST than Markdown so I generally avoid them.

waylan commented 2 years ago

I could then start with a basic, recursive generation of the docs for each module (just as shown in the screenshot above), and you can tell me which member should be excluded.

That seems like a reasonable approach.

P.S.: do you want to keep using pure Markdown docstrings, or could we consider other styles?

Could you provide links to and/or examples of what these look like and what they are used for? I'm assuming tables or lists would be of parameters accepted by a function/method, but that's not clear to me from your description.

pawamoy commented 2 years ago

Of course. Here are some examples of Google-style docstrings in Napoleon's docs: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

I'm assuming tables or lists would be of parameters accepted by a function/method, but that's not clear to me from your description.

You guessed correctly.

def say_hello_to(name: str = "world") -> str:
    """Say hello to the given person.

    Parameters:
        name: The person to say hello to.

    Returns:
        The hello message.
    """
    return f"Hello {name}!"

In the above example, the Parameters block will be parsed by the Google-parser as a parameters section, with the name parameter as single item. Using the function signature, the parser will also be able to know that name is a str, and that its default value is "world". These data can then be rendered using different templates, for example as a table or as a list. The Returns block will be parsed as a returns section, and the parser will pick up the return annotation from the signature as well. Many more blocks/sections are supported: attributes (for modules or classes), raises (raised exceptions), yields, receives, warns, examples, etc. More info here: https://mkdocstrings.github.io/griffe/docstrings/

I think the best way to see it in action it to look at Griffe's own code reference (Griffe is the underlying library used by mkdocstrings' Python handler to extract data from Python source and to parse docstrings), for example: https://mkdocstrings.github.io/griffe/reference/griffe/docstrings/utils/#griffe.docstrings.utils.warning You'll see that types are *automatically** linked to their respective docs, be it internal or external objects (standard lib or third-party libs).

*as long as you load their respective objects.inv inventories (Sphinx-compatible)

pawamoy commented 2 years ago

I've implemented a way to merge __init__ methods docstrings and signatures into their class'. It's done at the rendering level.

Before:

markdown_before

After:

markdown_after

pawamoy commented 2 years ago

Some screenshots with a docstring written according to the Google-style:

def __init__(self, **kwargs):
    """
    Creates a new Markdown instance.

    Other parameters: Keyword arguments:
        extensions (list): A list of extensions.
            If an item is an instance of a subclass of `markdown.extension.Extension`, the instance will be used
            as-is. If an item is of type string, first an entry point will be loaded. If that fails, the string is
            assumed to use Python dot notation (`path.to.module:ClassName`) to load a markdown.Extension subclass. If
            no class is specified, then a `makeExtension` function is called within the specified module.
        extension_configs (dict): Configuration settings for extensions.
        output_format (str): Format of output. Supported formats are:

            * "xhtml": Outputs XHTML style tags. Default.
            * "html": Outputs HTML style tags.
        tab_length (int): Length of tabs in the source. Default: 4
    """

Table style:

markdown_google_1

Maybe less disruptive - list style:

markdown_google_2

List style with cross-refs to the standard lib (we load stdlib's objects.inv):

markdown_google_3

Note that the type can usually be fetched from the signature, but here it's a keyword args section so it's not possible (we cannot annotate items of a dict distinctively).

waylan commented 2 years ago

I've implemented a way to merge __init__ methods docstrings and signatures into their class'.

πŸ‘ I like it.

Maybe less disruptive - list style:

Yeah, I agree. Let's stick with that for now.

List style with cross-refs to the standard lib (we load stdlib's objects.inv):

It could be very valuable for linking to our own internal types. However, I'm not sure we need this for the stdlib. Any stndlib types that we use are all the typical common ones which should be familiar to anyone familiar with Python. Looking at Griffe's documentation, there is no distinction between an internal link and an external link to the stdlib. I find it surprising that some of these links take me off-site while others to not. Perhaps if the stdlib links had a styling hook (such as a class on the <a> tag), they could be styled differently to make it clear they are external links. If we can't do that, then I would be inclined to exclude the links to the stdlib types.

Your screenshots are getting cut off, so I can't see this, but in the body of the documentation to the extensions parameter, a reference is made to markdown.extension.Extension. Is that getting linked to the documentation for that class? If not, what would be involved in linking it? Yes, the parameter accepts a list, but what is more important here are the types of objects contained within the list and that is what we actually need to link to here. At the same time, I'm concerned that adding a link might make the plain text docstring in the source ugly/harder to read. I suppose it depends on how its done.

Finally, I am not opposed to using the "Google style" of defining parameters, etc. While it is not strictly Markdown, it is pretty close and appears to integrate well.

pawamoy commented 2 years ago

Any stndlib types that we use are all the typical common ones which should be familiar to anyone familiar with Python. Looking at Griffe's documentation, there is no distinction between an internal link and an external link to the stdlib. I find it surprising that some of these links take me off-site while others to not. Perhaps if the stdlib links had a styling hook (such as a class on the tag), they could be styled differently to make it clear they are external links. If we can't do that, then I would be inclined to exclude the links to the stdlib types.

Very valuable input once again, thanks a lot :slightly_smiling_face: I really like the idea. I believe it's easy enough to add a visual indication that a link brings us on an external site. I'll come back with screenshots once I have something :smile: I also agree that auto-linking to builtin types is not really useful. However there's no way yet to ignore parts of a loaded inventory. I'll think about it.

Your screenshots are getting cut off, so I can't see this, but in the body of the documentation to the extensions parameter, a reference is made to markdown.extension.Extension. Is that getting linked to the documentation for that class? If not, what would be involved in linking it?

It does not link as it's written, markdown.extension.Extension, but it's quite easy to do, thanks to mkdocstrings cross-refs ability, with something like [`Extension`][markdown.extension.Extension]. It sure makes it a bit more difficult to read the plain docstring.

Yes, the parameter accepts a list, but what is more important here are the types of objects contained within the list and that is what we actually need to link to here.

We can in fact get rid of the semi-manual [`Extension`][markdown.extension.Extension] markup in the description if we actually better document the type itself. I first went with a simple list to show the auto-cross-refs ability, but we can actually either use annotations in the signature, or in the docstring itself:

Docstring-only version:

    """
    Other parameters: Keyword arguments:
        extensions (list[Extension]): A list of extensions.
    """

Note that Extension must be available in the method's scope to be correctly resolved. If Extension is instead available as extension.Extension or markdown.extension.Extension, we have to use the same path in the docstring: list[extension.Extension] or list[markdown.extension.Extension]. Each element in that path will point to the corresponding documented object: markdown will point to the markdown module docs, extension to the markdown.extension module docs, and Extension to the markdown.extension.Extension class docs.

It's equally easy with annotations, but I'll have to show another example because here we only have **kwargs.

from markdown.extension import Extension

# it would work even when type checking:
from typing import TYPE_CHECKING:
if TYPE_CHECKING:  # always false
    from markdown.extension import Extension

def some_func(extensions: list[Extension]):
    """
    Do something.

    Parameters:
        extensions: A list of extensions.
    """

Note how we don't need to specify the type in the docstring anymore: it's fetched from the signature, and work exactly the same.

Finally, I am not opposed to using the "Google style" of defining parameters, etc. While it is not strictly Markdown, it is pretty close and appears to integrate well.

Great! It's usually well supported by IDEs, and there are flake8 plugins and other linters available. mkdocstrings itself is able to log warnings (fits well with MkDocs strict option) when discrepancies are found.

waylan commented 2 years ago

I also agree that auto-linking to builtin types is not really useful. However there's no way yet to ignore parts of a loaded inventory.

Ah, okay. I guess I assumed we could. But if not, then so long as we can differentiate between internal and external, lets link to them all.

It does not link as it's written, markdown.extension.Extension, but it's quite easy to do, thanks to mkdocstrings cross-refs ability, with something like [`Extension`][markdown.extension.Extension]. It sure makes it a bit more difficult to read the plain docstring.

I can live with that. Actually, I think that is probably the simplest solution one could expect.

Note that Extension must be available in the method's scope to be correctly resolved. If Extension is instead available as extension.Extension or markdown.extension.Extension, we have to use the same path in the docstring: list[extension.Extension] or list[markdown.extension.Extension].

Note that the extensions parameter accepts a list of objects, Each object can be one of a string or a markdown.extension.Extension instance. The list may contain a mix of both. Therefore, wouldn't it be something like list[markdown.extension.Extension, str] (something to that effect)?

As far as whether we use Extension, extension.Extension, or markdown.extension.Extension, we should always use markdown.extension.Extension in documentation. It annoys me to no end when I'm reading documentation and can't determine the import path of an object I want to use. I will not do that to my users.

pawamoy commented 2 years ago

Note that the extensions parameter accepts a list of objects, Each object can be one of a string or a markdown.extension.Extension instance. The list may contain a mix of both. Therefore, wouldn't it be something like list[markdown.extension.Extension, str] (something to that effect)?

Yes, we would typically either use

from typing import List, Union

...and in our docstring:

"""
    extensions (List[Union[str, Extension]]): ...
"""
from __future__ import annotations

...and in our docstring:

"""
    extensions (list[str | Extension]): ...
"""

As far as whether we use Extension, extension.Extension, or markdown.extension.Extension, we should always use markdown.extension.Extension in documentation. It annoys me to no end when I'm reading documentation and can't determine the import path of an object I want to use. I will not do that to my users.

Currently, annotations are rendered as written in the source, but when the annotation is not a full path, you just have to hover on it to see the full path. You can try it here for example: https://mkdocstrings.github.io/griffe/reference/griffe/agents/extensions/#griffe.agents.extensions.base.Extensions.after_children_inspection (try to hover on InspectorExtension). Do you think this is acceptable? If not, we can probably add an option to always render the full path of annotations. However note that this can quickly become a very long string, especially with type unions, tuples, etc.

waylan commented 2 years ago

""" extensions (list[str | Extension]): ... """

Cool. I wasn't aware of that. I find this syntax more readable. Let's go with it if we can.

Currently, annotations are rendered as written in the source, but when the annotation is not a full path, you just have to hover on it to see the full path.

This is why I dislike autogenerated documentation. Code should be written in a way that works best for the code and documentation should be written in the way that works best for documentation. One should never make a compromise in one for the other. Hovering helps, but I can't copy/paste that so it is still a compromise.

However note that this can quickly become a very long string, especially with type unions, tuples, etc.

That is a valid concern. I suppose one approach would be to use the short version in the annotation (Extension), but then in documentation for the class which is being linked to, we do provide the full path.

Given the above, I think is makes sense to do this when documenting a parameter:

extensions (list[str | Extension]): ...

and this when documenting the class itself:

markdown.extensions.Extension(...)
waylan commented 2 years ago

I suppose one approach would be to use the short version in the annotation (Extension),

To be clear, I am speaking about what gets rendered. I understand that the annotation in the docstring may need to match the code so it knows what is being referenced. However, I expect the rendered documentation to consistently stick to a predefined style everywhere. If a tool can't do that, then its not worth my time.

pawamoy commented 2 years ago

I agree that we need at least a way to consistently enforce the chosen style in specific parts of the docs (annotations in parameters section vs. headings), name-only or full-path, and not just rely on what was used in the source. I'll work towards that!

pawamoy commented 1 year ago

Hello, I took some time to publish a demo of what's possible: https://pawamoy.github.io/markdown/sitemap.html.

It's using latest versions of mkdocstrings, mkdocstrings-python (Insiders) and Griffe, as well as the mkdocs-section-index, mkdocs-literate-nav and mkdocs-gen-files plugins.

The mkdocs-nature theme does not seem to handle HTML in headings (<code ... appears in the browser tab title) so it would probably be best to disable the mod symbol for modules.

My changes can be seen here: https://github.com/Python-Markdown/markdown/compare/master...pawamoy:markdown:api-docs.

I only took care of fixing the documentation warnings issued by Griffe (which collects data in the source code and parses docstrings). Most of the docstrings are therefore left untouched. I documented types of parameters and return values using actual annotations (with future annotations enabled) instead of writing them in docstrings.

Many things can still be improved in the generated docs :slightly_smiling_face: Just wanted to tease a bit :blush:

waylan commented 1 year ago

This is getting there. Looks like you have addressed all of my previously mentioned concerns.

First thing I noticed when I visited your demo is weird blocks by all external links. It appears that you have an error in your CSS. I'm using the latest release of Chrome.

Untitled

And there is also some invalid HTML which needs to be tracked down. It seems that most of it is related to the mod tags, which I don't care for (personal preference). I realize they wouldn't show up everywhere in future releases (only where there is a change), but they should not be in the breadcrumb or previous/next links at all (or in the list of links in the <head> of the document). Maybe just turn off that feature and not use mod tags anywhere?

Why do we have class and func tags, but nothing different for methods? Methods need to be called from an instance of a class (in most cases) and so I would think that they should clearly be labeled as such.

Speaking of tags, I find it odd that they are all code elements. I would have expected span. Yes, they are being rendered monospace, but I would think that would be a stylistic choice in this context, not something to force on all themes. But I'm not going to make that a show stopper. Just noting my surprise.

Every module contains the same text at the top which lists authors and copyright notices etc. Is there a way to leave that in the code files but not include it in the generated documentation?

Looking at the extensions (see example here), I see that we have a list of functions but there is no list of classes at the top of the page (although the classes are documented). Other pages seem to include a list of both classes and functions. What's going on here?

I would move the API documentation to a different location in the navigation/sitemap. It should be after Test Tools. Not in the middle of Extensions.

I noticed a few places where a method accepts **kwargs, but in the list of arguments you used **other rather than **kwargs. Let's keep the list of arguments and the method signature consistent (use kwargs everywhere).

And now for some styling observations, which are always subjective...

I don't care for the lines (left border) on nested blocks. To me is appears like blockquotes, which they are not. Perhaps indentation would be fine here???

I'm also not crazy about the class/function/method signatures. Each appears as a code block immediately after the heading. I understand why they are there. It's just that the styling makes them stand out too much IMO. I realize they are just using the default code block styles for the theme, so that is not your fault. I don't have any suggestions for improvement. And it may be this is something that the theme would need to address separately. I just thought I should mention it while I'm at it.

pawamoy commented 1 year ago

Thanks for your extensive feedback @waylan, really appreciated!

weird blocks

I only ever use Firefox, so never noticed this issue on Chrome. I will try to see if there's an equivalent to mask-image in Chrome :thinking:

mod tags

Sure, we can remove those from Python-Markdown's docs, and generally turning them into spans is also a good idea. For comparison, the Material for MkDocs theme seems to strip those code tags from page titles (the actual HTML titles), so it doesn't break anything there (like tab titles, breadcrumbs, previous/next links in footer, etc.).

no method symbol

I initially simply copied the design from GitHub, where methods are also listed as "functions". That's also why I called them "symbol types". But I guess having meth or method for method would still be better. Moreover that's what I did for auto-summaries: methods are listed as "Methods", not "Functions", so it would only make sense to use the same words for symbols :+1:

module docstrings (copyright notice etc)

Somehow not showing docstrings of modules, or stripping away the copyright stuff sounds to me like a very, very specific case. I'd rather suggest to transform these info into Python comments, to keep only what's actually relevant to the module in its docstring. For example in postprocessors.py:

# Python Markdown

# A Python implementation of John Gruber's Markdown.

# Documentation: https://python-markdown.github.io/
# GitHub: https://github.com/Python-Markdown/markdown/
# PyPI: https://pypi.org/project/Markdown/

# Started by Manfred Stienstra (http://www.dwerg.net/).
# Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
# Currently maintained by Waylan Limberg (https://github.com/waylan),
# Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).

# Copyright 2007-2018 The Python Markdown Project (v. 1.7 and later)
# Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
# Copyright 2004 Manfred Stienstra (the original version)

# License: BSD (see LICENSE.md for details).

"""
POST-PROCESSORS
=============================================================================

Markdown also allows post-processors, which are similar to preprocessors in
that they need to implement a "run" method. However, they are run after core
processing.

"""

we have a list of functions but there is no list of classes

Good catch, that seems like a bug. When rendering objects, we check if the object itself has a docstring, or if any of its members (recursively) has a docstring, to know if we should render it. That's why classes like TocTreeprocessor are rendered although they don't have a docstring themselves. For auto-summaries, I think we only check if the object itself has a docstring. We should apply the same logic there, and check members as well. Thanks!

move the API documentation

Sure, will do.

**kwargs / **other

I went too fast indeed, there are ways in both cases to stay in sync with the signature :+1:

left border

Will remove the borders, no problem :slightly_smiling_face:

not crazy about the class/function/method signatures

We can render the signature directly into the heading. But headings can only hold inline code, not code blocks, so long signatures in headings will not look so good. It's also not possible yet to add links (cross-refs) to names in inline signatures. That's not a big issue though because we already have those links in the rendered Parameters/Returns section.


Will apply the changes and notify you once it's done :slightly_smiling_face:

pawamoy commented 1 year ago
waylan commented 1 year ago

Looking good! πŸ‘

Somehow not showing docstrings of modules, or stripping away the copyright stuff sounds to me like a very, very specific case. I'd rather suggest to transform these info into Python comments, to keep only what's actually relevant to the module in its docstring.

That is what I expected the answer would be. We need to do that then (transform the copyright info into Python comments).

We can render the signature directly into the heading. But headings can only hold inline code, not code blocks, so long signatures in headings will not look so good. It's also not possible yet to add links (cross-refs) to names in inline signatures. That's not a big issue though because we already have those links in the rendered Parameters/Returns section.

Yeah, I don't think we want the signature directly in the heading. It can stay-as-is for now. I was simply recording all of my observations. We may address this in the theme with some styling adjustments. Which raises a question: is there a styling hook applied to signatures which we can use to apply a style only to them and not all code blocks?

Removed mod tags from page titles (kept in level 1 headings though).

Thank you, I'm fine with that. One question though: when we go live, I don't want every page to indicate it was modified, only the ones that actually were. How to we address that? I assume that for future releases this will be a non-issue. I'm only asking about the initial release.

waylan commented 1 year ago

We can render the signature directly into the heading. But headings can only hold inline code, not code blocks, so long signatures in headings will not look so good. It's also not possible yet to add links (cross-refs) to names in inline signatures. That's not a big issue though because we already have those links in the rendered Parameters/Returns section.

Yeah, I don't think we want the signature directly in the heading. It can stay-as-is for now. I was simply recording all of my observations. We may address this in the theme with some styling adjustments. Which raises a question: is there a styling hook applied to signatures which we can use to apply a style only to them and not all code blocks?

The more I think about this, the more unsure I am about it. Looking back through the old comments in this discussion with screenshots, I like the signature in the header. And we don't need the links in the signature as they are provided in the Parameters/Returns section.

There is one concern I do have though. I like the TOC for the page in its current form where only the function/class/method name is present with no arguments and/or punctuation. If we could retain the current TOC and have the signature in the header, I might prefer that. Or maybe it only works well when the entire thing fits on one line. Some of the sigs are pretty long, so that may be the deal breaker. I might need to see a side-by-side.

pawamoy commented 1 year ago

is there a styling hook applied to signatures

No, but we can rely on the fact that signatures always appear after a heading, so this CSS should select only signature code blocks:

.doc-heading + div.highlight {}

I'm also open to adding a proper CSS class to signature code blocks to make this more robust.

I like the signature in the header

I'll create a second fork and publish the same demo with signatures in headings :slightly_smiling_face: (maybe a bit brutal, please share if you know a lighter way). Also, the TOC label is set through the data-toc-label HTML attribute, and rendering the signature in the heading does not add it to the label. In short, even with signatures in headings, the TOC will still have short names :+1:

I don't want every page to indicate it was modified

If we auto-generate the pages, I don't think it's possible to prevent the latest build from updating dates or random numbers in assets/asset links, if that's what you mean. The only way I see would be to stop auto-generating the pages, and track them in Git. It means that pages would have to be updated each time a module is created, removed or renamed, but I believe that will happen very rarely here.

If I got your question wrong, do let me know!

waylan commented 1 year ago

One other thing, which is admittedly a nit-pick. I see that the various pages/modules are arranged alphabetically in the navigation. However, I would prefer them to be arranged logically. Is there a way to do that?

I might suggest this order:

waylan commented 1 year ago

I'm also open to adding a proper CSS class to signature code blocks to make this more robust.

That would be preferable.

I'll create a second fork and publish the same demo with signatures in headings πŸ™‚ (maybe a bit brutal, please share if you know a lighter way).

Or a side-by-side screenshot could suffice. Just grab the ugliest one you can find. 😁 Whichever is easier for you is fine with me.

If we auto-generate the pages, I don't think it's possible to prevent the latest build from updating dates or random numbers in assets/asset links, if that's what you mean. The only way I see would be to stop auto-generating the pages, and track them in Git. It means that pages would have to be updated each time a module is created, removed or renamed, but I believe that will happen very rarely here.

Hmm. I guess I'm not sure how you determine changes have been made. What if we turn off the feature for the first release and then turn it on for all future releases? I realize there would be no mod tags on things that actually changed in the first release, but we had nothing before, so no great loss there. Seems to me like this would be a show-stopper for every new project which adopts your tool, so I'm surprised you don't have a ready solution for it.

pawamoy commented 1 year ago

See version with signatures in headings here: https://pawamoy.github.io/markdown-sig-head/reference/markdown/

We have different ways to choose how things are ordered. The first two won't work here, but thankfully there's a third one.

  1. using the members_order option. It accepts alphabetical (used here) or source. But source makes no sense for modules, so modules are always ordered alphabetically in auto-summaries.
  2. using the members option. It's an explicit list of members we want to render. Auto-summaries respect this option too. Unfortunately, specifying modules when rendering the top-level markdown module will actually render these modules on the same page. That would create a huge top-level page and I'm pretty sure we don't want that.
  3. Griffe can parse Modules sections in docstrings, and we can select which type of auto-summary we want for each object. So for the top-level markdown module only, we could disable the module auto-summary:
::: markdown
    options:
      summary:
        modules: false  # implicit if we don't set it to true
        attributes: true
        functions: true
        classes: true

and add such a section in the module docstring:

"""
Modules:
    core: Description.
    preprocessors: Description.
    blockparser: Description.
    blockprocessors: Description.
    treeprocessors: Description.
    inlinepatterns: Description.
    postprocessors: Description.
    serializers: Description.
    util: Description.
    htmlparser: Description.
    test_tools: Description.
    extensions: Description.
"""

I'll try that and update the demo sites :slightly_smiling_face:

pawamoy commented 1 year ago

I realize there would be no mod tags

Oooh, OK, I think there's a misunderstanding. Previous things you wrote make more sense to me now. I think you read the mod tag as "modified"? It actually stands for "module", just like "func" for "function", "attr" for "attribute", etc. It's simply the "kind" of the object. All these tags can be removed by the way, thanks to the show_symbol_type_heading: true and show_symbol_type_toc: true options. Or their contents can be customized (as well as their colors), see https://mkdocstrings.github.io/python/usage/customization/#symbol-types.

pawamoy commented 1 year ago

Both sites are updated with the new, correctly ordered modules summary :slightly_smiling_face:

waylan commented 1 year ago

I think you read the mod tag as "modified"? It actually stands for "module", just like "func" for "function", "attr" for "attribute", etc. It's simply the "kind" of the object.

Ah, yes, I was reading them as "modified" not "module". 😊 You can ignore that concern.

All these tags can be removed by the way, thanks to the show_symbol_type_heading: true and show_symbol_type_toc: true options. Or their contents can be customized (as well as their colors), see https://mkdocstrings.github.io/python/usage/customization/#symbol-types.

Good to know. I'm half inclined to change the tags to full words, rather than abbreviations. πŸ€·β€β™‚οΈ

3. Griffe can parse Modules sections in docstrings, and we can select which type of auto-summary we want for each object.

Sounds good πŸ‘

See version with signatures in headings here: https://pawamoy.github.io/markdown-sig-head/reference/markdown/

Thank you for providing that. It helps but I'm still undecided.

If we go with sigs in the header we need to make a few adjustments to the styles.

  1. I would add some padding/margin between the tag and the code span.
  2. Also the code span's background should be transparent when in a header. In fact, this is supposed to be happening in the theme by default but the pygments styles are overriding it as the code spans have the codehilite class assigned to them. The codehilight class should only be used for code blocks. As these are codespans, remove that class and this is resolved.
  3. The long (wrapped) lines are the only remaining issue. If the second (wrapped) line was not outdented to the start of the tag, I thing that might address my concern. In other words, the tag would by styles much link a list bullet and no text in subsequent lines of the header would be left of the right-most edge of the tag.
pawamoy commented 1 year ago

I'm half inclined to change the tags to full words

I can get that. I cannot stand "meth" :rofl: On the other hand, "attribute" would be quite long in the TOC... Maybe we can use full symbol names for headings and no symbols at all for the TOC.

  1. That's weird, there are whitespaces in the templates, but it seems that when a Jinja filter is used (in this case to inline-highlight Python code in the heading), it removes all whitespace on its left (and probably right too). I am able to force Jinja to keep whitespace by adding &nbsp; right before the filter, or by using {%+ filter .... This will be fixed by the next mkdocstrings-python versions.
  2. Our highlight filter actually matches the MkDocs config. In this case, codehilite is enabled, so it uses it to highlight the code, and the codehilite class is therefore added. Its background effect can be cancelled through CSS though.
  3. We can reduce line length by not showing type annotations in signature. Then I'm afraid aligning continuation lines with the right end of the symbol type is not part of my skills :sweat_smile:

https://pawamoy.github.io/markdown-sig-head/reference/markdown/ updated

waylan commented 1 year ago

I can get that. I cannot stand "meth" 🀣 On the other hand, "attribute" would be quite long in the TOC... Maybe we can use full symbol names for headings and no symbols at all for the TOC.

I'm thinking we should use Module. I'm okay with either Meth or Method. The rest are fine as abbreviations, if for no other reason, they are common in code (albeit, not necessarily Python code).

  • That's weird, there are whitespaces in the templates, but it seems that when a Jinja filter is used (in this case to inline-highlight Python code in the heading), it removes all whitespace on its left (and probably right too). I am able to force Jinja to keep whitespace by adding &nbsp; right before the filter, or by using {%+ filter .... This will be fixed by the next mkdocstrings-python versions.

I expected a margin-right defined on the tag. But inline whitespace could work too.

  • Our highlight filter actually matches the MkDocs config. In this case, codehilite is enabled, so it uses it to highlight the code, and the codehilite class is therefore added. Its background effect can be cancelled through CSS though.

I maintain the codehilite extension and there never has been and never will be support for highlighting code spans, only code blocks. Therefore, there is no reason to ever apply the codehilite class to a code span. I think that the class should be removed.

Yes, this means that the rendered HTML for separate sigs is different from sigs included in the headers. But I would consider anything other than that behavior a bug.

  • We can reduce line length by not showing type annotations in signature.

This is definitely cleaner. But I think most people would expect type annotations to be included in API documentation. I'm inclined to leave them in. Besides, not every docstring currently lists its parameters. Without the type annotations, we have too much missing info.

Then I'm afraid aligning continuation lines with the right end of the symbol type is not part of my skills πŸ˜…

Understood. I can always tackle this later.

By the way, I haven't seen the reordered modules in the demo but it appears that you did make the necessary change. Am I missing something?

pawamoy commented 1 year ago

never will be support for highlighting code spans

Good to know, thanks, we will update our highlight filter accordingly :+1:

I haven't seen the reordered modules

Ah, right, I didn't think of the navigation. The changes you mention updated the Modules bullet list on this page: https://pawamoy.github.io/markdown/reference/markdown/.

I'll also update the script that auto-generates pages to order the modules in the navigation the same way :+1:

pawamoy commented 1 year ago

Updated https://pawamoy.github.io/markdown-sig-head/sitemap.html If we go with this version (signatures in headings), I'll push it to my "markdown" fork and delete the "markdown-sig-head" one.

waylan commented 1 year ago

Looks good. πŸ‘ Let's go with the sigs in headings. Feel free to merge that in.

Another nit-pick: consider this permalink: reference/markdown/inlinepatterns/#markdown.inlinepatterns.InlineProcessor.handleMatch. The part markdown/inlinepatterns/#markdown.inlinepatterns is repetitive. Seems like reference/markdown/inlinepatterns/#InlineProcessor.handleMatch would make more sense. Any way we can not include the full import path in the hash part of the link, but leave it in the body text?

By the way, there are a few places where I have noticed a collapsible details tags in the text (see this example and source). It appears that those are coming from indented blocks. I would have expected code blocks, not details. We could reformat those to fenced blocks, but we shouldn't have to IMO.

In browsing through the demo, I do see various places where we will need to do some editing of the text in the code comments, but that can come later.

pawamoy commented 1 year ago

Any way we can not include the full import path in the hash part of the link, but leave it in the body text?

The issue is that we rely on mkdocs-autorefs for the cross-references feature. autorefs stores a mapping of anchors and the pages they appear in. If we were to shorten #a.b.c.d to #c.d (appearing in /a/b), we then wouldn't be able to cross-reference a.b.c.d using this id. We would have to use c.d instead, and you can imagine how that messes everything up.

Generally speaking, object anchors shouldn't depend on the page they're rendered into. They could very well be rendered into the top-level index page for example, or in a subpage whose URL has nothing to do with the module hierarchy.

Since mkdocstrings does all the docs injection from a Markdown extension, it doesn't obviously have access to MkDocs pages data anyway (it's possible but not straight-forward).

collapsible details

The example you mention is clearly a bug. The lines are parsed as an admonition section (which is rendered with an HTML details tag). They shouldn't. I'll look into it.

waylan commented 1 year ago

Generally speaking, object anchors shouldn't depend on the page they're rendered into.

I was afraid you would say something like that. I don't like it, but I can deal with it.

The example you mention is clearly a bug.

I figured. I may have seen one or two others but don't recall where now.

I only ever use Firefox,

The thought I would mention that I have checked in Chrome, Edge and Safari and they are all working correctly now.

I think you can submit a PR with this whenever you are ready. Any additional edits to the documentation text can be done separately. Hopefully the bug can be resolved before we are ready for release (target is October after Python 3.12 is released). Although, I am curious to see what protestations we get from our CI server. For example, I'm guessing the spellchecker for the site will find a number of issues. Don't worry, I'm not expecting a perfect run on the first go-round. We can work through it.

pawamoy commented 1 year ago

Updated demo, I've fixed the docstring parser to be more strict.

% griffe dump -Ldebug -o/dev/null -fdgoogle markdown 2>&1 | grep reasons
DEBUG      markdown/test_tools.py:91: Possible admonition skipped, reasons: Extraneous blank line below admonition title
DEBUG      markdown/inlinepatterns.py:26: Possible admonition skipped, reasons: Missing blank line above admonition; Extraneous blank line below admonition title
DEBUG      markdown/extensions/smarty.py:16: Possible admonition skipped, reasons: Extraneous blank line below admonition title

These logs show that Griffe correctly identified these parts of the docstrings as regular markup and not admonitions.

Will send a PR shortly :slightly_smiling_face:

waylan commented 1 year ago

I was a little confused as to how those blocks of text could be confused as admonitions as they look nothing like Markdown admonitions. So I went looking at your documentation to figure out how your tool works. I see this page, which outlines three different styles for docstrings. But it is no clear to me which you are using. Neither is is clear how admonitions relate to any of that.

I guess my point is that I have realized that I have no idea what the requirements are for formatting our docstrings. It would be necessary for this to be documented in our contributing guide. Whether that involves a link pointing to external documentation or internal documentation would depend on what's involved and how clear the external documentation might be.

Also, I always expected that we would use our own Markdown parser for parsing Markdown formatted docstrings. Maybe I assumed this is what was happening, but now I am not so sure.

pawamoy commented 1 year ago

Fair concerns. I'll try to answer them.

Earlier in the conversation, I asked if you would be OK with using the Google-style: this is the one we're using here. Basically, the Google-style offers a particular syntax for documenting different aspects of Python objects:

Obviously all this can be documented using plain Markdown. What the Google-style offers is a standardized syntax to document it. The benefit is that IDEs and tools can parse it as structured data to render it nicely for users, or for linting purposes (checking that all parameters of a function are documented for example).

The syntax is the one described in the page you linked:

Section Title:
    Indented section contents.

Supported section titles are Attributes, Parameters, Raises or Exceptions, Warnings, etc.

That's as simple as that. The content of the section is not parsed again: it's simply picked up as it is, and in the case of mkdocstrings forwarded to Markdown.convert to transform it to HTML. So yes, we do use our own Markdown parser, not something custom :slightly_smiling_face: That is why we can use all the features Python-Markdown and its extensions offer, from within docstrings.

Now, most sections support documenting multiple different items, so it adds a bit to the syntax:

Section Title:
    item 1: Description of item 1.
        It can span multiple lines. Continuation lines must be indented once more
        to delimitate from other items.
    item 2: Etc.

A concrete example is the Parameters (or Params, Args, Arguments) section:

Parameters:
    foo: Description of the `foo` parameter.
    bar: Description of the `bar` parameter.

The Google-style standard specifies a few more things, like being able to specify the type of a parameter:

Parameters:
    foo (str): Description of `foo` parameter.

...but this particular feature is not super interesting to us as we can rather use type annotation in the signature of the function/method, and they are picked up by the Griffe docstring parser.

Thanks to this syntax, we can conveniently structure the data to reuse it later.

parameters = [
    Parameter(name="foo", type="str", description="Description of `foo` parameter."),
    ...
]

Same goes every other known sections (Attributes, Class, Functions, Methods, Modules, Deprecated, Other parameters, Warns, Raises, Yields, Receives, Returns, Examples).

Now, for the sake of consistency, the Griffe parser will still pick up sections (title + indented contents) that it knows nothing about, and consider them to be admonitions. This is useful to keep the same style in a docstring while supporting very common admonitions like Note, Warning, etc.:

Parameters:
    thing: Description.

Note:
    Something relatively important to know.

And it allows the admonition title to contain spaces, so this is a valid admonition:

See also:
    Some other related stuff.

...which leads us to your original concern. In the previous versions of Griffe, the parser did not require a blank line before a section or admonition (now it does), and allowed blank lines after the title (now it does not), so a paragraph like this:

...last few lines of a paragraph that
tells about something:

    some indented code block
    showing nice things

...would be parsed as:

...which was obviously wrong. The latest version of Griffe is more strict about this. As stated above, it now requires a blank line before a section/admonition, and no blank line after the title, so the two following paragraphs are correctly parsed as regular markup:

...last few lines of a paragraph that
tells about something:
    some indented code block
    showing nice things
...last few lines of a paragraph that

tell about something:

    some indented code block
    showing nice things

However, the second block in next one would be parsed as an admonition again.

...last few lines of a paragraph that

tells about something:
    some indented block
    showing nice things
    that will not be rendered as a code block

I think it caters to how we're used to write Markdown.


I didn't go to such lengths to document this in Griffe, because I'm so used to Google-style docstrings (even before I had to write a parser for them) that I forget it is not immediately intuitive for everyone else. I'm completely open to improving the docs there to explain better how it works :slightly_smiling_face:

A final note - the Numpydoc-style is kind of the same thing, just with a different syntax:

Parameters
----------
foo
    Description of `foo`.
bar
    Description of `bar`.

Note
----
Something noteworthy here.
waylan commented 1 year ago

Thank you. That summary is very helpful. In fact, if that summary (or similar) had been in the documentation, I might have better understood. The issue is the mile-high overview. The details are documented just fine. I just wasn't understanding how the pieces all fit together.

One thing I will note is that I find it strange that admonitions are being parsed at that stage (pre-markdown). I would expect an option to disable them and let Markdown handle that. I'm concerned that your current solution is too fragile and we could get more false-positives in the future. I'm fine going forward now, but would suggest that a future update to the tool provide a better solution. Also, where is the admonition feature documented? I never found it mentioned anywhere.

pawamoy commented 1 year ago

Oops, that's not documented indeed :sweat_smile:

The thing is that even if we add an option to disable Google-style admonitions, we still have to parse Google-style sections (which use the same syntax), so the same fragile algorithm will apply anyway.

Disabling (with an option) Google-style admonitions would also mean that users have to write their admonitions following the syntax of Markdown extensions (for example markdown.admonition or markdown-callouts. I find it more pleasant to have a consistent style:

Parameters:
    hello: Hello.

Note:
    Something.

...rather than:

Parameters:
    hello: Hello.

!!! note
    Something.

...or:

Parameters:
    hello: Hello.

NOTE: Something.

...but that's just my opinion of course, and I'm fine with whatever you decide for Python-Markdown's docs :slightly_smiling_face:

Also, an alternative solution would be to write yet another Markdown extension that parses:

Note:
    Something.

...into a note admonition. Why not :shrug:

waylan commented 1 year ago

we still have to parse Google-style sections (which use the same syntax), so the same fragile algorithm will apply anyway.

Perhaps due to the lack of documentation I missed this detail. The thing is, if I want an admonition, I want it to match the style of admonitions we already use elsewhere in our documentation. As your admonitions do not use the same HTML, they look nothing like the rest. Therefore, it would be preferred that they be disabled.

I would expect the special words: Attributes, Parameters, Raises, Exceptions, Warnings, etc. to trigger the necessary sections but any random words to not trigger anything. It would seem that your admonitions can be triggered with any random text. That is the specific feature I would prefer to disable.

pawamoy commented 1 year ago

I see, no problem, I can add such an option to Griffe :slightly_smiling_face:

There's yet another alternative solution: https://gist.github.com/pawamoy/a12cb4a5f66d913519070b52b2a9d54b :heart_eyes: However it's definitely not standard :rofl: And not yet supported anyway :slightly_smiling_face: But I think I'll implement support in a Griffe extension and use it in one of my own projects to see how it goes :thinking:

waylan commented 1 year ago

Cool. I wasn't aware of PEP 727. That is an interesting idea. Not sure that I like the proposed solution, but I like that they are trying to solve that specific issue.

waylan commented 1 year ago

This is closed in #1379.