Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0

Take description from docstring #222

Open Peter9192 opened 3 months ago

Peter9192 commented 3 months ago

Is your feature request related to a problem? Please describe.
I would like my dataclasses to be as concise and readable as possible. This makes them easier to maintain, especially for new/inexperienced developers.

Describe the solution you'd like
When a class attribute has a docstring, use it as the description field of the generated JSON schema. For example:

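from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema

@dataclass
class SimpleRadiationConfig:
    Q0: float = 100
    """Fixed net radiation."""

print(build_json_schema(SimpleRadiationConfig).to_json())
# Desired output: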
{
    "type": "object",
    "title": "SimpleRadiationConfig",
    "properties": {
        "Q0": {
            "type": "number",
            "description": "Fixed net radiation.",
            "default": 100
        }
    },
    "additionalProperties": false
}

Describe alternatives you've considered
I saw #125, which achieves the same thing, but it requires declaring all attributes as "fields". I believe pydantic also supports this, but it requires making all classes subclasses of pydantic.BaseModel, which feels more invasive.


Fatal1ty commented 3 months ago

hi @Peter9192

I had been thinking about this from the start, but the inability to distinguish an automatically generated docstring from an explicit one stopped me. Now I think we can detect the automatically generated docstring by its pattern. However, it's worth adding a new builder parameter with the following possible values:

from enum import StrEnum  # Python 3.11+

class DocStringDocumentation(StrEnum):
    FULL = "full"  # all docstrings will be used
    EXPLICIT_ONLY = "explicit_only"  # only explicitly added ones (the default)
    NONE = "none"  # none of them

What do you think? You can help with the naming to speed up the work.

Edit: All this applies to the dataclass documentation, not to a specific field. If you know whether pydantic adds field documentation based on docstrings, please give me more info. All I know is that there is no way to document a specific field with a docstring in Python.
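A minimal sketch of how the three proposed modes could be resolved for the top-level docstring, assuming the "pattern" to match is the one `dataclasses` itself uses when a class has no docstring (the class name followed by the `__init__` signature with ` -> None` removed). `resolve_description` and `_autogenerated_doc` are hypothetical names, not mashumaro API, and the enum uses a `str` mixin instead of `StrEnum` so the sketch also runs before Python 3.11.

```python
import inspect
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocStringDocumentation(str, Enum):
    # Same modes as proposed above (hypothetical builder parameter values).
    FULL = "full"
    EXPLICIT_ONLY = "explicit_only"
    NONE = "none"


def _autogenerated_doc(cls: type) -> str:
    # Rebuild the docstring that @dataclass assigns to an undocumented class.
    try:
        return cls.__name__ + str(inspect.signature(cls)).replace(" -> None", "")
    except (TypeError, ValueError):
        # dataclasses falls back to the bare class name in this case.
        return cls.__name__


def resolve_description(cls: type, mode: DocStringDocumentation) -> Optional[str]:
    """Hypothetical helper: pick the class docstring to use as a description."""
    doc = cls.__doc__
    if not doc or mode is DocStringDocumentation.NONE:
        return None
    if mode is DocStringDocumentation.EXPLICIT_ONLY and doc == _autogenerated_doc(cls):
        return None  # looks auto-generated, so skip it
    return inspect.cleandoc(doc)


@dataclass
class SimpleRadiationConfig:
    Q0: float = 100


print(resolve_description(SimpleRadiationConfig, DocStringDocumentation.EXPLICIT_ONLY))  # None
```

One caveat of matching by pattern: an explicit docstring that happens to equal the generated text would be treated as auto-generated.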

Peter9192 commented 3 months ago

Hi @Fatal1ty, thanks for the quick response! I wasn't aware of auto-generated docstrings. Can you clarify what you mean? I did find some docstring generators, but I believe you're referring to something else. Perhaps the parameter could be called parse_docstrings, with the options all, none, and explicit_only. I think "explicit_only" is quite clear; I can't think of better alternatives.

> All this applies to the dataclass documentation but not to a certain field.

Just to be sure: by "dataclass documentation", do you mean only the top-level docstring of the dataclass? I was hoping this would also be possible for fields, i.e. class variables with a type annotation (not those explicitly defined with the field function).

I may have been a bit too quick to conclude that pydantic supports this. However, I did find a recent PR (https://github.com/pydantic/pydantic/pull/6563) that seems to add this functionality.

In the past I generated API docs automatically with autodoc, which led me to believe this information should be quite easy to extract. However, autodoc doesn't seem to distinguish fields from other class members, which was fine for my use case but may be too limiting for a generic implementation.

I believe mkdocstrings also parses field docstrings, see https://github.com/mkdocstrings/python/issues/58

Fatal1ty commented 2 months ago

> I wasn't aware of auto-generated docstrings. Can you clarify what you mean?

Sure, here it is:

from dataclasses import dataclass

@dataclass
class SimpleRadiationConfig:
    Q0: float = 100

print(SimpleRadiationConfig.__doc__)  # SimpleRadiationConfig(Q0: float = 100)

> Just to be sure: by "dataclass documentation", do you mean only the top level docstring on the dataclass?

Yes, I mean the top-level docstring, because it's easy to get from the __doc__ attribute.

> I may have been a bit too quick to conclude that pydantic supports this. However, I did find https://github.com/pydantic/pydantic/pull/6563 that seems to add this functionality.

I see. They use the ast module to parse the dataclass source code. I'm not sure it's a good idea to invent a non-standard way of setting field documentation that requires parsing the code. I'm more inclined to use typing.Doc from PEP 727. It hasn't been accepted yet, but it's already included in typing-extensions.

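from dataclasses import dataclass
from typing import Annotated

from typing_extensions import Doc  # not in the standard typing module (PEP 727)

@dataclass
class User:
    name: Annotated[str, Doc("The user's name")]
    age: Annotated[int, Doc("The user's age")]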
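With `typing.Doc`, no source parsing is needed: the metadata survives at runtime and can be read with `get_type_hints(..., include_extras=True)`. A sketch, assuming `Doc` is imported from `typing_extensions` (a small hypothetical stand-in class is defined if that package is not installed); `doc_descriptions` is a hypothetical helper name, not mashumaro API.

```python
from dataclasses import dataclass
from typing import Annotated, Dict, get_args, get_origin, get_type_hints

try:
    from typing_extensions import Doc  # PEP 727 draft, shipped in typing-extensions
except ImportError:
    # Hypothetical stand-in so the sketch runs without typing-extensions;
    # it mirrors only the `documentation` attribute.
    @dataclass(frozen=True)
    class Doc:
        documentation: str


def doc_descriptions(cls: type) -> Dict[str, str]:
    """Hypothetical helper: collect Doc(...) metadata from Annotated hints."""
    result: Dict[str, str] = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue  # plain annotations carry no documentation
        # get_args(Annotated[str, Doc(...)]) returns (str, Doc(...))
        for meta in get_args(hint)[1:]:
            if isinstance(meta, Doc):
                result[name] = meta.documentation
    return result
```

The resulting mapping could then be written into each property's "description" in the schema.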

On the other hand, as you rightly noted, other tools use the docstring that follows a field as its documentation. It might make sense to come up with a way to connect plugins to JSON Schema generation, one of which could add documentation to fields from these docstrings.
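One possible shape for such a plugin hook, sketched on a plain schema dict: each plugin is a callable that receives the class and its generated schema and may modify the schema in place. `SchemaPlugin`, `DocsProvider`, `field_docs_plugin` and `apply_plugins` are hypothetical names, not an existing mashumaro interface; the docs provider could be a docstring-based or a `Doc`-based extractor.

```python
from typing import Any, Callable, Dict, List, Mapping

# Hypothetical hook type: takes the class and its schema dict, edits it in place.
SchemaPlugin = Callable[[type, Dict[str, Any]], None]
# Hypothetical provider type: returns {field name: documentation} for a class.
DocsProvider = Callable[[type], Mapping[str, str]]


def field_docs_plugin(docs_for: DocsProvider) -> SchemaPlugin:
    """Hypothetical plugin: fill "description" for each documented property."""

    def plugin(cls: type, schema: Dict[str, Any]) -> None:
        docs = docs_for(cls)
        for name, prop in schema.get("properties", {}).items():
            # Keep any description that was already set explicitly.
            if name in docs and "description" not in prop:
                prop["description"] = docs[name]

    return plugin


def apply_plugins(
    cls: type, schema: Dict[str, Any], plugins: List[SchemaPlugin]
) -> Dict[str, Any]:
    # Run plugins in order; each one sees the changes made by the previous ones.
    for plugin in plugins:
        plugin(cls, schema)
    return schema
```

Keeping the extraction strategy behind a provider callable means the core generator would not need to know whether documentation comes from parsed source or from runtime metadata.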

mishamsk commented 1 month ago

Just to chime in here: we use dataclasses for app configuration. Fields have docstrings that are then used to generate documentation and a JSON schema. The latter drives a configuration UI (we are working on generating the UI automatically from the schema).

Using docstrings is convenient because the text is shown in the IDE (developer-friendly), so there is no need to copy-paste the same or a similar documentation string into the field metadata or an annotation.

Yes, this means parsing the AST. In our case it is a build-time step, so I am not at all concerned. If you add this to the JSON schema generator, we'll be able to use the built-in code instead of maintaining our own.

However, I can see a potential risk of mashumaro becoming bloated, as more and more configuration options will undoubtedly be needed to accommodate downstream usage.