dandi / dandi-schema

Schemata for DANDI archive project
Apache License 2.0

Consider an alternative solution in marking `@container` in context generator #214

Open candleindark opened 6 months ago

candleindark commented 6 months ago

This issue originates from this post and subsequent discussion. In short, the current way of marking @container, which is based on stringification, works for now but is imprecise. It would be better to have a more reliable solution.

candleindark commented 6 months ago

I think we can consider attaching metadata to a type (and thus indirectly to a field) for context generation or other purposes. Done that way, the information being attached is explicit. With Pydantic V2, metadata can be easily attached to a type.

from typing import Type

from pydantic_core import CoreSchema
from typing_extensions import Annotated

from pydantic import BaseModel, GetCoreSchemaHandler

class Metadata(BaseModel):
    foo: str = 'metadata!'
    bar: int = 100

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Type[BaseModel], handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        if cls is not source_type:
            return handler(source_type)
        return super().__get_pydantic_core_schema__(source_type, handler)

class Model(BaseModel):
    state: Annotated[int, Metadata()]

m = Model.model_validate({'state': 2})
print(repr(m))
#> Model(state=2)
print(Model.model_fields)  # read from the class; instance access is deprecated in newer Pydantic releases
"""
{
    'state': FieldInfo(
        annotation=int,
        required=True,
        metadata=[Metadata(foo='metadata!', bar=100)],
    )
}
"""

As you can see, you can even have a Pydantic model to represent the metadata you want to attach to a type, and this metadata will not interfere with validation and serialization if you don't want it to.

You can find out more about this example at https://docs.pydantic.dev/latest/concepts/json_schema/#modifying-the-schema.
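
For the `@container` case specifically, a context generator could read such metadata from the fields instead of inspecting the stringified type. The following is only a minimal sketch of that idea: `JsonLdMeta`, `ExampleModel`, and `generate_context` are hypothetical names made up for illustration, not existing dandi-schema code, and the sketch assumes a plain frozen dataclass is enough as the metadata carrier.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Dict, List, Optional, Type

from pydantic import BaseModel


@dataclass(frozen=True)
class JsonLdMeta:
    """Hypothetical carrier for JSON-LD hints attached to a type."""

    container: Optional[str] = None


class ExampleModel(BaseModel):
    """Illustrative model, not part of dandi-schema."""

    name: str  # no JSON-LD metadata attached
    keywords: Annotated[List[str], JsonLdMeta(container="@set")]


def generate_context(model: Type[BaseModel]) -> Dict[str, Any]:
    """Collect `@container` entries from explicit field metadata."""
    context: Dict[str, Any] = {}
    for field_name, field_info in model.model_fields.items():
        for meta in field_info.metadata:
            # Only our own metadata class is considered; anything else is skipped.
            if isinstance(meta, JsonLdMeta) and meta.container is not None:
                context[field_name] = {"@container": meta.container}
    return context


context = generate_context(ExampleModel)
print(context)
```

With this approach, only fields that are explicitly marked (here `keywords`) get an `@container` entry, so the result no longer depends on how a type annotation happens to be rendered as a string.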

satra commented 6 months ago

ooh that's nice - we could move all the jsonld related stuff into metadata instead of having nskey for example. can metadata also be added at the model level?

candleindark commented 6 months ago

> ooh that's nice - we could move all the jsonld related stuff into metadata instead of having nskey for example. can metadata also be added at the model level?

Do you mean adding metadata to a field of a Pydantic model type? If that's your question, the answer is yes. Metadata can be attached to essentially any type using the Annotated typing form. In fact, multiple pieces of metadata bundled in different objects can be attached to a type. Please take a look at the example below for some of these possible usages.

from typing import Any
from enum import Enum
import json
from pprint import pprint

from pydantic_core import CoreSchema
from typing_extensions import Annotated

from pydantic import BaseModel, GetCoreSchemaHandler

class AccessType(Enum):
    """An enumeration of access status options"""

    #: The dandiset is openly accessible
    OpenAccess = "dandi:OpenAccess"

    #: The dandiset is embargoed
    EmbargoedAccess = "dandi:EmbargoedAccess"

class Metadata1(BaseModel):
    foo: str = 'metadata!'
    bar: int = 100

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        if cls is not source_type:
            return handler(source_type)
        return super().__get_pydantic_core_schema__(source_type, handler)

class Metadata2(BaseModel):
    x: int = 0
    y: int = 42

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        if cls is not source_type:
            return handler(source_type)
        return super().__get_pydantic_core_schema__(source_type, handler)

class SubModel(BaseModel):
    a: int = 100
    b: str = "Hello, world!"

class Model(BaseModel):
    state: Annotated[int, Metadata1()]  # metadata on int
    access_type: Annotated[AccessType, Metadata2()]  # metadata on enum
    f: Annotated[SubModel, Metadata1(), Metadata2()]  # multiple metadata objects

json_schema = Model.model_json_schema()

print(json.dumps(json_schema, indent=2))
"""
{
  "$defs": {
    "AccessType": {
      "description": "An enumeration of access status options",
      "enum": [
        "dandi:OpenAccess",
        "dandi:EmbargoedAccess"
      ],
      "title": "AccessType",
      "type": "string"
    },
    "SubModel": {
      "properties": {
        "a": {
          "default": 100,
          "title": "A",
          "type": "integer"
        },
        "b": {
          "default": "Hello, world!",
          "title": "B",
          "type": "string"
        }
      },
      "title": "SubModel",
      "type": "object"
    }
  },
  "properties": {
    "state": {
      "title": "State",
      "type": "integer"
    },
    "access_type": {
      "$ref": "#/$defs/AccessType"
    },
    "f": {
      "$ref": "#/$defs/SubModel"
    }
  },
  "required": [
    "state",
    "access_type",
    "f"
  ],
  "title": "Model",
  "type": "object"
}
"""

m = Model(state=42, access_type=AccessType.EmbargoedAccess, f=SubModel(a=1, b="hi"))
print(m.model_dump_json(indent=2))
"""
{
  "state": 42,
  "access_type": "dandi:EmbargoedAccess",
  "f": {
    "a": 1,
    "b": "hi"
  }
}
"""

pprint(Model.model_fields, indent=2)  # read from the class; instance access is deprecated in newer Pydantic releases
"""
{ 'access_type': FieldInfo(annotation=AccessType, required=True, metadata=[Metadata2(x=0, y=42)]),
  'f': FieldInfo(annotation=SubModel, required=True, metadata=[Metadata1(foo='metadata!', bar=100), Metadata2(x=0, y=42)]),
  'state': FieldInfo(annotation=int, required=True, metadata=[Metadata1(foo='metadata!', bar=100)])}
"""

As you can see, metadata can be attached to different types (and thus indirectly to fields of different types), multiple metadata objects can be attached to the same type, and none of the metadata affects JSON schema generation or Pydantic model validation.
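
A quick way to check the "does not affect schema or validation" claim is to compare a bare type against the same type with metadata attached. This is a small sketch, assuming a plain dataclass (the hypothetical `Marker` below) as the metadata object; as far as I know, Pydantic skips metadata objects it does not recognize and that define no Pydantic hooks.

```python
from dataclasses import dataclass
from typing import Annotated

from pydantic import TypeAdapter


@dataclass(frozen=True)
class Marker:
    """Hypothetical metadata object with no Pydantic hooks."""

    note: str = "for context generation only"


plain = TypeAdapter(int)
marked = TypeAdapter(Annotated[int, Marker()])

# Both adapters should produce the same JSON schema and the same validated value.
same_schema = plain.json_schema() == marked.json_schema()
same_result = plain.validate_python("7") == marked.validate_python("7")
print(same_schema, same_result)
```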

If you choose to, you can also attach metadata that affects the JSON schema generation and validation of a type. You can see examples of that at https://github.com/dandi/dandi-schema/pull/203#issuecomment-1849413394 and https://github.com/dandi/dandi-schema/blob/0c97b8eac7a601fab2903eac7c0439ec021aa2f9/dandischema/types.py#L1-L40.
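
For a self-contained illustration of metadata that does change behavior, here is a sketch using two helpers that ship with Pydantic V2, `AfterValidator` and `WithJsonSchema`. It is not necessarily how the linked dandi-schema code does it; `EvenInt`, `EvenModel`, and `_must_be_even` are hypothetical names chosen for this example.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError, WithJsonSchema


def _must_be_even(value: int) -> int:
    """Reject odd integers after the normal int validation has run."""
    if value % 2 != 0:
        raise ValueError("value must be even")
    return value


# Metadata that changes validation (AfterValidator) and the JSON schema (WithJsonSchema).
EvenInt = Annotated[
    int,
    AfterValidator(_must_be_even),
    WithJsonSchema({"type": "integer", "multipleOf": 2}),
]


class EvenModel(BaseModel):
    n: EvenInt


n_schema = EvenModel.model_json_schema()["properties"]["n"]
print(n_schema)

try:
    EvenModel(n=3)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

The same `Annotated` slot can therefore hold both kinds of metadata side by side: passive objects that only tools like a context generator read, and active ones that Pydantic itself acts on.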

All in all, I think we can benefit a lot in this project from some of the new features in Pydantic V2.