acdh-oeaw / rdfproxy

Functionality for mapping SPARQL query result sets to Pydantic models
GNU General Public License v3.0

Provide a client code hook for controlling model truthiness #110

Closed: lu-pl closed this issue 2 days ago

lu-pl commented 1 month ago

Model truthiness is an important criterion for the rdfproxy grouping mechanism. Currently, the logic for determining model truthiness is hard-coded to recognize a model as truthy if any of its fields is truthy (see rdfproxy.mapper.ModelBindingsMapper._get_unique_models, line 35).

This is a sane default, yet certain frontend requirements call for different model truth conditions.

Current behavior

With the current implementation, a simple model definition like

from fastapi import FastAPI
from pydantic import BaseModel, ConfigDict
from rdfproxy import Page, SPARQLModelAdapter

query = """
select ?parent ?child ?name
where {
    values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
    }
}
"""

class Child(BaseModel):
    name: str | None = None

class Parent(BaseModel):
    model_config = ConfigDict(group_by="parent")

    parent: str
    children: list[Child]

adapter = SPARQLModelAdapter(
    target="https://query.wikidata.org/bigdata/namespace/wdq/sparql",
    query=query,
    model=Parent,
)

app = FastAPI()

@app.get("/")
def base_route(page: int = 1, size: int = 100) -> Page[Parent]:
    return adapter.query(page=page, size=size)

yields the following result:

{
    "items": [
        {
            "parent": "x",
            "children": [
                {
                    "name": "foo"
                }
            ]
        },
        {
            "parent": "y",
            "children": []
        },
        {
            "parent": "z",
            "children": []
        }
    ],
    "page": 1,
    "size": 100,
    "total": 3,
    "pages": 1
}

According to the currently hard-coded truth condition, a model instance is truthy if any of its fields is truthy. The configuration above therefore correctly returns empty children arrays for y and z, because in those rows the only Child field, name, is None.

However, it might very well be desirable for backend implementers and API consumers to differentiate between "no object" and "an object with a single null value/only null values". Currently, this is not possible.

Solution proposal

One solution is to provide a hook that lets client code control the conditions for model instance truthiness, by supporting a model_bool key in pydantic.ConfigDict.

The model_bool key would accept values of the following types:

  1. Callable: client code may provide a callable of arity 1 that receives the model instance as its argument at runtime.

  2. str: a string value for model_bool makes the truthiness of the field named by that string the truth condition for the whole model.

  3. Iterable[str]: an Iterable[str] value for model_bool defines the truthiness of the model as the conjunction of all fields referenced in the iterable, i.e. the model is only considered truthy if all the referenced fields have truthy values.

This would make it possible to adapt the example above to allow objects whose only value is null, like so:

class Child(BaseModel):
    model_config = ConfigDict(model_bool=lambda model: True)

    name: str | None = None

The expected result would then be:

{
    "items": [
        {
            "parent": "x",
            "children": [
                {
                    "name": "foo"
                }
            ]
        },
        {
            "parent": "y",
            "children": [
                {
                    "name": null
                },
                {
                    "name": null
                }
            ]
        },
        {
            "parent": "z",
            "children": [
                {
                    "name": null
                }
            ]
        }
    ],
    "page": 1,
    "size": 100,
    "total": 3,
    "pages": 1
}
lu-pl commented 1 month ago

Type for model_bool callable arguments:

class ModelBoolPredicate(Protocol):
    def __call__(self, model: _TModelInstance) -> bool: ...
lu-pl commented 1 month ago

Example for model_bool with a str argument

A string value for model_bool makes the truthiness of the field named by that string the truth condition for the whole model.

For example, given the following Child definition

class Child(BaseModel):
    model_config = ConfigDict(model_bool="child")

    name: str | None = None
    child: str | None = None

the expected result would be:

{
    "items": [
        {
            "parent": "x",
            "children": [
                {
                    "name": "foo",
                    "child": "c"
                }
            ]
        },
        {
            "parent": "y",
            "children": [
                {
                    "name": null,
                    "child": "d"
                },
                {
                    "name": null,
                    "child": "e"
                }
            ]
        },
        {
            "parent": "z",
            "children": []
        }
    ],
    "page": 1,
    "size": 100,
    "total": 3,
    "pages": 1
}

The result row ('z' UNDEF UNDEF) produces an empty array for the children field, because the truth condition for Child is defined solely by the truthiness of its child field, which is unbound in that row.

I.e. a row ('z' UNDEF 'bar') would also produce an empty children array, even though name is bound to 'bar'.

lu-pl commented 1 month ago

Note that it would currently NOT be possible to achieve

{
    "items": [
        {
            "parent": "x",
            "children": [
                {
                    "name": "foo"
                }
            ]
        },
        {
            "parent": "y",
            "children": [
                {
                    "name": null
                },
                {
                    "name": null
                }
            ]
        },
        {
            "parent": "z",
            "children": []
        }
    ],
    "page": 1,
    "size": 100,
    "total": 3,
    "pages": 1
}

by excluding a model field from serialization like so:

from pydantic import BaseModel, ConfigDict, Field

class Child(BaseModel):
    model_config = ConfigDict(model_bool="child")

    child: str | None = Field(default=None, exclude=True)
    name: str | None = None

This is due to the kludgy implementation of the currently hard-coded model truthiness logic, which relies on serialization. I consider this a bug; an issue is pending.

lu-pl commented 1 month ago


This might actually be a very easy fix: instead of calling _model.model_dump().values() in the above-mentioned line 35, casting the model to a dict should do the trick, because the serializer does not run in that case.

See #112.