acdh-oeaw / rdfproxy

GNU General Public License v3.0
Feature: Optional explicit SPARQL binding to Model allocation #56

Closed lu-pl closed 2 weeks ago

lu-pl commented 3 weeks ago

Pydantic model instantiation from SPARQL result sets connects SPARQL bindings with model fields by matching SPARQL binding names with field names.

However, there should also be an option to explicitly allocate a SPARQL binding to a model field. For example, this allows the same field name to be used in a model and in its nested models.

One possible interface for this would use typing.Annotated to indicate explicit SPARQL to model allocation:

from dataclasses import dataclass
from typing import Annotated

from pydantic import BaseModel

@dataclass
class from_sparql:
    binding: str

class Work(BaseModel):
    name: Annotated[str, from_sparql(binding="work_name")]

class Person(BaseModel):
    name: Annotated[str, from_sparql(binding="name")]
    work: Work

A call to

person = instantiate_model_from_kwargs(
    Person, name="Person Name", work_name="Work Name"
)

currently results in

{'name': 'Person Name', 'work': {'name': 'Person Name'}}

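because bindings are matched by field name only: both name fields receive the name binding, and work_name is ignored since it does not match any model field.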

The feature should make it so that the above call to instantiate_model_from_kwargs results in

{'name': 'Person Name', 'work': {'name': 'Work Name'}}.
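To make the intended lookup concrete, here is a minimal sketch, not the rdfproxy implementation. It uses plain dataclasses as stand-ins for the Pydantic models so that it runs with the standard library only, and `instantiate` and `binding_name` are hypothetical helpers introduced for illustration. With Pydantic v2 the same marker should be reachable through `Model.model_fields[...].metadata`.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Annotated, Any, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class from_sparql:
    """Marker carrying the SPARQL binding name for a field."""

    binding: str


# Dataclasses stand in for the Pydantic models from the issue (assumption).
@dataclass
class Work:
    name: Annotated[str, from_sparql(binding="work_name")]


@dataclass
class Person:
    name: Annotated[str, from_sparql(binding="name")]
    work: Work


def binding_name(field_name: str, hint: Any) -> str:
    """Return the explicit binding if one is annotated, else the field name."""
    if get_origin(hint) is Annotated:
        for meta in get_args(hint)[1:]:
            if isinstance(meta, from_sparql):
                return meta.binding
    return field_name


def instantiate(model: type, **bindings: Any) -> Any:
    """Hypothetical stand-in for instantiate_model_from_kwargs."""
    hints = get_type_hints(model, include_extras=True)
    kwargs = {}
    for field_name, hint in hints.items():
        # Strip the Annotated wrapper to see the underlying field type.
        base = get_args(hint)[0] if get_origin(hint) is Annotated else hint
        if is_dataclass(base):
            # Nested model: resolve its fields against the same flat bindings.
            kwargs[field_name] = instantiate(base, **bindings)
        else:
            kwargs[field_name] = bindings[binding_name(field_name, hint)]
    return model(**kwargs)


person = instantiate(Person, name="Person Name", work_name="Work Name")
print(asdict(person))
# {'name': 'Person Name', 'work': {'name': 'Work Name'}}
```

Fields without a marker fall back to name matching, so the existing behaviour would stay the default under this sketch.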

b1rger commented 3 weeks ago

Why use a dataclass and not simply inherit from str?

class Binding(str):
    pass

class Work(BaseModel):
    name: Annotated[str, Binding("work_name")]

?

lu-pl commented 3 weeks ago

That is also an option. I just found from_sparql plus the binding kwarg neat because it almost reads like natural language, but any runtime-checkable type will do.

I am in favor of the str subclass.
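As a rough illustration of why a str subclass is still runtime checkable, the sketch below pulls a `Binding` marker out of an annotation with `isinstance`. The helper `explicit_binding` is hypothetical and only meant to show the check; a plain string in the metadata is not mistaken for a binding.

```python
from typing import Annotated, Any, Optional, get_args, get_origin


class Binding(str):
    """Marker type: a str that names a SPARQL binding."""


def explicit_binding(hint: Any) -> Optional[Binding]:
    """Return the first Binding found in an Annotated hint, if any."""
    if get_origin(hint) is not Annotated:
        return None
    # get_args yields (underlying_type, *metadata) for Annotated hints.
    for meta in get_args(hint)[1:]:
        if isinstance(meta, Binding):
            return meta
    return None


# Plain strings in the metadata are skipped; only Binding instances count.
hint = Annotated[str, "free-form note", Binding("work_name")]
found = explicit_binding(hint)
print(found)  # work_name
```

Because `Binding` is a str, the result can be used directly as a key into the binding dictionary without unwrapping.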

kevinstadler commented 3 weeks ago

Just thinking ahead, what would be the syntax for explicit bindings of array values?

#1
class Person(BaseModel):
    name: str
    works: list[Annotated[str, from_sparql(binding="work_name")]]

#2
class Person(BaseModel):
    name: str
    works: Annotated[list[str], from_sparql(binding="work_name")]

I guess #1 is more natural, since it just collects the multiple results under the annotated binding and then groups them into an array, while #2 might be read as a binding to an actual SPARQL/triple store value of type list[str]?

lu-pl commented 3 weeks ago

Note that list[Annotated[str, "something"]] is not semantically equivalent to Annotated[list[str], "something"].
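The difference is visible with the standard typing introspection helpers: in one spelling the metadata hangs on the item type, in the other on the list itself. This is only a small check of where the metadata ends up, not a statement about how rdfproxy treats either form.

```python
from typing import Annotated, get_args, get_origin

inner = list[Annotated[str, "something"]]
outer = Annotated[list[str], "something"]

# Metadata on the item: the outermost type is a plain list.
assert get_origin(inner) is list
assert get_args(inner) == (Annotated[str, "something"],)

# Metadata on the container: the outermost type is Annotated.
assert get_origin(outer) is Annotated
assert get_args(outer) == (list[str], "something")

# The two hints do not compare equal.
assert inner != outer
```

Any code that looks for a binding marker therefore has to decide at which level it inspects the annotation.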

lu-pl commented 2 weeks ago

Closed through 2b47151