SwissDataScienceCenter / calamus

A JSON-LD Serialization Library for Python
Apache License 2.0

Serialize nested object by id only #55

Open Panaetius opened 3 years ago

Panaetius commented 3 years ago

Currently, when serializing with fields.Nested, we always serialize the whole object and add @type information.

In some cases, we want to serialize only an @id reference instead of the whole nested object.

Theoretically, this would be possible with something like fields.Nested(schema.address, AddressSchema, only=("_id",)), but that would result in something like {"@id": "...", "@type": "..."}, whereas an id reference should not include the @type information.

We could add an additional flag to fields.Nested to skip adding the @type, but that's probably not a clean solution.

A new field such as fields.IdReference(schema.address, AddressSchema) would make sense: like fields.Nested, it would still take the child schema, but it would only return the @id of the child. On deserialization, it should look up the @id in the data to find the referenced object and raise an exception if it is not there.
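To make the proposed behavior concrete, here is a library-independent sketch of what an id-reference field could do on dump and on load. It is not calamus code: `Address`, `dump_id_reference` and `load_id_reference` are hypothetical names chosen for illustration, and the lookup runs over a plain list of already-loaded JSON-LD nodes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Node = Dict[str, Any]


@dataclass
class Address:
    # Hypothetical model class standing in for the child object.
    _id: str
    street: str


def dump_id_reference(child: Address) -> Node:
    # Unlike fields.Nested, emit only the child's @id: no @type, no other properties.
    return {"@id": child._id}


def load_id_reference(value: Node, nodes: List[Node]) -> Node:
    # Look the referenced @id up among the nodes present in the loaded data.
    ref = value["@id"]
    for node in nodes:
        if node.get("@id") == ref:
            return node
    # As proposed: fail loudly if the referenced object is not in the data.
    raise ValueError(f"Cannot dereference {ref!r}: no such @id in the data")


address = Address(_id="http://example.com/address/1", street="Main Street 1")
print(dump_id_reference(address))  # {'@id': 'http://example.com/address/1'}
```

A real field would presumably build a model instance from the dereferenced node rather than return the raw dict; the sketch stops at the lookup.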

mwx23 commented 3 years ago

I am having a similar issue when deserialising a nested object that only has the @id and no associated @type.

With data like this:

data = {
    '@id': 'http://example.com/id/abcd1234',
    '@type': ['http://example.com/ont#Document'],
    'http://example.com/ont#created_at': [
        {'@type': 'http://www.w3.org/2001/XMLSchema#dateTime',
         '@value': '2021-01-20T10:01:34.98892000'}],
    'http://example.com/ont#created_by': [
        {'@id': 'http://example.com/id/user%40example.com'}],
}

The Document schema class:

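from calamus import fields
from calamus.schema import JsonLDSchema

EX = fields.Namespace("http://example.com/ont#")

# Document, UserSchema and generate_uuid_iri are defined elsewhere.
class DocumentSchema(JsonLDSchema):
    class Meta:
        rdf_type = EX.Document
        model = Document

    _id = fields.Id(default=generate_uuid_iri)
    created_at = fields.DateTime(EX.created_at)
    created_by = fields.Nested(
        EX.created_by,
        UserSchema,
        many=False,
    )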

Then loading the example data:

d = DocumentSchema().load(data)

This raises the following exception:

  File "/Users/marty/smf/hivemind-justpy/venv/lib/python3.9/site-packages/calamus/fields.py", line 563, in _load
    valid_data.append(self.load_single_entry(val, partial))
  File "/Users/marty/smf/hivemind-justpy/venv/lib/python3.9/site-packages/calamus/fields.py", line 540, in load_single_entry
    type_ = normalize_type(value["@type"])
KeyError: '@type'

The http://example.com/ont#created_by entity is of type http://example.com/ont#User, but that type is not included in the JSON-LD serialisation I receive.

Panaetius commented 3 years ago

You might want to use the IRIField (https://github.com/SwissDataScienceCenter/calamus/blob/master/calamus/fields.py#L193) for this.
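The idea is that an IRI field keeps the reference as a plain IRI string instead of trying to build a nested object, so no @type is needed. The round trip below is a library-independent approximation of that behavior, not calamus' implementation; `dump_iri` and `load_iri` are hypothetical names, applied to the created_by value from the example data:

```python
from typing import Dict


def dump_iri(iri: str) -> Dict[str, str]:
    # Serialize a plain IRI string as a JSON-LD node reference.
    return {"@id": iri}


def load_iri(value: Dict[str, str]) -> str:
    # Deserialize a node reference back to the IRI string; no @type is required.
    return value["@id"]


created_by = [{"@id": "http://example.com/id/user%40example.com"}]
# Expanded JSON-LD wraps property values in lists, so take the single entry.
print(load_iri(created_by[0]))  # http://example.com/id/user%40example.com
```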

This issue is more about dereferencing an @id reference. For example, suppose you have a Nested field in the schema and the metadata looks like this:

[
  {
    "@id": "http://example.com/mainobject",
    "@type": "http://schema.org/something",
    "nested_object": {"@id": "http://example.com/subobject"}
  },
  {
    "@id": "http://example.com/subobject",
    "@type": "http://schema.org/something_else"
  }
]

When deserializing, it should then include subobject inside mainobject by dereferencing the @id, but only if subobject also appears somewhere in the metadata.
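A minimal sketch of that dereferencing step over the example metadata, written in plain Python rather than against calamus internals. `dereference` and `_resolve` are hypothetical helpers; they resolve references only one level deep (embedded nodes are not re-resolved, which also avoids infinite loops on cyclic references) and leave any @id that is not in the metadata untouched:

```python
import copy
from typing import Any, Dict, List

Node = Dict[str, Any]


def _resolve(value: Any, index: Dict[str, Node]) -> Any:
    # Property values may be single objects or lists of them.
    if isinstance(value, list):
        return [_resolve(item, index) for item in value]
    # A bare reference is an object whose only key is "@id"; embed a copy if known.
    if isinstance(value, dict) and set(value) == {"@id"} and value["@id"] in index:
        return copy.deepcopy(index[value["@id"]])
    return value


def dereference(nodes: List[Node]) -> List[Node]:
    # Index every top-level node by its @id so references can be looked up.
    index = {node["@id"]: node for node in nodes if "@id" in node}
    return [{key: _resolve(val, index) for key, val in node.items()} for node in nodes]


metadata = [
    {
        "@id": "http://example.com/mainobject",
        "@type": "http://schema.org/something",
        "nested_object": {"@id": "http://example.com/subobject"},
    },
    {
        "@id": "http://example.com/subobject",
        "@type": "http://schema.org/something_else",
    },
]
main = dereference(metadata)[0]
print(main["nested_object"]["@type"])  # http://schema.org/something_else
```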

This already works for flattened metadata (where no objects are nested and you instead have a flat list of nodes); in that case, you would use DocumentSchema(flattened=True).load(data).
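For comparison, flattened metadata is the same graph with every node hoisted to the top level and nested objects replaced by bare @id references. The `flatten` helper below is a hypothetical, simplified sketch of that transformation, not the full JSON-LD flattening algorithm: it assumes every nested node carries an @id, whereas real flattening also names blank nodes and merges duplicate nodes.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def flatten(root: Node) -> List[Node]:
    nodes: List[Node] = []

    def hoist(value: Any) -> Any:
        if isinstance(value, list):
            return [hoist(item) for item in value]
        # A nested node object (an @id plus other keys) moves to the top level...
        if isinstance(value, dict) and "@id" in value and set(value) != {"@id"}:
            visit(value)
            # ...and a bare reference to it stays behind.
            return {"@id": value["@id"]}
        # Literals, @value objects and existing references are kept as they are.
        return value

    def visit(node: Node) -> None:
        flat: Node = {}
        nodes.append(flat)  # append before recursing so parents precede children
        for key, val in node.items():
            flat[key] = hoist(val)

    visit(root)
    return nodes


nested = {
    "@id": "http://example.com/mainobject",
    "@type": "http://schema.org/something",
    "nested_object": {
        "@id": "http://example.com/subobject",
        "@type": "http://schema.org/something_else",
    },
}
for node in flatten(nested):
    print(node)
```

Applied to the nested form of the example, this yields the same two-node list shown earlier in this comment.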

If we had some way of resolving @ids, we could also use it to dereference something like http://example.com/id/user%40example.com: for example, by sending an HTTP GET request to the @id if that returns JSON-LD for the object, or by letting a user's schema provide a custom dereferencing function. As things stand in your example, we have no way to create a user object from a UserSchema using only its @id unless that object is also in data, because we wouldn't know where to get the values for its fields.
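One way such a hook could work, sketched without calamus: look the @id up in the loaded data first, then fall back to a user-supplied resolver function, and raise if neither yields a node. `resolve_reference` and the `Resolver` signature are hypothetical; a resolver could perform an HTTP GET, but the example uses a dictionary lookup to stay self-contained.

```python
from typing import Any, Callable, Dict, Optional

Node = Dict[str, Any]
# Hypothetical hook type: given an @id, return that node's JSON-LD or None.
Resolver = Callable[[str], Optional[Node]]


def resolve_reference(ref: Node, index: Dict[str, Node],
                      resolver: Optional[Resolver] = None) -> Node:
    iri = ref["@id"]
    # 1. Prefer a node that is already present in the loaded data.
    if iri in index:
        return index[iri]
    # 2. Otherwise ask the user-supplied resolver, if one was given.
    if resolver is not None:
        node = resolver(iri)
        if node is not None:
            return node
    # 3. Without data for its fields, the object cannot be built from the @id alone.
    raise LookupError(f"Cannot dereference {iri!r}")


USER_ID = "http://example.com/id/user%40example.com"
known_users = {USER_ID: {"@id": USER_ID, "@type": ["http://example.com/ont#User"]}}

node = resolve_reference({"@id": USER_ID}, index={}, resolver=known_users.get)
print(node["@type"])  # ['http://example.com/ont#User']
```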