Implement JSON-LD Integration Across Services

maxyzli commented 3 weeks ago

Library Service:

[x] Introduce and implement the JSON-LD schema.

Index Service:

[x] Test: ensure that the JSON-LD validation at the /validate route functions correctly.
[x] Test: verify that parsing JSON-LD data does not alter the profile hash.

DataProxy Services:

[x] Process incoming data into JSON-LD format.
[x] Validate JSON-LD data.
[x] Store validated data and send to Index.

geoffturk commented 3 weeks ago

Murm Services currently takes in data from the following sources:

KVM via their API
CSV files via an import from Tools

That data is then transformed into JSON format and validated against the schema(s) linked to it, and then stored in the dataproxy.

For example, using the people_schema-v0.1.0 schema, the following data:

name,geolocation.lat,geolocation.lon
Geoff Turk,48.88111,2.38296

... would be transformed into the following JSON:

{
  "linked_schemas": ["people_schema-v0.1.0"],
  "name": "Geoff Turk",
  "geolocation": {
    "lat": 48.88111,
    "lon": 2.38296
  }
}

... and validated against the schema.
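As a rough illustration of the CSV-to-JSON step described above, here is a minimal Python sketch. The function names (`parse_value`, `row_to_profile`) are hypothetical and not taken from the Murmurations codebase, and the numeric conversion is deliberately naive: a real importer would presumably take field types from the linked schema rather than guessing from the cell content.

```python
import csv
import io


def parse_value(raw):
    """Return a float when the cell parses as a number, otherwise the original string.

    Naive on purpose (assumption): real code should use the schema's declared types.
    """
    try:
        return float(raw)
    except ValueError:
        return raw


def row_to_profile(row, linked_schemas):
    """Build a nested profile dict from a flat CSV row with dotted column names."""
    profile = {"linked_schemas": list(linked_schemas)}
    for column, raw in row.items():
        # "geolocation.lat" -> parents ["geolocation"], leaf "lat"
        *parents, leaf = column.split(".")
        target = profile
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = parse_value(raw)
    return profile


csv_text = "name,geolocation.lat,geolocation.lon\nGeoff Turk,48.88111,2.38296\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
profile = row_to_profile(rows[0], ["people_schema-v0.1.0"])
```

The resulting `profile` dict has the same shape as the JSON shown above and could then be validated against the schema.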

We want to take that JSON output of the profile and transform it into JSON-LD format:

{
  "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
  "@type": "Person",
  "name": "Geoff Turk",
  "geolocation": {
    "@type": "GeoCoordinates",
    "lat": 48.88111,
    "lon": 2.38296
  },
  "linked_schemas": ["people_schema-v0.1.0"]
}

Rather than repeatedly including the JSON-LD context in each profile, we want to link to the context from the library.

The context file at https://library.murmurations.network/jsonld/people_schema.jsonld would contain:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "murm": "https://murmurations.network/ns/",
    "name": "name",
    "geolocation": "location",
    "lat": "latitude",
    "lon": "longitude",
    "linked_schemas": "murm:linkedSchemas"
  }
}

Note the above context is just an example. The full context still needs to be mapped out and added to the library.
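To make the effect of the example context more concrete, the sketch below shows which IRI each profile key would end up pointing to. It is a simplified illustration that only handles plain string term definitions, not a conformant JSON-LD processor, and `resolve_term` is a hypothetical helper name.

```python
# The example context from above, copied as a Python dict.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "murm": "https://murmurations.network/ns/",
    "name": "name",
    "geolocation": "location",
    "lat": "latitude",
    "lon": "longitude",
    "linked_schemas": "murm:linkedSchemas",
}


def resolve_term(term, context):
    """Map a profile key to a full IRI using simple string term definitions only."""
    value = context.get(term, term)
    if "://" in value:
        return value  # already an absolute IRI
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix  # compact IRI such as murm:linkedSchemas
    return context["@vocab"] + value  # fall back to the default vocabulary


for key in ("name", "geolocation", "lat", "lon", "linked_schemas"):
    print(key, "->", resolve_term(key, CONTEXT))
```

Under this context, `lat` resolves to a schema.org property while `linked_schemas` resolves into the Murmurations namespace, which is why fields missing from schema.org need their own vocabulary.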

One thing to point out is that the @type field is not included in the context, but is instead included in the profile data. A JSON-LD context can map terms to IRIs, but it cannot assign a type to the object being described, so the @type field has to be located within the data file itself.

To work around this, we can embed the type in the metadata of the schema or field that requires a @type declaration. For example, in the people_schema-v0.1.0 schema, the @type field can be declared as:

{
  "metadata": {
    "@type": "Person",
    "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
    "schema": {
      "name": "people_schema-v0.1.0",
      ... etc ...
    }
  }
}

And the geolocation field can be declared as:

{
  "metadata": {
    "@type": "GeoCoordinates",
    "field": {
      "name": "geolocation",
      "version": "1.0.0"
    },
    ... etc ...
  }
}

There would need to be a preprocessing step to add the @type field to the profile data based on the metadata. Note also the inclusion of the @context field in the schema metadata, which would also need to be added to the profile during processing.
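The preprocessing step could look roughly like the following Python sketch. The trimmed-down metadata dicts and the `to_jsonld` helper are assumptions made for illustration; how the metadata is actually loaded from the library, and how nested fields are handled, is left open here.

```python
import copy

# Hypothetical, trimmed-down metadata mirroring the two examples above.
SCHEMA_METADATA = {
    "@type": "Person",
    "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
}
# Field name -> @type taken from that field's metadata.
FIELD_TYPES = {"geolocation": "GeoCoordinates"}


def to_jsonld(profile, schema_metadata, field_types):
    """Return a copy of the profile with @context and @type added from metadata.

    Only top-level object fields are typed in this sketch.
    """
    doc = {
        "@context": schema_metadata["@context"],
        "@type": schema_metadata["@type"],
    }
    for key, value in profile.items():
        value = copy.deepcopy(value)  # leave the original profile untouched
        if key in field_types and isinstance(value, dict):
            value = {"@type": field_types[key], **value}
        doc[key] = value
    return doc


profile = {
    "linked_schemas": ["people_schema-v0.1.0"],
    "name": "Geoff Turk",
    "geolocation": {"lat": 48.88111, "lon": 2.38296},
}
jsonld_profile = to_jsonld(profile, SCHEMA_METADATA, FIELD_TYPES)
```

With the sample profile, `jsonld_profile` contains the same keys and values as the JSON-LD example earlier in this comment.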

This is just one possible way to implement the JSON-LD transformation. It's worth exploring other options to see if there are any better approaches.

maxyzli commented 2 weeks ago

Based on the discussion above, the planned steps are as follows:

Add a JSON-LD route to the Library Service and use the existing schemaparser to import the JSON-LD files. Ideally, each schema will have a corresponding JSON-LD file.
Place the JSON-LD information (context and type) into the metadata of the existing schemas.
Generate JSON-LD from the CSV and KVM data using the context and type from the schema’s metadata, then store it within the profile.
Ensure that the JSON-LD files can be read normally when placed in the index. The JSON-LD tagging (such as @context and @type) needs to be stripped before hashing so that the hash values remain unchanged.
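The last step, keeping the profile hash stable, could be sketched as below. This is only an illustration: the thread does not say how the Index computes its hash, so SHA-256 over key-sorted JSON is an assumption, and `strip_jsonld` and `profile_hash` are hypothetical names.

```python
import hashlib
import json


def strip_jsonld(node):
    """Recursively drop JSON-LD keywords (keys starting with '@') from a profile."""
    if isinstance(node, dict):
        return {k: strip_jsonld(v) for k, v in node.items() if not k.startswith("@")}
    if isinstance(node, list):
        return [strip_jsonld(item) for item in node]
    return node


def profile_hash(profile):
    """Hash a profile deterministically (assumed scheme: SHA-256 of key-sorted JSON)."""
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


plain = {
    "linked_schemas": ["people_schema-v0.1.0"],
    "name": "Geoff Turk",
    "geolocation": {"lat": 48.88111, "lon": 2.38296},
}
tagged = {
    "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
    "@type": "Person",
    "name": "Geoff Turk",
    "geolocation": {"@type": "GeoCoordinates", "lat": 48.88111, "lon": 2.38296},
    "linked_schemas": ["people_schema-v0.1.0"],
}
same_hash = profile_hash(strip_jsonld(tagged)) == profile_hash(plain)
```

Because the keywords are removed before hashing, the tagged and untagged versions of the same profile produce the same digest in this sketch.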

maxyzli commented 2 weeks ago

> This is just one possible way to implement the JSON-LD transformation. It's worth exploring other options to see if there are any better approaches.

Question: If schema.org doesn’t provide the fields we need, such as murm:linkedSchemas, which we plan to place under the domain https://murmurations.network/ns/, should we create a separate repository to manage this? Or do you have another approach in mind for implementing this?

We will need to add a context.jsonld file at https://murmurations.network/ns/context.jsonld. Here is an example. The linkedSchemas term is a list of schemas against which a profile must be validated (schema names must be alphanumeric with underscore (_) spacers and a dash (-) semantic version separator, e.g., my_data_schema-v1.0.0). That description is kept outside the term definition, because JSON-LD 1.1 term definitions only accept keyword entries such as @id and @type:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "murm": "https://murmurations.network/ns/",
    "linkedSchemas": {
      "@id": "murm:linkedSchemas",
      "@type": "@id"
    }
  }
}
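The naming rule quoted for linked schemas can also be checked mechanically. The regular expression below is one possible reading of that rule (alphanumeric words joined by underscores, then `-v` and a three-part version); it is an assumption, not the pattern used by the actual services, and `invalid_schema_names` is a hypothetical helper.

```python
import re

# Assumed pattern: words of letters/digits joined by "_", then "-v" and MAJOR.MINOR.PATCH.
SCHEMA_NAME = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*-v[0-9]+\.[0-9]+\.[0-9]+")


def invalid_schema_names(linked_schemas):
    """Return the entries that do not follow the assumed naming rule."""
    return [name for name in linked_schemas if not SCHEMA_NAME.fullmatch(name)]


rejected = invalid_schema_names(
    ["people_schema-v0.1.0", "my_data_schema-v1.0.0", "bad schema", "no_version"]
)
```

Here `rejected` holds only the two malformed names, so a profile could be refused before any schema lookup is attempted.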

geoffturk commented 1 week ago

> should we create a separate repository to manage this?

My current thinking is that we should separate the context from validation, so yes, we should create a separate repo for context-related info, which can be deployed independently from changes to validation parameters. I'll set this up once I get feedback from @olisb on what to name it (either with a third-level domain or something in the path of the root domain).

MurmurationsNetwork / MurmurationsServices

Implement JSON-LD Integration Across Services #844