BlueBrain / nexus

Blue Brain Nexus - A knowledge graph for data-driven science
https://bluebrainnexus.io/
Apache License 2.0

Resource containing a graph with multiple roots is not properly handled #2210

Closed imsdu closed 3 years ago

imsdu commented 3 years ago

When we try to create a resource which is a graph with multiple roots such as:

{
  "@context": "https://bbp.epfl.ch/nexus/v1/resources/covid19-kg/schemas/context",
  "@graph": [
    {
      "@type": "Dataset",
      "contribution": {
        "@type": "Contribution",
        "agent": {
          "@type": "Agent"
        },
        "hadRole": "Scientists"
      },
      "distribution": {
        "@type": "DataDownload",
        "contentSize": {
          "value": 759071,
          "unitCode": "bytes"
        },
        "contentUrl": "https://staging.nexus.ocp.bbp.epfl.ch/v1/files/covid19-kg/data/a5587fba-e56d-4af9-8e53-c8b228370c7b",
        "digest": {
          "value": "4605ae01485669cb995004091212a13d8a195f659fc822fbbc3fb80cf603b81d",
          "algorithm": "SHA-256"
        },
        "encodingFormat": "application/x-turtle",
        "name": "kg_20200720-031625.ttl"
      },
      "hasPart": [
        {
          "@id": "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/covid19-kg/data/_/80bd99cc-d673-43c7-bb44-f820e3d3df0b?rev=1",
          "@type": "Dataset",
          "distribution": {
            "@id": "_:b7",
            "contentUrl": "https://staging.nexus.ocp.bbp.epfl.ch/v1/files/covid19-kg/data/91ab7df8-d253-4117-a5f3-5532ca21c22f"
          },
          "name": "A dataset"
        },
        {
          "@id": "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/covid19-kg/data/_/3e63eb1a-bc86-4a1a-86a3-e3764b295047?rev=1",
          "@type": "Dataset",
          "distribution": {
            "@id": "_:b0",
            "contentUrl": "https://staging.nexus.ocp.bbp.epfl.ch/v1/files/covid19-kg/data/2939b95e-ba33-42d2-8772-64d582554d34"
          },
          "name": "A dataset"
        }
      ],
      "name": "A dataset"
    },
    {
      "@id": "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/covid19-kg/data/_/3e63eb1a-bc86-4a1a-86a3-e3764b295047?rev=1",
      "@type": "Dataset",
      "distribution": {
        "@id": "_:b0",
        "contentUrl": "https://staging.nexus.ocp.bbp.epfl.ch/v1/files/covid19-kg/data/2939b95e-ba33-42d2-8772-64d582554d34"
      },
      "name": "A dataset"
    },
    {
      "@id": "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/covid19-kg/data/_/80bd99cc-d673-43c7-bb44-f820e3d3df0b?rev=1",
      "@type": "Dataset",
      "distribution": {
        "@id": "_:b7",
        "contentUrl": "https://staging.nexus.ocp.bbp.epfl.ch/v1/files/covid19-kg/data/91ab7df8-d253-4117-a5f3-5532ca21c22f"
      },
      "name": "A dataset"
    }
  ]
}

The id of one of the nodes is then picked up as the root id, and it clashes with the provided id:

{"@type":"UnexpectedResourceId","reason":"Resource 'https://bbp.epfl.ch/neurosciencegraph/data/db979680-7d0b-48a9-a3c3-663f145c8467' does not match resource id on payload 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/covid19-kg/data/_/80bd99cc-d673-43c7-bb44-f820e3d3df0b?rev=1'."}
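To illustrate why several ids can compete for the root position, here is a minimal Python sketch that lists the top-level `@graph` entries of a payload and separates those referenced by another node from those that are not. The helper names (`referenced_ids`, `root_candidates`) and the trimmed `example.com` payload are assumptions made for illustration; this is not how Nexus itself selects the root id.

```python
def referenced_ids(node, top_level=True):
    """Collect every @id that appears nested below a top-level node."""
    found = set()
    if isinstance(node, dict):
        if not top_level and "@id" in node:
            found.add(node["@id"])
        for key, value in node.items():
            if key != "@id":
                found |= referenced_ids(value, top_level=False)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_ids(item, top_level=top_level)
    return found


def root_candidates(payload):
    """Return the @id (or None for a blank node) of every unreferenced top-level node."""
    nodes = payload.get("@graph", [])
    nested = referenced_ids(nodes)
    return [n.get("@id") for n in nodes if n.get("@id") not in nested]


# Trimmed, hypothetical version of the payload above: one dataset without @id
# that points at two other datasets, which are also listed at the top level.
payload = {
    "@graph": [
        {
            "@type": "Dataset",
            "hasPart": [
                {"@id": "http://example.com/part-1"},
                {"@id": "http://example.com/part-2"},
            ],
        },
        {"@id": "http://example.com/part-2", "@type": "Dataset"},
        {"@id": "http://example.com/part-1", "@type": "Dataset"},
    ]
}

top_level_ids = [n.get("@id") for n in payload["@graph"]]
roots = root_candidates(payload)
print(top_level_ids)
print(roots)
```

In this trimmed payload only the node without an `@id` is unreferenced, yet three entries sit at the top level of `@graph`, so a root chosen from the top-level entries can land on one of the `hasPart` ids.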

umbreak commented 3 years ago

In 1.4.x, @graph was not well supported. It wasn't supported at all when retrieving resources, since we do framing on them, and it wasn't well supported when managing resources in the knowledge graph, because we depend on a named graph for that.

In the example you gave, retrieving the resource returns only the metadata (with none of the content the client submitted):

curl -s -H "Authorization: Bearer $TOKEN" "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/reimann/topological_sampling/_/https%3A%2F%2Fbbp.epfl.ch%2Fneurosciencegraph%2Fdata%2Fdb979680-7d0b-48a9-a3c3-663f145c8467" | jq

The same happens when you query the common Blazegraph namespace for that project:

curl -H "Content-Type: application/sparql-query" -H "Authorization: Bearer $TOKEN" "https://staging.nexus.ocp.bbp.epfl.ch/v1/views/reimann/topological_sampling/graph/sparql" -d 'SELECT * WHERE {GRAPH <https://bbp.epfl.ch/neurosciencegraph/data/db979680-7d0b-48a9-a3c3-663f145c8467/graph> { ?s ?p ?o } }'

So that resource, from the point of view of Nexus, is the same as a resource with an empty payload.

In v1.5.x, what we support (I have to verify this) is the creation of a named graph. For example:
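For reference, the two requests above can be assembled programmatically. The sketch below only builds the URL and the query string (it sends nothing); the function names are illustrative, and the `{id}/graph` naming of the management graph is taken from the query above.

```python
from urllib.parse import quote

BASE = "https://staging.nexus.ocp.bbp.epfl.ch/v1"
RESOURCE_ID = (
    "https://bbp.epfl.ch/neurosciencegraph/data/"
    "db979680-7d0b-48a9-a3c3-663f145c8467"
)


def resource_url(org, project, resource_id, schema="_"):
    # safe="" percent-encodes ":" and "/" so the id fits in a single path segment.
    encoded = quote(resource_id, safe="")
    return f"{BASE}/resources/{org}/{project}/{schema}/{encoded}"


def management_graph_query(resource_id):
    # Assumes the management named graph is "{id}/graph", as in the query above.
    return f"SELECT * WHERE {{GRAPH <{resource_id}/graph> {{ ?s ?p ?o }} }}"


url = resource_url("reimann", "topological_sampling", RESOURCE_ID)
query = management_graph_query(RESOURCE_ID)
print(url)
print(query)
```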

POST /v1/resources/{org}/{proj}
{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "@id": "http://example.com/graph-id",
  "@graph": [
    {
      "@id": "http://example.com/a",
      "a": 2
    },
    {
      "@id": "http://example.com/b",
      "b": 2
    }
  ]
}

This will work mostly as expected. Note that:

1) The metadata fields get attached to the top-level @id. That means the metadata is outside the named graph {id}.
2) During indexing on a Blazegraph namespace, we:
   1) Create another named graph, the management named graph {id}/graph. That is how we handle the triples lifecycle on a Blazegraph namespace.
   2) Get all the existing triples from the resource (inside and outside the existing named graph).
   3) Insert them into the management named graph {id}/graph.
   4) As a consequence, the N-Quads representation of the resource is not the same once it reaches Blazegraph, since we need the management named graph. However, the N-Triples representation stays the same.
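The effect of the indexing steps on quads versus triples can be sketched in a few lines of Python. This is a simplified model, not the Nexus indexing code: quads are plain tuples, `None` stands for the default graph, and the metadata predicate is a made-up placeholder.

```python
def to_management_graph(quads, resource_id):
    """Move every quad of a resource into the management graph {id}/graph."""
    management_graph = f"{resource_id}/graph"
    return {(s, p, o, management_graph) for (s, p, o, _graph) in quads}


def triples(quads):
    """Drop the graph component, i.e. the N-Triples view of a set of quads."""
    return {(s, p, o) for (s, p, o, _graph) in quads}


GRAPH_ID = "http://example.com/graph-id"

original = {
    # Triples inside the named graph declared by the payload.
    ("http://example.com/a", "http://schema.org/a", "2", GRAPH_ID),
    ("http://example.com/b", "http://schema.org/b", "2", GRAPH_ID),
    # Metadata attached to the top-level @id, outside the named graph.
    # The predicate is a hypothetical placeholder, not a real Nexus term.
    (GRAPH_ID, "https://example.org/vocab/rev", "1", None),
}

indexed = to_management_graph(original, GRAPH_ID)

print(triples(indexed) == triples(original))  # triple view is unchanged
print(indexed == original)                    # quad view differs
```

All quads end up in `http://example.com/graph-id/graph`, so the quad sets differ while the triple sets are equal, which matches the N-Quads versus N-Triples remark above.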

imsdu commented 3 years ago

Such resources also exist in production data.