TREEcg / specification

RDF vocabulary and hypermedia specification to publish your Linked Data using search trees
https://w3id.org/tree/specification

The Member extraction algorithm #71

Closed by pietercolpaert 8 months ago

pietercolpaert commented 1 year ago

This rather large issue proposes to:

  1. [x] clearly define the tree:Member class,
  2. [x] clarify the explanation of tree:member: it refers to a topic, not to a tree:Member,
  3. [x] define the member extraction algorithm as part of the spec,
  4. [x] clarify the triggers for an HTTP request,
  5. [x] introduce named graph support.

Related: https://github.com/SEMICeu/LinkedDataEventStreams/issues/37

Pull request: #78

Follow the discussions and presentations on the mailing list: https://www.w3.org/community/treecg/

xdxxxdx commented 1 year ago

Hello @pietercolpaert, regarding "Member dereferencing": please explain why the members need to be dereferenced here. Thanks

pietercolpaert commented 1 year ago

> Hello @pietercolpaert, regarding "Member dereferencing": please explain why the members need to be dereferenced here. Thanks

Take for example the case of Marine Regions: https://marineregions.org/feed

In their implementation, they only publish a list of members and when each one changed. If you want the contents of the members, you need to dereference them. I’d like to add a property that indicates to the client that one extra HTTP request per member will be needed.

pietercolpaert commented 1 year ago

After the W3C TREE CG meeting of 2023-05-24:

  1. A tree:Member is a set of triples. This member is contained in a collection. The set of triples that are part of the member is defined by the member extraction algorithm.
  2. The explanation of tree:member was already quite okay; the spec just needs some editing work. tree:member refers to the primary topic, except in the case of point 5.
  3. The member extraction algorithm - see below
  4. Some discussions arose on this one and this will be continued in the call of 2023-06-07
    • Instead of just a dereferenceMember boolean flag, we could also think about a dereferencePath that indicates a property path to a named node that needs to be dereferenced if you want to get to a complete member. The dereferenceMember flag could then be realized with an empty list as the object of tree:dereferencePath.
    • @bergos commented: Is this needed at all? Can’t we use on the one hand the ideas behind CBD, and on the other hand use the SHACL shape to understand whether the triple set is complete?
    • @pietercolpaert’s reply on this: I see some problems with that approach with optional properties, but preparing an example of this for the next call so we can continue this discussion.
  5. Instead of a boolean property, we are opting for typing and introducing a class: tree:NamedGraphCollection that adds explicit semantics to the named graph wrt the member extraction algorithm.
  6. To be discussed on a next call

The base member extraction algorithm

Find all triples with the member URI as the subject and then repeat this for every named node and blank node that has been found in the object, except for subjects that have already been processed, and except for other members in the collection.

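// getMemberUris is assumed to return the member URIs found in the page
const subjects = getMemberUris(triples);
// Subjects that have already been visited
const processedSubjects = [];
const members = [];
for (const s of subjects) {
  members.push(extractMember(triples, s, processedSubjects, subjects));
}

// Recursive function
function extractMember(T, s, processedSubjects, subjects) {
  processedSubjects.push(s); // This will prevent cycles
  let member = [];
  for (const t of T) {
    if (t.subject.value === s) {
      member.push(t);
      // Follow named nodes and blank nodes, but not literals,
      // already processed subjects, or other members of the collection
      if (t.object.termType !== 'Literal'
          && !processedSubjects.includes(t.object.value)
          && !subjects.includes(t.object.value)) {
        // concat returns a new array, so the result has to be assigned
        member = member.concat(extractMember(T, t.object.value, processedSubjects, subjects));
      }
    }
  }
  return member;
}
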
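
To make the behaviour of the base algorithm easier to check, here is a self-contained sketch that runs it on a few hand-written triples. Everything around the algorithm is an assumption made for illustration: the plain-object triples (loosely modelled on RDF/JS terms), the getMemberUris helper that reads the objects of tree:member triples, the https://w3id.org/tree#member IRI, and the example data itself.

```javascript
// Assumed IRI of tree:member, used only to find the member URIs.
const TREE_MEMBER = 'https://w3id.org/tree#member';
const EX = 'http://example.org/';

// Minimal term and triple helpers (illustrative, not a real RDF library).
const named = (value) => ({ termType: 'NamedNode', value });
const blank = (value) => ({ termType: 'BlankNode', value });
const literal = (value) => ({ termType: 'Literal', value });
const triple = (subject, predicate, object) => ({ subject, predicate, object });

// Hypothetical helper: member URIs are the objects of tree:member triples.
function getMemberUris(triples) {
  return triples
    .filter((t) => t.predicate.value === TREE_MEMBER)
    .map((t) => t.object.value);
}

// Collect the triples about s, then follow named and blank node objects.
function extractMember(triples, s, processed, memberUris) {
  processed.add(s); // prevents cycles
  const member = [];
  for (const t of triples) {
    if (t.subject.value !== s) continue;
    member.push(t);
    const o = t.object;
    if (o.termType !== 'Literal' && !processed.has(o.value) && !memberUris.includes(o.value)) {
      member.push(...extractMember(triples, o.value, processed, memberUris));
    }
  }
  return member;
}

const triples = [
  triple(named(EX + 'collection'), named(TREE_MEMBER), named(EX + 'm1')),
  triple(named(EX + 'collection'), named(TREE_MEMBER), named(EX + 'm2')),
  triple(named(EX + 'm1'), named(EX + 'name'), literal('first')),
  triple(named(EX + 'm1'), named(EX + 'address'), blank('b0')),
  triple(blank('b0'), named(EX + 'city'), literal('Ghent')),
  triple(named(EX + 'm1'), named(EX + 'next'), named(EX + 'm2')), // link to another member
  triple(named(EX + 'm2'), named(EX + 'name'), literal('second')),
  triple(named(EX + 'm2'), named(EX + 'self'), named(EX + 'm2')), // cycle
];

const memberUris = getMemberUris(triples);
const processed = new Set();
const members = new Map();
for (const uri of memberUris) {
  members.set(uri, extractMember(triples, uri, processed, memberUris));
}
for (const [uri, member] of members) {
  console.log(uri, member.length);
}
```

In this sketch m1 ends up with four triples: its own three plus the one on the blank node. The link from m1 to m2 is kept as a triple, but m2 is not traversed because it is another member of the collection, and the self-reference on m2 does not cause an endless loop.
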
bergos commented 1 year ago

I created an example where CBD and the SHACL shape would extract the same triples. Below is the data, but you can also play with it on the SHACL Playground.

Data

@prefix ex: <http://example.org/>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

ex:resource1
  ex:property1 [
    ex:property2 ex:resource2
  ].

ex:resource2
  ex:property3 "test".

Shape

@prefix ex: <http://example.org/>.
@prefix sh: <http://www.w3.org/ns/shacl#>.

ex:resource1Shape a sh:NodeShape;
  sh:name "resource 1 shape";
  sh:targetNode ex:resource1;
  sh:property [
    sh:name "property 1";
    sh:path ex:property1;
    sh:node ex:property2Shape
  ].

ex:property2Shape a sh:NodeShape;
  sh:name "property 2 shape";
  sh:property [
    sh:name "property 2";
    sh:path ex:property2
  ].

Extract

<http://example.org/resource1>
  <http://example.org/property1> [
      <http://example.org/property2> <http://example.org/resource2>
    ].

CBD

CBD stops after the triple with ex:resource2 as an object because named node objects are not traversed.
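
The stopping rule can be sketched in a few lines. This is a simplified reading of CBD that only follows blank node objects and leaves out the reification part of the CBD definition. The plain-object triples and the function name conciseBoundedDescription are assumptions for illustration, and the data is the example above written out by hand.

```javascript
const EX = 'http://example.org/';

// Minimal term helpers (illustrative, not a real RDF library).
const named = (value) => ({ termType: 'NamedNode', value });
const blank = (value) => ({ termType: 'BlankNode', value });
const literal = (value) => ({ termType: 'Literal', value });
const sameTerm = (a, b) => a.termType === b.termType && a.value === b.value;

// The example data: resource1 -> blank node -> resource2 -> "test".
const data = [
  { subject: named(EX + 'resource1'), predicate: named(EX + 'property1'), object: blank('b0') },
  { subject: blank('b0'), predicate: named(EX + 'property2'), object: named(EX + 'resource2') },
  { subject: named(EX + 'resource2'), predicate: named(EX + 'property3'), object: literal('test') },
];

// Simplified CBD: take all triples about the start node and recurse
// into blank node objects only; named node objects are kept but not traversed.
function conciseBoundedDescription(triples, start, visited = new Set()) {
  visited.add(start.value); // so that blank node cycles terminate
  const result = [];
  for (const t of triples) {
    if (!sameTerm(t.subject, start)) continue;
    result.push(t);
    if (t.object.termType === 'BlankNode' && !visited.has(t.object.value)) {
      result.push(...conciseBoundedDescription(triples, t.object, visited));
    }
  }
  return result;
}

const cbd = conciseBoundedDescription(data, named(EX + 'resource1'));
console.log(cbd.length); // 2
```

The triple about ex:resource2 with ex:property3 is left out, which matches the Extract shown above.
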

Shape

The SHACL approach requires the client to understand sh:property, sh:path, and sh:node. Adding constraints would make things more complicated. I think constraints should be explicitly excluded from the logic.
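
A rough sketch of what such shape-guided extraction could look like when only sh:property, sh:path, and sh:node are interpreted. The shapes are written as plain JavaScript objects instead of being parsed from SHACL, paths are limited to single predicates, and the names shapes and extractByShape are assumptions for illustration, not part of any SHACL library.

```javascript
const EX = 'http://example.org/';

// Minimal term helpers (illustrative, not a real RDF library).
const named = (value) => ({ termType: 'NamedNode', value });
const blank = (value) => ({ termType: 'BlankNode', value });
const literal = (value) => ({ termType: 'Literal', value });
const sameTerm = (a, b) => a.termType === b.termType && a.value === b.value;

// The example data, written out by hand.
const data = [
  { subject: named(EX + 'resource1'), predicate: named(EX + 'property1'), object: blank('b0') },
  { subject: blank('b0'), predicate: named(EX + 'property2'), object: named(EX + 'resource2') },
  { subject: named(EX + 'resource2'), predicate: named(EX + 'property3'), object: literal('test') },
];

// Hand-written stand-in for the two node shapes: each property has a
// path (sh:path) and optionally the name of a nested shape (sh:node).
const shapes = {
  resource1Shape: { properties: [{ path: EX + 'property1', node: 'property2Shape' }] },
  property2Shape: { properties: [{ path: EX + 'property2' }] },
};

// Keep the triples that match a property path of the shape and
// continue with the nested shape on the object when sh:node is given.
function extractByShape(triples, shapes, focus, shapeName, visited = new Set()) {
  const key = `${shapeName}|${focus.termType}|${focus.value}`;
  if (visited.has(key)) return []; // guards against recursive shapes
  visited.add(key);
  const result = [];
  for (const { path, node } of shapes[shapeName].properties) {
    for (const t of triples) {
      if (!sameTerm(t.subject, focus) || t.predicate.value !== path) continue;
      result.push(t);
      if (node && t.object.termType !== 'Literal') {
        result.push(...extractByShape(triples, shapes, t.object, node, visited));
      }
    }
  }
  return result;
}

const extract = extractByShape(data, shapes, named(EX + 'resource1'), 'resource1Shape');
console.log(extract.length); // 2
```

On this data the sketch selects the two triples listed under Extract. Constraints such as cardinalities or datatypes are not looked at.
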

pietercolpaert commented 1 year ago

From the TREE CG Call:

Still to be motivated: if we choose CBD by default, what are the use cases for any specializations?

pietercolpaert commented 10 months ago

Over July-August, it became clear that this is the way to go forward on this issue. Will adapt the description a bit.

pietercolpaert commented 8 months ago

This has been published in the latest version.