NoLiD / Disquo

Data comparator for humans.

Get predicate intersection of selected things in graph-view #13

Closed kdbanman closed 9 years ago

kdbanman commented 9 years ago

photo

kdbanman commented 9 years ago

The queries shown in the picture for "Features" don't get enough information to build the graph. The incoming predicates all get clumped together, and the outgoing predicates all get clumped together. To construct the edges in the graph, I need to know about direction for each predicate/selected-resource pair. For example:

I need to know that Influenced is in incoming and outgoing for Knuth, but only outgoing for Einstein.

Right now all I know is that Influenced is in incoming and outgoing for the union of selected resources (Einstein and Knuth).

See the current master HEAD efe313e85886f13d375d8acf023aaf1733d0677a in app/components/graph-view.js and app/adapters/predicate.js
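
To make the requirement concrete, here is a minimal sketch of the edge-building step, assuming each query row carries the selected resource alongside the predicate. It is plain JavaScript and not code from the repo; the row shape and the name `buildEdges` are made up for illustration.

```javascript
// Hypothetical row shape: { resource: <selected uri>, predicate: <predicate uri> }.
// outgoingRows come from "<resource> ?pred []", incomingRows from "[] ?pred <resource>".
function buildEdges(outgoingRows, incomingRows) {
  var edges = [];
  outgoingRows.forEach(function (row) {
    // Outgoing: the selected resource is the source of the edge.
    edges.push({ source: row.resource, target: row.predicate });
  });
  incomingRows.forEach(function (row) {
    // Incoming: the selected resource is the target of the edge.
    edges.push({ source: row.predicate, target: row.resource });
  });
  return edges;
}

// "influenced" is incoming and outgoing for Knuth, but only outgoing for Einstein.
var edges = buildEdges(
  [ { resource: 'Knuth', predicate: 'influenced' },
    { resource: 'Einstein', predicate: 'influenced' } ],
  [ { resource: 'Knuth', predicate: 'influenced' } ]
);
```

With only the union of predicates over the selection, the third edge could not be attributed to Knuth rather than Einstein.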

kdbanman commented 9 years ago

The sparql should probably be a union over the selected uris (still different queries for incoming and outgoing predicates):

Like this query, as executed here, except the results table doesn't yet include the subject resources.

SELECT * {
    ?pred rdfs:label ?label .
    {
        SELECT DISTINCT <http://dbpedia.org/resource/Albert_Einstein> ?pred ?label {
            <http://dbpedia.org/resource/Albert_Einstein> ?pred [].
            ?pred rdfs:label ?label
        } LIMIT 3
    } UNION {
        SELECT DISTINCT <http://dbpedia.org/resource/Donald_Knuth> ?pred ?label {
            <http://dbpedia.org/resource/Donald_Knuth> ?pred [].
            ?pred rdfs:label ?label
        } LIMIT 3
    }
}
kdbanman commented 9 years ago

Woot, way simpler to template with handlebars. Also it's correct instead of not correct:

SELECT DISTINCT ?sub ?outpred ?label
{
  VALUES ?sub { <http://dbpedia.org/resource/Albert_Einstein> <http://dbpedia.org/resource/Donald_Knuth> }
  ?sub ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}

Which corresponds to the handlebars template with context=selection (see app/adapters/predicate.js, Adapter's Outgoing property):

SELECT DISTINCT ?subject ?predicate ?label WHERE { VALUES ?subject { {{#each selected}} <{{this}}> {{/each}} } ?subject ?predicate [] . ?predicate {{label}} ?label . }
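
For reference, a rough sketch of what that template expands to. This is a hand-rolled stand-in written with string concatenation, not the Handlebars API, and it assumes `label` is passed in as an already-bracketed predicate URI; the name `renderOutgoingQuery` is hypothetical.

```javascript
// Stand-in for the Handlebars template above; not the Handlebars API.
function renderOutgoingQuery(selected, label) {
  // Equivalent of {{#each selected}} <{{this}}> {{/each}}
  var values = selected.map(function (uri) { return '<' + uri + '>'; }).join(' ');
  return 'SELECT DISTINCT ?subject ?predicate ?label WHERE { ' +
         'VALUES ?subject { ' + values + ' } ' +
         '?subject ?predicate [] . ' +
         '?predicate ' + label + ' ?label . }';
}

var query = renderOutgoingQuery(
  [ 'http://dbpedia.org/resource/Albert_Einstein',
    'http://dbpedia.org/resource/Donald_Knuth' ],
  '<http://www.w3.org/2000/01/rdf-schema#label>'
);
```

The VALUES block is what lets a single query return one row per subject/predicate pair instead of one sub-select per selected URI.
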
kdbanman commented 9 years ago

@KarimBaaba What do you think? This looks long, but the bulk of it is code that you already wrote, with a few proposed changes marked by the comments.

I don't think the Select query from models/queries/select.js needs to be very heavily modified to get the results from this query. This query returns the same kind of results as the other Select queries, just with an extra context column that says which member of the selection the predicate belongs to.

The usage pattern in adapters/predicate.js could be:

  Outgoing: Query.extend({context: 'subject',     // <-- CHANGE HERE!
                                          variable: 'predicate',
                                          template: 'SELECT DISTINCT ?subject ?predicate ?label WHERE { VALUES ?subject { {{#each selected}} <{{this}}> {{/each}} } ?subject ?predicate [] . ?predicate {{label}} ?label . }'}),

  Incoming: Query.extend({context: 'object',      // <-- CHANGE HERE!
                                          variable: 'predicate',
                                          template: 'SELECT DISTINCT ?object ?predicate ?label WHERE { VALUES ?object { {{#each selected}} <{{this}}> {{/each}} } [] ?predicate ?object . ?predicate {{label}} ?label . }'}),

By checking for presence of the context property in the Select query, existing usage isn't broken. Notice when the Resource object is created/appended to - the context uri is just added as a plain property. This feels hacky. Maybe we should create a GraphResource that extends Resource with a 'context' property?

export default Query.extend({

  init: function() {
    this._super();
    this.set('resourcesMap', Ember.Map.create());
    this.set('resourcesArray', Ember.A());
  },

  result: function() {
    var jsonToResult = Jassa.service.ServiceUtils.jsonToResultSet;

    return this._super()
              .then(jsonToResult);
  }.property('query'),

  resultToResources: function(result) {
    var row, entry, context, uri, label, lang,      // <-- CHANGE HERE!  (context)
    contextVar = this.get('context'),      // <-- CHANGE HERE! 
    variable = this.get('variable'),
    map      = this.get('resourcesMap'),
    arr      = this.get('resourcesArray');

    while (result.hasNext()) {
      row   = result.nextBinding();
      uri   = row.varNameToEntry[variable].node.uri;
      if (contextVar) context = row.varNameToEntry[contextVar].node.uri;  // <-- CHANGE HERE! 
      label = row.varNameToEntry.label.node.literalLabel.val;
      lang  = row.varNameToEntry.label.node.literalLabel.lang;

      if ((entry = map.get(uri))) {
        entry.addLabel(label, lang);
        if (context) entry.context = context;  // <-- CHANGE HERE! (modify Resource instead?)
      } else {
        entry = Resource.create({uri: uri});
        entry.addLabel(label, lang);
        if (context) entry.context = context;  // <-- CHANGE HERE!
        map.set(uri, entry);
      }
    }

    map.forEach(function(key, value) {
      if (!arr.contains(value)) { arr.pushObject(value); }
    });
    return arr;
  },

  resultToComments: function(result) {
    var row, comment, lang;
    var resource = this.get('resource');

    while (result.hasNext()) {
      row     = result.nextBinding();
      comment = row.varNameToEntry.comment.node.literalLabel.val;
      lang    = row.varNameToEntry.comment.node.literalLabel.lang;

      resource.addComment(comment, lang);
    }

    if (!resource.get('comments.length')) {
      resource.addComment('This resource has no description');
    }
  }
});
kdbanman commented 9 years ago

So the choice of name context was _terrible_, but the proposed changes work. See branch graph-mods. :-)

xkb commented 9 years ago

The name context is really meant for the context of handlebars expressions. Passing var context = {context: blah, selected: selected} is still valid and the name is arbitrary.

The solution you provided works perfectly except for the line if (context) entry.context = context; which overwrites any 'context' previously written to the given entry. Example:

Subject Predicate ...
A B ...
A C ...
A D ...
E F ...
E B ...

The returned result is a list of two Resources A and E with A.context = D and B.context = E. All other predicates are thrown away. In other words, Resources need to store predicates as a list.

Which brings me to my problem with this solution. With a minor adjustment this will work fine, but let's say a few months later we want to add a query that starts with

SELECT ?var1 ?var2 ?var3 ?var4 ...

We need a solution that handles any arity. Since we can always assume that our final result is a list of Resource objects, there are a few clever things we can do by treating all results as an unmapped list of resources.

I'll try working on a partial solution and have it pushed by Friday.

kdbanman commented 9 years ago

Thanks for the details. I was pretty sure you'd want to generalize to any number of columns.

Just one correction, the returned result is the reverse of what you said. You're right about the overwriting behaviour, but the returned result is not two subject Resources. The Resources are the predicates. The context is the subject URI (just a string):

B.context == E    // <-- OVERWRITE
C.context == A
D.context == A
...
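
A small trace of that overwrite, using the rows from the example table. This is only a sketch in plain JavaScript: a plain object stands in for the Resource entries, and the name `lastContextPerPredicate` is made up.

```javascript
// Rows from the example table, as [subject, predicate] pairs.
var rows = [ ['A', 'B'], ['A', 'C'], ['A', 'D'], ['E', 'F'], ['E', 'B'] ];

// Mimics `entry.context = context`: one plain property per predicate.
function lastContextPerPredicate(rows) {
  var contextOf = {};
  rows.forEach(function (row) {
    var subject = row[0], predicate = row[1];
    contextOf[predicate] = subject; // a later row silently replaces an earlier one
  });
  return contextOf;
}

var contextOf = lastContextPerPredicate(rows);
// contextOf.B ends up as 'E'; the earlier A/B pairing is lost.
```
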
kdbanman commented 9 years ago

This is a version of my changes that work and don't overwrite. It's committed to branch graph-mods:

  resultToResources: function(result) {
    var row, entry, ctx, uri, label, lang,
    ctxVar   = this.get('ctx'),           //////////// <-- 'context' is now 'ctx'
    variable = this.get('variable'),
    map      = this.get('resourcesMap'),
    arr      = this.get('resourcesArray');

    while (result.hasNext()) {
      row   = result.nextBinding();
      uri   = row.varNameToEntry[variable].node.uri;
      if (ctxVar) { ctx = row.varNameToEntry[ctxVar].node.uri; }
      label = row.varNameToEntry.label.node.literalLabel.val;
      lang  = row.varNameToEntry.label.node.literalLabel.lang;

      if ((entry = map.get(uri))) {
        entry.addLabel(label, lang);
      } else {
        entry = Resource.create({uri: uri});
        entry.addLabel(label, lang);
        map.set(uri, entry);
      }
      if (ctx) {         //////////// <-- 'ctx' is an array of unique context uri strings
        if (!entry.get('ctx')) { entry.set('ctx', Ember.A()); }
        if (!entry.get('ctx').contains(ctx)) { entry.get('ctx').push(ctx); }
      }
    }

    map.forEach(function(key, value) {
      if (!arr.contains(value)) { arr.pushObject(value); }
    });
    return arr;
  },
xkb commented 9 years ago

I was looking at this the other way around, which is treating ctx as the resource and somehow adding predicates to it. The order matters a little bit, mainly because of the way I envision the result.

Instead of returning a list of predicates with their corresponding 'selected' resource, why not return a list of what the user selected, with the in/out predicates fetched?

  1. User selects A, B , C and predicate PlacesLived
  2. Fetch PlacesLived of A, B, C
  3. Return [A, B, C]
  4. but now each subject has predicate values, i.e. A.PlacesLived.in, A.PlacesLived.out

Also, this lets us make A.PlacesLived.in a list of Resource objects and have somewhat of a relational mapping going, since everything is a resource by nature.

How do you feel about this? The biggest issue with implementation for any number of select variables is dealing with labels.

What would queries look like?

select keyVar keyLabel(optional) var1 var1Label var2 var2Label.......

This would return a single Resource keyVar with inner properties that are lists of Resources (var1, .., varN)

kdbanman commented 9 years ago

This sounds good. Just two things.

Thing one

I like where you're going, but having a javascript object property named according to query results (A.**PlacesLived**.in) feels awkward. It should just be two lists of Resources: A.in and A.out.

Your example is for the second graph view, with the actual values of the selection + predicate. Here's the example using the same system for the first graph view (just predicates):

  1. User selects A, B, C from Things
  2. Fetch predicates of A, B, C
  3. Return [A, B, C]
  4. Each of A, B, C has two lists of Resources - the predicates, i.e. A.in, A.out

Thing two

let's make A.in a list of Resource objects

Does our Resource object gracefully accommodate literal values?

kdbanman commented 9 years ago

And keep in mind that the branch graph-mods does work already, if you feel like you have higher-priority things to tackle before we deploy.

xkb commented 9 years ago

Forgot about the other branch. I didn't want to deal with merge conflicts, so my changes are now on master. Have a look at my latest commit 8e9dc78b8856a82554b3d69ed08bf0df0df680d5 to see what I added.

Because the first graph view is executing two independent queries, there isn't an easy way to produce a single list of resources that has in/out properties. Meaning, we can't have a list of Resources of As with A.in, A.out without a major pain in the ass.

Instead there are two different results, one from each query. Each result is a list of resources (what the user selected) with new/added properties based on the query declaration. These properties are maps, NOT arrays! Have a look at the updated adapters for reference, pretty nifty stuff happening.
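
A rough sketch of the grouping idea, not the code from the commit: rows are grouped by the key variable, and each declared variable is collected into a map keyed by uri. It assumes a declaration with `key` and `variables` entries carrying `var` and `mapName`, uses plain objects in place of Ember maps and Jassa bindings, and the name `rowsToResources` is hypothetical.

```javascript
// Hypothetical declaration with a key variable and one mapped variable.
var outgoing = {
  key: { var: 'subject' },
  variables: [ { var: 'predicate', mapName: 'outPredicates' } ]
};

// rows: plain objects mapping variable name -> uri (a simplification of real bindings).
function rowsToResources(rows, declaration) {
  var byKey = {};
  rows.forEach(function (row) {
    var keyUri = row[declaration.key.var];
    var resource = byKey[keyUri] || (byKey[keyUri] = { uri: keyUri });
    declaration.variables.forEach(function (variable) {
      var map = resource[variable.mapName] || (resource[variable.mapName] = {});
      var uri = row[variable.var];
      // Keyed by uri, so duplicate rows collapse instead of piling up.
      map[uri] = map[uri] || { uri: uri };
    });
  });
  return Object.keys(byKey).map(function (uri) { return byKey[uri]; });
}

var resources = rowsToResources(
  [ { subject: 'A', predicate: 'B' },
    { subject: 'A', predicate: 'C' },
    { subject: 'E', predicate: 'B' },
    { subject: 'A', predicate: 'B' } ],
  outgoing
);
```

Using a map rather than an array means the uniqueness check comes for free, which is one plausible reason the select code gets simpler.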

Literals

As of right now, a select query that selects any literals (except labels) will break the app. Every non-label variable is decoded with:

uri = row.varNameToEntry[key.var].node.uri;

It isn't difficult to add literals support but I'm going to wait until we figure out what will happen with Jassa.
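
One possible shape for that literal support, as a sketch only. It assumes a literal node has no `uri` property and exposes `literalLabel.val` and `literalLabel.lang`, as the label columns in this thread do; the function name `decodeNode` and the returned object shape are made up, and the test data below are plain-object stand-ins rather than real Jassa nodes.

```javascript
// Decode a binding node into a small tagged value instead of assuming a uri.
function decodeNode(node) {
  if (node.uri) {
    return { type: 'uri', value: node.uri };
  }
  if (node.literalLabel) {
    return { type: 'literal', value: node.literalLabel.val, lang: node.literalLabel.lang };
  }
  return null; // anything else (e.g. blank nodes) is not handled by this sketch
}

// Stand-in nodes for illustration.
var uriNode = { uri: 'http://dbpedia.org/resource/Donald_Knuth' };
var literalNode = { literalLabel: { val: 'Donald Knuth', lang: 'en' } };
var decodedUri = decodeNode(uriNode);
var decodedLiteral = decodeNode(literalNode);
```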

Btw, my editor pretty much rewrote graph-view.js in my last commit. Please make sure you're using EditorConfig with your editor.

kdbanman commented 9 years ago

Good call on using maps instead of arrays. The select query code looks simpler now than before any of this went down. Nice work.

And thanks for the usage example in graph-view - made my life a lot easier.

Now for labels:

I thought about getting the key labels in the same query like this, but the number of rows from the SELECT DISTINCT gets stupid:

Outgoing: Query.extend({ template: 'SELECT DISTINCT ?subject ?subjectLabel ?predicate ?label WHERE { VALUES ?subject { {{#each selected}} <{{this}}> {{/each}} } ?subject ?predicate [] . ?predicate {{label}} ?label . ?subject {{label}} ?subjectLabel . }',
                          key: {var: 'subject', label: 'subjectLabel'},
                          variables: [ {var: 'predicate', label: 'label', mapName: 'outPredicates'} ] })

The best way I can think of getting those key labels is to do a separate query in the adapter:

KeyLabels: Query.extend({
  // Built by concatenation: a single-quoted string cannot span lines.
  template: 'SELECT DISTINCT ?key ?label WHERE { ' +
            'VALUES ?key { {{#each selected}} <{{this}}> {{/each}} } ' +
            '?key {{label}} ?label ' +
            '}',
  key: { var: 'key', label: 'label' }
});
kdbanman commented 9 years ago

I just implemented the changes. It's on master and it works. Now to shove that stuff into cytoscape.

xkb commented 9 years ago

Query looks good. It will eventually need to be moved out of the adapter and into Store for other adapters to use it and get some caching happening since all selected labels have already been fetched before.

For now just leave it; I'll address it after we deal with literals and cache the comments fetched inside Store.
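
As a note for when the caching gets picked up, a minimal sketch of the lookup step: split the selected uris into those whose labels are already cached and those that still need a query. The cache is assumed to be a plain object keyed by uri, and `splitByCache` is a hypothetical name, not something in Store today.

```javascript
// Split uris into labels already held in the cache and uris that still need fetching.
function splitByCache(cache, uris) {
  var cached = {}, missing = [];
  uris.forEach(function (uri) {
    if (Object.prototype.hasOwnProperty.call(cache, uri)) {
      cached[uri] = cache[uri];
    } else {
      missing.push(uri);
    }
  });
  return { cached: cached, missing: missing };
}

var labelCache = { 'http://dbpedia.org/resource/Albert_Einstein': 'Albert Einstein' };
var split = splitByCache(labelCache, [
  'http://dbpedia.org/resource/Albert_Einstein',
  'http://dbpedia.org/resource/Donald_Knuth'
]);
```

Only `split.missing` would then need to go into the VALUES block of the key-label query.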