DescribeRequest insufficiently specific for web ui's needs

GoogleCodeExporter commented 9 years ago

My personal camlistore instance returns 1.2MB of data for the default search 
the web UI performs.

The reason is that the client is asking for describe.depth=2. It needs depth 
2 for cases like a permanode pointing to an image - it wants to know the width 
and height of that image. But it is also getting information about all the 
members of potentially gigantic folders.

One idea for how to fix this:

type DescribeRequest struct {
...
+ Properties []string  // The properties the client would like to be described
+ Follow []string  // Properties that should be followed recursively; limited by Depth
...
}
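
To make the idea concrete, here is a minimal sketch of how Properties and Follow might behave on a toy in-memory graph. Everything in it (the trimmed DescribeRequest, the node type, the describe function, the sample blobrefs) is a hypothetical stand-in for illustration, not the real search package.

```go
package main

import "fmt"

// DescribeRequest is a hypothetical, trimmed-down stand-in for the proposal;
// it is not the real search.DescribeRequest.
type DescribeRequest struct {
	Depth      int      // maximum number of hops to follow
	Properties []string // attributes the client wants described
	Follow     []string // attributes whose values are followed recursively
}

// node is a toy blob description: attribute name -> values.
type node map[string][]string

// describe walks the toy graph depth-first from ref. It follows only the
// req.Follow attributes, for at most req.Depth hops, and returns only the
// req.Properties attributes of each visited node. A node is described once,
// at whatever depth it is first reached.
func describe(graph map[string]node, ref string, req DescribeRequest) map[string]node {
	out := map[string]node{}
	var walk func(ref string, depth int)
	walk = func(ref string, depth int) {
		n, ok := graph[ref]
		if !ok {
			return
		}
		if _, seen := out[ref]; seen {
			return
		}
		picked := node{}
		for _, p := range req.Properties {
			if v, ok := n[p]; ok {
				picked[p] = v
			}
		}
		out[ref] = picked
		if depth >= req.Depth {
			return
		}
		for _, f := range req.Follow {
			for _, target := range n[f] {
				walk(target, depth+1)
			}
		}
	}
	walk(ref, 0)
	return out
}

func main() {
	graph := map[string]node{
		"pn-1":   {"camliContent": {"file-1"}, "title": {"Beach"}},
		"file-1": {"width": {"800"}, "height": {"600"}},
		"pn-2":   {"camliMember": {"pn-1"}, "title": {"Huge folder"}},
	}
	req := DescribeRequest{
		Depth:      2,
		Properties: []string{"title", "width", "height"},
		Follow:     []string{"camliContent"},
	}
	// The image permanode pulls in its file; the folder does not pull in members.
	fmt.Println(len(describe(graph, "pn-1", req)), len(describe(graph, "pn-2", req)))
}
```

Because camliMember is not listed in Follow, the folder is described on its own while the image permanode still gets its file's width and height.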

Original issue reported on code.google.com by zbo...@gmail.com on 5 Jan 2014 at 2:50

GoogleCodeExporter commented 9 years ago

After implementing the foursquare custom renderer, I've realized that the above 
simple proposal is still insufficient.

The foursquare data has the following structure:

checkin (permanode) references via "foursquareVenue"
  venue (permanode) references via "photos"
    photos of venue (permanode) references via "camliMember"
      photo (file)

For each checkin returned by search, we want to return down to the depth of 
photo. But we don't want a depth:4 describe in general, nor do we want multiple 
requests.

It seems like each custom renderer in the UI needs to be able to participate in 
describe requests and specify its desires. But I don't know how this should be 
expressed.

I would imagine that whatever system this is should end up being just another 
aspect of search, usable by any client.

Original comment by zbo...@gmail.com on 14 Apr 2014 at 2:41

GoogleCodeExporter commented 9 years ago

OK, here is a new idea:

What if you could submit multiple search queries at once, and each one could 
include placeholders that reference previous queries?

For example, say a user goes to the web UI and types 
"attr:camliNodeType:foursquare.com:checkin".

We would want to submit that query to the server, along with the following 
additional queries:

1. Get all the venues
{
  relation: {
    relation: "parent",
    edgeType: "foursquareVenue",
    all: [
      {
        permanode: {
          // $0 indicates "any result from the 0th search query" (i.e., the primary one, the one the user typed)
          // results in the context of these searches are always blobrefs
          blobRefPrefix: "$0"
        }
      }
    ]
  }
}

2. Get the "photos" folder for each venue
{
  relation: {
    relation: "parent",
    edgeType: "photos",
    all: [
      {
        permanode: {
          blobRefPrefix: "$1"
        }
      }
    ]
  }
}

3. Get the file blobs for all the photos of each venue
{
  relation: {
    relation: "parent",
    edgeType: "camliMember",
    all: [
      {
        permanode: {
          blobRefPrefix: "$2"
        }
      }
    ]
  }
}

There also still needs to be a way, for each of these searches, to say which 
attributes get returned. But that is a simpler problem. I think just a list of 
attribute names is all that is needed.

I think for this example it would be:

0. ["foursquareVenue", "startDate"]
1. ["photos"]
2. ["camliMember"]
3. ["width", "height"]
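
As a rough sketch of how a server might resolve those placeholders, the code below evaluates a chain of queries over a toy edge list. The chainedQuery type, the runChain function, and the sample blobrefs are all hypothetical; the sketch collapses each "relation" query into "follow this edge type from every result of query $N" and ignores the per-query attribute lists.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// edge is a toy attribute link between two blobs.
type edge struct{ from, attr, to string }

// chainedQuery is hypothetical: follow EdgeType from every result of the
// earlier query named by From (for example "$0").
type chainedQuery struct {
	From     string
	EdgeType string
}

// placeholderIndex parses "$N" into N.
func placeholderIndex(s string) (int, error) {
	if !strings.HasPrefix(s, "$") {
		return 0, fmt.Errorf("not a placeholder: %q", s)
	}
	return strconv.Atoi(s[1:])
}

// runChain evaluates queries in order. results[0] holds the primary search
// results and results[i+1] holds the results of queries[i], so a query may
// only reference queries that ran before it.
func runChain(edges []edge, primary []string, queries []chainedQuery) ([][]string, error) {
	results := [][]string{primary}
	for _, q := range queries {
		idx, err := placeholderIndex(q.From)
		if err != nil {
			return nil, err
		}
		if idx < 0 || idx >= len(results) {
			return nil, fmt.Errorf("placeholder %s refers to a query that has not run", q.From)
		}
		sources := map[string]bool{}
		for _, ref := range results[idx] {
			sources[ref] = true
		}
		found := map[string]bool{}
		for _, e := range edges {
			if e.attr == q.EdgeType && sources[e.from] {
				found[e.to] = true
			}
		}
		refs := make([]string, 0, len(found))
		for ref := range found {
			refs = append(refs, ref)
		}
		sort.Strings(refs)
		results = append(results, refs)
	}
	return results, nil
}

func main() {
	edges := []edge{
		{"checkin-1", "foursquareVenue", "venue-1"},
		{"venue-1", "photos", "photos-1"},
		{"photos-1", "camliMember", "photo-a"},
		{"photos-1", "camliMember", "photo-b"},
		{"bigset", "camliMember", "unrelated"},
	}
	queries := []chainedQuery{
		{From: "$0", EdgeType: "foursquareVenue"},
		{From: "$1", EdgeType: "photos"},
		{From: "$2", EdgeType: "camliMember"},
	}
	results, err := runChain(edges, []string{"checkin-1"}, queries)
	if err != nil {
		panic(err)
	}
	fmt.Println(results)
}
```

The members of the unrelated set are never touched, because no query in the chain starts from it.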

Original comment by zbo...@gmail.com on 14 Apr 2014 at 3:21

GoogleCodeExporter commented 9 years ago

yeah, I could also use something like that for the publish app. As long as the 
publish handler was "internal" it could keep a describe request handy and reuse 
it, so the server would know what's already been described/populated.
But I can't do that anymore when the publish handler is just a remote client, 
so I had to increase the describe depth as well, and try to send fewer describe 
requests.
So a request that would tell the server what I already have and that it doesn't 
need to be described again would help.
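
A minimal sketch of that idea, assuming the client sends a list of blobrefs it already holds alongside the request. The pruneKnown function and the blobref strings are hypothetical; nothing like this exists in the describe API as far as this thread shows.

```go
package main

import "fmt"

// pruneKnown returns the refs that still need describing, in their original
// order, skipping any ref the client says it already has. Duplicates within
// wanted are also dropped.
func pruneKnown(wanted, alreadyHave []string) []string {
	known := make(map[string]bool, len(alreadyHave))
	for _, r := range alreadyHave {
		known[r] = true
	}
	var out []string
	for _, r := range wanted {
		if !known[r] {
			out = append(out, r)
			known[r] = true
		}
	}
	return out
}

func main() {
	wanted := []string{"sha1-aaa", "sha1-bbb", "sha1-ccc", "sha1-bbb"}
	have := []string{"sha1-aaa"}
	fmt.Println(pruneKnown(wanted, have))
}
```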

Original comment by mathieu....@gmail.com on 14 Apr 2014 at 3:43

GoogleCodeExporter commented 9 years ago

Some existing graph query syntaxes:

Levelgraph (a browser/nodejs based graph store) supports both a template based 
API

[{
      subject: db.v("x0"),
      predicate: 'friend',
      object: 'matteo'
    }, {
      subject: db.v("x0"),
      predicate: 'friend',
      object: db.v("x1")
    }]

and a traversal api.

Gremlin (Java) has a nice traversal based API:

    g.v(1).out('knows').as('x').out('bought').as('y').table(t)

Of course they're both triple-oriented stores, but camli metadata can be 
thought of as a graph.

https://github.com/mcollina/levelgraph
http://markorodriguez.com/2011/06/15/graph-pattern-matching-with-gremlin-1-1/

Original comment by ericd...@gmail.com on 14 Apr 2014 at 3:51

GoogleCodeExporter commented 9 years ago

Note that in those examples the fields to be included in the results are also 
specified. Each result enumerates a graph fragment that satisfies the query, 
specifying the value for each free variable.

Original comment by ericd...@gmail.com on 14 Apr 2014 at 3:57

GoogleCodeExporter commented 9 years ago

Thanks for the links ericdrex. Interesting ideas.

Original comment by zbo...@gmail.com on 14 Apr 2014 at 5:45

GoogleCodeExporter commented 9 years ago

I think it's time to kill the depth field (the integer) altogether.

The client should provide all paths it cares about.

Let's focus on figuring out a proposal for the client to specify which 
properties/paths it wants described. Probably a new DescribeFoo struct / JSON 
object with a repeated list of DescribeBar objects.  The common case will be 
hopping between permanode attributes/values, but there might be other things 
too.

Proposals welcome.

Original comment by bradfitz on 15 Apr 2014 at 4:00

GoogleCodeExporter commented 9 years ago

This is what I was trying to describe (heh) above, but in code:

https://camlistore-review.googlesource.com/2596

Original comment by zbo...@gmail.com on 15 Apr 2014 at 5:09

GoogleCodeExporter commented 9 years ago

After thinking about this a couple days and running it by Dan Erat in person 
yesterday, I hacked up 20eca7aad0422 today (now submitted), which adds 
"describe rules" which run after the depth integer expands things.  The 
integer will be removed later, but this is a transitional step.

The describe search request can now be either a GET (as it was before) or a 
POST (in which case the body is the JSON *search.DescribeRequest).

But since a SearchQuery was already a POST and included a DescribeRequest, 
existing search queries can now also do the fancier describes if they include 
the "Rules" (JSON: "rules") field.
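
For readers following along, here is a sketch of what a POST body with rules might look like when built from Go. Only the "rules" JSON key comes from this comment; the local describeRequest and describeRule types and every other field name are assumptions made for illustration, not the actual search package definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeRule is a hypothetical rule shape: an optional match condition,
// the attributes to expand, and child rules for the expanded blobs.
type describeRule struct {
	IfResultRoot    bool            `json:"ifResultRoot,omitempty"`
	IfCamliNodeType string          `json:"ifCamliNodeType,omitempty"`
	Attrs           []string        `json:"attrs,omitempty"`
	Rules           []*describeRule `json:"rules,omitempty"`
}

// describeRequest is a hypothetical, trimmed-down request body.
type describeRequest struct {
	Depth int             `json:"depth"`
	Rules []*describeRule `json:"rules,omitempty"`
}

// foursquareRequest builds a request that keeps the depth at 0 and relies
// on rules to reach the venue and its photos folder.
func foursquareRequest() describeRequest {
	return describeRequest{
		Depth: 0,
		Rules: []*describeRule{{
			IfCamliNodeType: "foursquare.com:checkin",
			Attrs:           []string{"foursquareVenue"},
			Rules:           []*describeRule{{Attrs: []string{"photos"}}},
		}},
	}
}

// encode marshals the request into the JSON that would be sent as the body.
func encode(req describeRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encode(foursquareRequest())
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```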

I'll keep this bug open until the integer is dead.

Aaron, can you see if the existing code/docs are self-explanatory enough and 
try to adapt the UI to use it?  Then dial back the describe depth from 4 to 2 
(or all the way to 0) and instead use rules.

I consider this a blocker for the 0.8 release, since we want Foursquare in the 
UI for 0.8, but we also want good performance.

Original comment by bradfitz on 20 Apr 2014 at 1:08

Added labels: Release-0.8

GoogleCodeExporter commented 9 years ago

I have not tried it yet, but I believe that 20eca7aad0422 is insufficient.

Consider the case of a search result that returns a permanode representing a 
large dynamic set and also returns at least one Foursquare checkin.

We do *not* want to describe every camliMember of the large dynamic set. That 
was the problem in comment #1. However, in order to properly render the FS 
checkin, we *do* need to describe the camliMembers of the permanode containing 
the photos for the associated venue. The permanode whose members we want to 
describe doesn't have its own camliType, and it's not a root, so there's no way 
for a rule to target it.
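
One way to picture the kind of targeting that would be needed is a rule that reaches the untyped photos permanode only by position, as a child of a rule that matched the checkin. The sketch below is a guess at that shape, not the submitted code: the node and rule types, the expand function, and the sample blobrefs are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a toy permanode with an optional camliNodeType and its attributes.
type node struct {
	nodeType string
	attrs    map[string][]string
}

// rule is hypothetical: if ifNodeType is empty it matches any node; attrs are
// expanded, and child rules apply only to blobs reached via those attrs.
type rule struct {
	ifNodeType string
	attrs      []string
	rules      []*rule
}

// expand describes every root, applies the top-level rules to the roots, and
// applies nested rules only along the paths their parents opened up. It
// returns the described refs in sorted order.
func expand(graph map[string]node, roots []string, rules []*rule) []string {
	described := map[string]bool{}
	var apply func(ref string, rules []*rule)
	apply = func(ref string, rules []*rule) {
		described[ref] = true
		n, ok := graph[ref]
		if !ok {
			return
		}
		for _, r := range rules {
			if r.ifNodeType != "" && r.ifNodeType != n.nodeType {
				continue
			}
			for _, a := range r.attrs {
				for _, target := range n.attrs[a] {
					apply(target, r.rules)
				}
			}
		}
	}
	for _, ref := range roots {
		apply(ref, rules)
	}
	out := make([]string, 0, len(described))
	for ref := range described {
		out = append(out, ref)
	}
	sort.Strings(out)
	return out
}

func sampleGraph() map[string]node {
	return map[string]node{
		"checkin": {"foursquare.com:checkin", map[string][]string{"foursquareVenue": {"venue"}}},
		"venue":   {"foursquare.com:venue", map[string][]string{"photos": {"photos"}}},
		"photos":  {"", map[string][]string{"camliMember": {"photo-a", "photo-b"}}},
		"bigset":  {"", map[string][]string{"camliMember": {"m1", "m2", "m3"}}},
	}
}

func sampleRules() []*rule {
	return []*rule{{
		ifNodeType: "foursquare.com:checkin",
		attrs:      []string{"foursquareVenue"},
		rules: []*rule{{
			attrs: []string{"photos"},
			rules: []*rule{{attrs: []string{"camliMember"}}},
		}},
	}}
}

func main() {
	fmt.Println(expand(sampleGraph(), []string{"bigset", "checkin"}, sampleRules()))
}
```

With this shape the camliMember rule fires for the venue's photos permanode but never for the large set, since the large set is not reached through a checkin.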

Original comment by zbo...@gmail.com on 20 Apr 2014 at 4:03

GoogleCodeExporter commented 9 years ago

Confirmed: https://camlistore-review.googlesource.com/2647

I could solve this by giving all the problematic permanodes a camliNodeType, but 
it seems wrong to have to do that.

Below are two other issues I noticed while I was integrating this. You might 
just consider these TODOs, but I wanted to note them in case you hadn't thought 
about them:

1. There is no way to limit what attribute values are returned. The queries 
return way more data than the web UI needs. The web UI really only needs the 
"title", "image", and "file" properties in the result set of most nodes. And 
not even all of those. If there was some way to specify this, I could probably 
reduce the response sizes by 50%.

2. There is no way to target camliType other than 'permanode'. This would be 
necessary to control whether static sets and directories are described or not.

Original comment by zbo...@gmail.com on 20 Apr 2014 at 4:40

GoogleCodeExporter commented 9 years ago

I believe this is now sufficiently fixed.

We could do more (specifying which attributes are interesting for describe) in 
a future bug. I've opened Issue 435 for that.

commit 384b627b5ef652c4ee31ba4a6340468237442ae6
Author: Brad Fitzpatrick <brad@danga.com>
Date:   Sun May 4 20:03:00 2014 -0700

    ui: reduce describe depth, using recursive describe instead

    Updates issue https://code.google.com/p/camlistore/issues/detail?id=319

    Change-Id: Ie02b0f565c6ff4c9582cecc78914392a60bf9502

commit bf39d559412195a7fb2bcdea9b0c5cb99e5ac780
Author: Brad Fitzpatrick <brad@danga.com>
Date:   Sun May 4 22:41:08 2014 -0400

    search: recursive describe rules

    Updates https://code.google.com/p/camlistore/issues/detail?id=319

    Change-Id: I7ef0a0df28e306eaae969e07d9ccf1e7346316ef

Original comment by bradfitz on 5 May 2014 at 2:44

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

This issue has moved to https://camlistore.org/issue/319

Original comment by bradfitz on 14 Dec 2014 at 11:36

Added labels: IssueMoved, Restrict-AddIssueComment-Commit
