benel / XanaduReloaded


Bi-directional links #1

Open benel opened 8 years ago

Slals commented 8 years ago

Specifications

A Xanadu link is a connective unit, a package of connecting or marking information. It is owned by a user. It is put in by a user, and thereafter maintained by the back end through the back end's inner indexing mechanisms.

Every link has an address in at least one document. These are its home documents, where it either originated or has been included by virtual copy. The original home document of a link is called its native document, the place it was created.

The front end has no access to the link's internal mechanisms or raw data, but only to its behavior as defined by the FEBE protocol.

Slals commented 8 years ago

Implementation

In order to give a possible solution we will use CouchDB. CouchDB makes two-way links fairly easy and respects the philosophy of Xanadu: everything is a document.

Document scheme

{
  "_id": "A",
  "_rev": "123",
  "data": "document_data"
}

Links

As T. Nelson envisioned, a link has a native document, which is its original home. A link can live in any document, depending on where the user created it. With that said, a link from A to B could live in a document X. A document can "store" one or more links as follows:

{
  "_id": "X",
  "_rev": "234",
  "data": "Hey, I'm document X",
  "links": [
      {
         "from": "A",
         "to": "B"
      },
      {
         "from": "N",
         "to": "X"
      }
   ]
}

CouchDB View

function(doc) {
  if(doc.links) {
    doc.links.forEach(function(link) {
      // Index the link under both of its endpoints to make it two-way.
      emit([link.from, 1], { _id: link.to });
      emit([link.to, 1], { _id: link.from });
    });
  }
  // Emit the document itself under a key that collates before its links.
  emit([doc._id, 0], doc.data);
}

The two-way link is achieved by the two emits inside the forEach loop. The last emit outputs the document itself.

In the key we use an integer to force CouchDB to order its results according to the principle of view collation: for a given document, the key ["A", 0] (the document itself) sorts before ["A", 1] (its links).

Putting an object { _id: id } in the value of the link rows allows CouchDB to populate these documents and return their content instead of only their id (this is what the CouchDB documentation calls "linked documents").
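For instance, running the view over document X above alone would produce the following rows, shown in collation order (the populated contents of documents A, B and N are omitted):

["A", 1] → { "_id": "B" }
["B", 1] → { "_id": "A" }
["N", 1] → { "_id": "X" }
["X", 0] → "Hey, I'm document X"
["X", 1] → { "_id": "N" }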

Query

Now, to query the result for one document and its links, the URI for document A would be the following: /?include_docs=true&start_key=["A"]&end_key=["A",2]
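For instance, assuming the view is stored in a design document (the database name xanadu and the view name links/by_doc are hypothetical), the full request with URL-encoded keys would look like:

curl 'http://localhost:5984/xanadu/_design/links/_view/by_doc?include_docs=true&start_key=%5B%22A%22%5D&end_key=%5B%22A%22,2%5D'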

benel commented 8 years ago

Nice.

Note: You can simplify [link.from, 1] to just link.from (and of course [link.to, 1] to link.to). Then, call the view with ?include_docs=true&start_key=["A"]&end_key=["A", {}]

Moreover, you don't need to emit ([doc._id, 0], doc.data), since include_docs should bring the whole document.

benel commented 8 years ago

Hmm. Sorry, you were right. Using _id links prevents you from linking back to the original document. Therefore your solution was right. Don't consider my previous comment.

benel commented 8 years ago

To finish documenting this first step:

  1. You should first create a more complete and realistic test fixture (aka sample):
    • to get links both from and to the focused document,
    • with links hosted either in the source, destination or in a third party document,
    • with link types (see Literary Machines p. 4.52-55, list those that could be applied to complete documents, and choose a few for your sample);
  2. You should upgrade your implementation so that it lets you know:
    • if the related document is the source or the destination of the link,
    • the type of the link;
  3. You should explain how your MapReduce request could be distributed (in theory) when documents are on different servers corresponding to different authors or communities (some types of links are usually created by the source document's author, some others are created by the destination document's author, and others are created by neither the source's nor the destination's author).

Slals commented 8 years ago

Documentation


Sample

{
  "_id": "A",
  "_rev": "2-68f2e8a0ba3324d80a49b728982e3518",
  "data": "Hi document A here!",
  "links": [
    {
      "from": "C",
      "to": "A",
      "type": "correction"
    }
  ]
},
{
  "_id": "B",
  "_rev": "1-601eb6aa9ca5bdacc2efe436220f9d15",
  "data": "[Binary File] B"
},
{
  "_id": "C",
  "_rev": "3-0513f13a23fc954b6909e8f8f5a6f0d1",
  "data": "This is document C",
  "links": [
    {
      "from": "A",
      "to": "B",
      "type": "modal_jump"
    },
    {
      "from": "C",
      "to": "B",
      "type": "translation",
      "language": "frFR"
    }
  ]
}

Implementation

function(doc) {
  if(doc.links) {
    doc.links.forEach(function(link) {
      // source "from": the queried document is the link's source;
      // the related document (_id: link.to) is its destination.
      emit([link.from, 1], { _id: link.to, type: link.type, source: "from" });
      // source "to": the queried document is the link's destination.
      emit([link.to, 1], { _id: link.from, type: link.type, source: "to" });
    });
  }
  // The document itself, collating before its links.
  emit([doc._id, 0], doc.data);
}
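With the sample above, querying the view for document A with ?include_docs=true&start_key=["A"]&end_key=["A",2] would return rows along these lines (before population of the related documents):

["A", 0] → "Hi document A here!"
["A", 1] → { "_id": "C", "type": "correction", "source": "to" }
["A", 1] → { "_id": "B", "type": "modal_jump", "source": "from" }

The first link row comes from the link hosted in A itself (A is the destination of a correction coming from C); the second comes from the link hosted in C (A is the source of a modal_jump to B).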

Distributed approach

Finding any document in the docuverse could be done with consistent hashing.

CouchDB's storage model uses unique IDs to save and retrieve documents.

Thanks to this model it would be possible to create a hashing scheme which locates any document knowing only its ID. Moreover, this approach would allow implementing tumbler spans according to Xanadu's model.
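As a minimal sketch of the idea in Node.js (the hash choice and the server names are illustrative assumptions, not part of the proposal above):

const crypto = require('crypto');

// Map a string (server name or document ID) to a position on the hash ring.
function ringPosition(key) {
  return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
}

// Place each server on the ring, sorted by position.
const servers = ['elearning.utt.fr', 'etu.utt.fr', 'xanadu.example.org'];
const ring = servers
  .map(function(name) { return { name: name, pos: ringPosition(name) }; })
  .sort(function(a, b) { return a.pos - b.pos; });

// A document lives on the first server whose ring position follows its own hash.
function locate(docId) {
  const pos = ringPosition(docId);
  for (let i = 0; i < ring.length; i++) {
    if (ring[i].pos >= pos) return ring[i].name;
  }
  return ring[0].name; // wrap around the ring
}

console.log(locate('A')); // prints the server responsible for document "A"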

benel commented 8 years ago

@Slaals

When I asked for "distribution", I did not mean "automatic distribution" (aka "cluster") but "natural distribution" related to organizations users belong to (similarly to the distribution of the Web into "sites").

The aim of my question is to match "distribution" as envisioned by Nelson with the theory of distribution behind MapReduce.

Slals commented 8 years ago

@benel I'm not sure I understand. What do you mean by "natural distribution"?

In the chapter of Nelson's document where he talks about "distribution", he points out the technical aspect of distributing documents across multiple servers (or nodes); it makes me think of server clustering with MapReduce...

benel commented 8 years ago

In the chapter of Nelson's document where he talks about "distribution", he points out the technical aspect of distributing documents across multiple servers (or nodes); it makes me think of server clustering with MapReduce...

It is similar technically ("view merging" is exactly what I want you to investigate), however the aim is totally different.

Let's take an example:

Resources about courses at UTT are distributed among "elearning.utt.fr" and "etu.utt.fr". Being on one server or the other does not depend on "consistent hashing"; they are not nodes of the same cluster that you could use indiscriminately. They correspond to different communities with different access rights. Students wrote in the student forum because they could not write on the official university site (at least in the main description), or because they didn't want the faculty staff to read their comments.

Slals commented 8 years ago

Well, I'm still not sure I understand, despite reading the Xanadu documents I have over and over. So I'll try an approach I find relevant to your question and your example.

@benel The two quotes below are the parts of the Xanadu model I find related to your example:

It is desirable for documents to carry information on how to show and manipulate them -- that is, general information for the front-end designer. Instructions to front ends for display and manipulation may be in the form of text explanations or programs

The fundamental operation of the Xanadu system is the request, usually a request for links (or their attached contents) fulfilling certain criteria. These criteria can become remarkably complex. The system is designed so that you can ask for certain types of links, and those pointing to and from certain places, with total flexibility. [...] Consider a typical command, the one for finding the number of links of a certain type. The command requires four endsets:

  • the home-set, those spans of the docuverse in which desired links are to be found;
  • the from-set, those spans of the docuverse wanted at the first side of the links;
  • the to-set, those spans of the docuverse wanted at the second side of the links;
  • the three-set, spans covering the types of link that are wanted in the request.

To sum up, this quote gives us four parameters (home-set, from-set, to-set and three-set) which could interest us for our MapReduce solution.

Given those four parameters, each server could then enhance its Map function (the one I pushed) to return documents with a filtered three-set, as sketched below. Then, a Reduce function could merge each document without its filtered links, adding a field that says which filter(s) were used. Thus, each server would have its own way of presenting documents through its free use of the three-set filter.
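A possible sketch of such a server-local Map function (the filteredTypes list is a hypothetical per-server setting: a server like "elearning" could set it to ["comment"], while "etu" would leave it empty):

function(doc) {
  // Hypothetical per-server "three-set" filter. In CouchDB it has to live
  // inside the map function, which cannot reference outer state.
  var filteredTypes = ["comment"];
  if (doc.links) {
    doc.links.forEach(function(link) {
      if (filteredTypes.indexOf(link.type) !== -1) return; // drop filtered types
      emit([link.from, 1], { _id: link.to, type: link.type, source: "from" });
      emit([link.to, 1], { _id: link.from, type: link.type, source: "to" });
    });
  }
  // Advertise the applied filters so the front end knows how the document
  // is meant to be presented (the "general information").
  emit([doc._id, 0], { data: doc.data, filters: filteredTypes });
}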

The spirit of Xanadu stipulates that general information may exist in documents to tell front-end designers how documents should be presented. In our context, with a Map/Reduce solution, this so-called "general information" would be the Map/Reduce function written and stored on the server. In fact, the goal of the function is to filter links and to create a "filters" field in order to give a hint in the response; in other words, to return the general information. Moreover, this exact same Map/Reduce function is also what Nelson called a "typical command", such as FINDLINKSFROMTOTHREE.

Following your example, let's say we have a document A which is about courses and has some comment links, and two Xanadu servers called "elearning" and "etu". The location of document A is not important; each server could have a virtual copy of it. Now, each server aims to show this same document in a different way: the first one avoids showing the comment links and restricts any possibility of commenting on the document, while the second one shows everything about document A. The solution would be for each server to have a slightly different Map/Reduce function: the first one filters out comment links, the second one does not. Then, it would be the front-end designer's task to present the document. What Nelson called "general information" is now typically the server response saying whether filters were applied, and which ones. Given this information, the designer is able to present the document as expected.