fedwiki / wiki-client

Federated wiki client-side javascript as a npm module.

Questions regarding REST API #96

Open egonelbre opened 9 years ago

egonelbre commented 9 years ago

I started writing a [Federated Wiki Server in Go](https://github.com/egonelbre/wiki-go-server) and had some questions about the REST API design.

  1. Wouldn't using the Accept header for determining the response content type be better? This would make all the URLs nicer by not requiring the .html or .json suffixes. (A rough client-side sketch of this follows the list.)
  2. Wouldn't using the PATCH method instead of PUT for modifications be clearer? From a REST standpoint PATCH is more aligned with the editing actions.
  3. Wouldn't encoding the page action in the PUT/PATCH request be better? This would allow getting rid of the view and edit prefixes.
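
A minimal sketch of the first suggestion, assuming a server that honors content negotiation (the URL and handling here are illustrative, not how the current servers behave):

    // Sketch only: request the same resource as JSON or HTML via the Accept header.
    function getPage(slug, type, done) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/' + slug);
        xhr.setRequestHeader('Accept', type); // 'application/json' or 'text/html'
        xhr.addEventListener('load', function () {
            done(type === 'application/json' ? JSON.parse(xhr.responseText) : xhr.responseText);
        });
        xhr.send();
    }

    // Same URL, two representations:
    getPage('welcome-visitors', 'application/json', function (page) { console.log(page.title); });
    getPage('welcome-visitors', 'text/html', function (html) { console.log(html.length); });
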
WardCunningham commented 9 years ago

These might be good suggestions. Any changes at this level would need to be coordinated with existing servers, which means devising upward-compatibility mechanisms and then implementing them in the server-facing logic of the client. We've seen much more radical changes than any of these, so it is possible.

Our own work with Go set out to make a single module that could provide a service port of sorts to mission-critical systems. Any change that would make the Go module simpler and more trustworthy would be in line with this objective.

We have also built a federated wiki server in C++ for the Arduino. This would be the scale of effort I would expect. Of course the value of such a server comes from the system to which it is attached.

egonelbre commented 9 years ago

Currently I'm writing the Go server as a replacement for the node server and to use as a storage backend. By implementing the page.Store interface you get a fully functional FedWiki server with sitemap, static HTML pages, action handling, etc. This should allow easy switching of the actual content being served.

Is there an example of what the "service port" should serve? I should be able to easily incorporate that work into the current design, or create an example setup for a custom server.

Regarding those changes, I think it might make sense to wait until I complete the current work on the Go server; I might find some other things to improve upon and change. It's probably easier to change multiple things at once rather than in several upgrades.

WardCunningham commented 9 years ago

Sure, here is a version in Java. http://c2.com/doc/TestPoint/ http://c2.com/doc/TestPoint/TestPointInPractice.pdf

An important idea from federation is that the processing and formatting can come from another wiki and selection and storage of interesting cases could be forked from the service being monitored.

WardCunningham commented 9 years ago

Go's simple, modular, and efficient routing seems like something to be admired. My goal was to have a single read-only module that could be dropped into this path in a production server to report data from other data structures, not static files. A second edit-handling module would make this read/write if that were appropriate.

Apropos your questions, much of the protocol of federated wiki stems from the first version implemented in Ruby/Sinatra, where routing is done conveniently with regular expressions. We may have relied too heavily on them, as evidenced by Go, which prefers whole-token dispatch. There is also legacy in the protocols from when we thought of federation as server-to-server cooperation. This didn't play well with NAT and is slowly being worked out of our protocols.

We have been asked on occasion why we aren't more RESTful but this has yet to lead to us actually becoming more RESTful. We're further down the road now but not so far that we can't embrace opportunities that we missed the first time around.

opn commented 9 years ago

I'm starting work on a mobile client and I was wondering if you could share any notes on the REST interface as it is. I'd like to test out the interfaces of both servers to start making progress towards some documentation.

opn commented 9 years ago

Right, from digging around it seems that I should be able to generate and secure a robust API with authentication using Dreamfactory https://www.dreamfactory.com/features and here https://github.com/dreamfactorysoftware/dsp-core/wiki

One way seems to be to let this service provide access to the file store directly - https://github.com/dreamfactorysoftware/dsp-core/wiki/File-Storage-Services - that would, I guess, only provide CRUD access to the raw JSON. I think this can be easily extended with JavaScript scripting on the server. I also think all we really need to make a good start is CRUD access to entire JSON stories. Does this look promising?

A question here is, if we want compatible RESTful APIs for the Go and the Node servers, then how would this work with an auto-generated API? I've asked the guys at Dreamfactory... one way I was thinking of is if both implementations use the same data storage.

Now I installed the node wiki server using this script - https://gist.github.com/nrn/f818fa7decfd910362b7 and there is some documentation on npm and wiki options here - https://www.npmjs.com/package/wiki

egonelbre commented 9 years ago

Currently my REST design is:

Slugs are in the form of:
    [a-z0-9\-_\?\&\=\%\#\";\.\/]+
    This way the server can respond to regular queries with pages e.g.
    /search#tags?q="blah"

    Also this allows nested pages:
    /category/page/something

All requests must contain an "Accept" header
If the request has data then the Content-Type should be provided

In JavaScript this means:
    xhr.setRequestHeader('Accept', 'application/json');
    xhr.setRequestHeader('Content-Type', 'application/json');

GET /hello/world
    returns a page with slug "/hello/world"

PUT /hello/world
    with data:
    {
        "slug": "/hello/world",
        "title": "World",
        "synopsis": "",
        "version": 0,
        "story": [],
        "journal": []
    }
    creates/overwrites a page with slug "/hello/world"
    the slug basename must be based on the title by normalizing it

DELETE /hello/world
    with data:
    {
        "slug": "/hello/world",
        "title": "World",
        "version": 141
    }

    deletes page with slug "/hello/world"
    slug is simply for cross-validation with path
    this protects against modifying a page while someone else is deleting it

    if slug and path do not match then the action should be rejected
    if the version in the request and on the server do not match then the action should be rejected and the latest version of the page returned

PATCH /hello/world
    with data:
    {
        "slug": "/hello/world",
        "version": 141,
        "action": ...
    }

    updates the page with action...
    slug is simply for cross-validation with path
    version is for checking concurrent updates

    if slug and path do not match then the action should be rejected
    if the version in the request and on the server don't match then the server should return the latest version available

    also, if versions mismatch then the server can try to merge the patch or reject the patch

I'm not totally convinced about the slug part, but currently it's useful for automatically generated services - e.g. a directory where each folder is a separate page.
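
A minimal sketch of the PATCH flow described above, assuming the field names from this design and the convention that a rejected request returns the latest version of the page (the status-code handling is an assumption, not part of the design as written):

    // Sketch of the PATCH request described above; conflict handling is assumed.
    function patchPage(slug, version, action, done) {
        var xhr = new XMLHttpRequest();
        xhr.open('PATCH', slug);
        xhr.setRequestHeader('Accept', 'application/json');
        xhr.setRequestHeader('Content-Type', 'application/json');
        xhr.addEventListener('load', function () {
            var body = JSON.parse(xhr.responseText);
            if (xhr.status >= 200 && xhr.status < 300) {
                done(null, body);                  // patch applied
            } else {
                done(new Error('rejected'), body); // server returned the latest page
            }
        });
        xhr.send(JSON.stringify({ slug: slug, version: version, action: action }));
    }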

egonelbre commented 9 years ago

One of my concerns is always returning the "journal"... over time it will grow, grow, and grow some more, so does it make sense to always have it in the response? Or should it only be provided if it's explicitly queried for?

The same goes for the sitemap - if a site is automatically generated, how should it provide the sitemap, when it's over 10MB? Or should it simply provide some other means of discovering the pages?

I'm also currently working on open-sourcing a different client that is highly influenced by federated wiki. Most of it works similarly, but it has some extra data in the pages, e.g. "tags", "comments"... Although I would like to make it totally federated-wiki compatible, I'm just not quite sure how exactly to do that. I guess we can discuss it more properly when I have cleaned up the code-base and uploaded it.

It might be helpful to wait until I release it, because I expect that it should be easier to port to mobile devices than the current wiki-client.

egonelbre commented 9 years ago

PS. all error codes are based on https://github.com/for-GET/http-decision-diagram.

One other thing I'm thinking is having the page always in a specific location:

/view/<slug>
/folder/<slug>
/system/<slug>

So the first part would become some sort of particular "service" that provides pages and the rest is left for the pages. It would have a nice symmetry regarding the layout of things... and allows easy routing based on the name.

But it's less beautiful. I guess we could say that paths under "/system/..." are dedicated and reserved for fedwiki... Also some sort of "list of recommended paths for generated pages" might be helpful.

paul90 commented 9 years ago

One of my concerns is always returning the "journal"... over time it will grow, grow, and grow some more, so does it make sense to always have it in the response? Or should it only be provided if it's explicitly queried for?

There has been some talk elsewhere about condensing the journal, see WardCunningham/Smallest-Federated-Wiki#422. But, as it provides the mechanism for attribution, there is an obligation to provide it. With paragraph dragging, there is a need to be able to extract the entries that apply to an individual paragraph in order to provide attribution. This will probably be the limiting factor for any journal condensing.

The same goes for the sitemap - if a site is automatically generated, how should it provide the sitemap, when it's over 10MB? Or should it simply provide some other means of discovering the pages?

One way of keeping the sitemap small would be to remove the synopsis and provide another search mechanism. Of course, the client provided for such a site would still have to be able to make use of the synopsis for search.

One other thing I'm thinking is having the page always in a specific location:

The current page namespace is flat - just /<slug>.json - the use of view in the lineup serialization, in the address bar, is just to indicate that the page is from the origin server rather than some other server. It does not form part of any request to get a wiki page.

There is an old issue about improving the story serialization, see WardCunningham/Smallest-Federated-Wiki#412. This is really a long-overdue change.

WardCunningham commented 9 years ago

I do see some uniformity in the design as suggested, but I don't see it solving any problems we currently have, with the possible exception of offering delete, which we have postponed.

I understand the argument that says JSON is just a representation of a resource, not the resource itself, but when you go down that road too far you end up with the tangle that is XML.

http://c2.com/ward/ascent.html

I'm also fond of the simplicity of fetching pages with curl, something you give up too.

curl http://fed.wiki.org/welcome-visitors.json

egonelbre commented 9 years ago

Yes, I was also thinking about the curl case and about looking at the JSON page directly from the browser. I guess I was prematurely fixing cases such as a page titled "page.json"; of course this can be avoided by not using "." in the slug, so the question is whether we want slugs containing ".".

I initially implemented fedwiki mostly by studying the videos and the JSON format of the pages. That was the design I came up with - I really didn't look into the client/server implementations too much.

Also the concurrency problems happen when multiple people decide to maintain a single wiki, e.g. a company federated wiki.

I also had some questions that are probably outside the scope of REST, but about the general design of things. What is the best way to allow certain people to comment on some of your pages? The best I can come up with is having a separate service and client panel that contains the comments. And how should other meta information be transferred, such as tags, ratings, etc.? One way would be to have them inside the page structure and let other clients ignore those. Or is having a separate service for those better as well?

egonelbre commented 9 years ago

Regarding the large journal.

What if we had a separate section for the attribution in the page, e.g.

{
    story: [{
        type: "paragraph",
        text: "hello world",
        attribute: [0, 2]
    }, ...],
    attribute: [
        {origin: "http://ward.fed.wiki.org/diversity-in-service"},
        {origin: "http://fed.wiki.org/welcome-page"},
        {origin: "http://example.org/page"}
    ],
    "recent-journal": [
        // only provide N recent modifications
    ]
}

and you would have to use something like
GET "/page.journal.json?limit=100"
GET "/page.journal.json"
alternatively you need to specify whether to include journal or not...
GET "/page.json?journal=100"
to get more information

That way the attributions are always available, you can get the recent changes - because that's probably what you want... and there's a way to get the full log as well.

Regarding paragraph attribution, I currently have:

{
    type: "paragraph",
    id: "123"
    origin: "http://fed.wiki.org/welcome-page",
    originId: "xyz",
    text: "hello"
}
WardCunningham commented 9 years ago

There are operations that work well with the complete journal, such as shift-hover and drag-to-merge. There is also a design philosophy here of creating a few powerful mechanisms and then discovering how they can be applied to existing or new activities. It is not the kind of problem solving one gets to do when doing work for hire. See http://h2.ward.asia.wiki.org/simple-rules.html

paul90 commented 9 years ago

It might, or might not, help to think about the page file in a completely different way.

The journal is, for all intents and purposes, a sparse merged event stream - merging the events for the page itself and its constituent parts into a single stream. It is sparse because there are certain attributes that are only knowable by inference from the journal as a whole. It is also incomplete, to some extent, as dragging content between pages does not propagate the event stream for that item.

The story is then simply a cache of the current state - if it was missing it could be recreated from the event stream contained in the journal.
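
A rough sketch of that idea, using the usual fedwiki action types (create, add, edit, move, remove); this is illustrative only, not the client's actual revision code:

    // Replay journal actions to rebuild the story; a sketch, not wiki-client's revision logic.
    function replay(journal) {
        var story = [];
        journal.forEach(function (action) {
            var index = story.findIndex(function (item) { return item.id === action.id; });
            switch (action.type) {
                case 'create':
                    story = (action.item && action.item.story) ? action.item.story.slice() : [];
                    break;
                case 'add':
                    var after = story.findIndex(function (item) { return item.id === action.after; });
                    story.splice(after + 1, 0, action.item);
                    break;
                case 'edit':
                    if (index >= 0) story[index] = action.item;
                    break;
                case 'move':
                    story.sort(function (a, b) {
                        return action.order.indexOf(a.id) - action.order.indexOf(b.id);
                    });
                    break;
                case 'remove':
                    if (index >= 0) story.splice(index, 1);
                    break;
            }
        });
        return story;
    }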

egonelbre commented 9 years ago

I was watching http://www.infoq.com/presentations/rest-misconceptions and realized I had several misconceptions about REST. From there, the idea of including how the page can be modified inside the page itself is quite interesting. Just an illustrative example:

{
    "story": [
        {"id":"1", "type":"paragraph", "text":"hello world"}
    ],
    "links": {
        "edit":   { "method": "POST", "url": "/view/page-hello/edit"},
        "delete": { "method": "DELETE", "url": "/view/page-hello"}
    }
}

Also, this would allow backwards compatibility - when a page does not define it, everything works as in the old format.
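
A small sketch of that fallback, assuming hypothetical default routes for pages that carry no "links" section (the defaults here are invented for illustration):

    // Prefer embedded links when present, otherwise fall back to conventional routes.
    function linkFor(page, name) {
        if (page.links && page.links[name]) return page.links[name];
        // hypothetical defaults for the older, link-less format
        return {
            "edit":   { "method": "PUT",    "url": page.slug },
            "delete": { "method": "DELETE", "url": page.slug }
        }[name];
    }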

Note, that this is just a discussion point, not an actual proposal.

WardCunningham commented 9 years ago

I haven't watched the infoq presentation but I have read Roy's thesis where he develops REST from first principles. https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

I have seen the embedded description thing in the github api. They chop results into pages and then send you the request you need to get the next page. Each query counts against your traffic allocation.

The example you show looks like little opcodes in a programming language for which I could write an interpreter in the core javascript in order to formulate a request to the server. This might be overkill since the server already controls the javascript that will be sending those requests.

I would like to modularize the client's pageHandler into three layers so that it is easier to adapt it to emerging protocols beyond http. If one wanted to employ the protocol you describe, it would be another module in layer three. http://ward.fed.wiki.org/three-layer-storage.html
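
A hypothetical sketch of what such a layer-three module might look like; the get/put interface here is invented for illustration and is not the pageHandler's actual API:

    // Hypothetical layer-three transport module; a different protocol (such as the
    // link-driven one above) would be another module exposing the same surface.
    var httpTransport = {
        get: function (route, done, fail) {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', route);
            xhr.setRequestHeader('Accept', 'application/json');
            xhr.addEventListener('load', function () { done(JSON.parse(xhr.responseText)); });
            xhr.addEventListener('error', fail);
            xhr.send();
        },
        put: function (route, action, done, fail) {
            var xhr = new XMLHttpRequest();
            xhr.open('PUT', route);
            xhr.setRequestHeader('Content-Type', 'application/json');
            xhr.addEventListener('load', done);
            xhr.addEventListener('error', fail);
            xhr.send(JSON.stringify({ action: action }));
        }
    };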

egonelbre commented 9 years ago

I have seen the embedded description thing in the github api. They chop results into pages and then send you the request you need to get the next page. Each query counts against your traffic allocation.

I mean we will be doing those requests anyway.

I would like to modularize the client's pageHandler into three layers so that it is easier to adapt it to emerging protocols beyond http.

That sounds like a better solution.

opn commented 9 years ago

Does this look useful in this context? - www.remotestorage.io

I saw a presentation and had a good chat with some of the developers, and I feel there may be interest there?

On Wednesday, April 15, 2015, Egon Elbre notifications@github.com wrote:

I would like to modularize the client's pageHandler into three layers so that it is easier to adapt it to emerging protocols beyond http.

That sounds like a better solution.


opn commented 9 years ago

Using remotestorage would provide a standards-based, user-controlled minimal server. The remotestorage server would simply serve the JSON in a REST-friendly read/write way. All the authentication would be done for us, and we would get for free an existing network of servers, with the user being able to configure where they want their stuff kept.

The built-in local storage and the interest in developing / working with alternative p2p back ends would mean that when these guys add a new interface to a backend, everyone benefits. This way we could move to as much being done on the client as possible.

More fully-fledged servers could provide HTML rendering and sophisticated stuff like data analysis and views across domains etc. These servers would access the underlying data in the same way the clients do - using remotestorage?

paul90 commented 9 years ago

I was watching http://www.infoq.com/presentations/rest-misconceptions and realized I had several misconceptions about REST.

I believe that the Jon Moore talk that was referenced is Building Hypermedia APIs with HTML.

WardCunningham commented 9 years ago

The first comment says, "A good REST API is like an ugly website."

From this I gather that the talk suggests a restricted subset of http/html as the wire protocol between client and server. Am I right? This would make the unassisted browser a useful protocol analyzer.

From another conversation ...

Paul has suggested that markup should be a user preference, like timezone, and that the computer could translate freely between them. A carefully chosen subset of html might serve as the universal representation that all markups convert through.

We've embraced JSON as our data friendly serialization but we left the door open to embed other representations within it. We could, for example, create a Universal Plugin that stored text in a universal markup and then presented it to each editor in the user's preferred form.

paul90 commented 9 years ago

Does this look useful in this context? - www.remotestorage.io

Looks like a solution looking for a problem. At best it looks to add needless complexity, and raise a high barrier for adoption.

There are a number of similar storage solutions, none of which appear to have any traction.

It should also be noted that on their forums they direct potential users to a particular implementation that comes with the following health warning.

As with any alpha-stage storage technology, you MUST expect that it will eat your data and take precautions against this

This, though, is a different discussion from the one started by Egon.

If you are developing a different client, there is nothing to prevent you from having a client/server that uses a completely different API for editing than that used by the current server - as long as the read API is supported, so it can still be part of the federation.

egonelbre commented 9 years ago

One of the main problems I'm hoping to avoid is executing arbitrary plugins/modules downloaded from third-party sites. From a security standpoint that can be a big risk and hard to protect against properly.

This means somewhat standardizing how to query information from another site.

The example you show looks like little opcodes in a programming language for which I could write an interpreter in the core javascript in order to formulate a request to the server

Why would you need an interpreter? Since the links are embedded inside the page there's nothing to interpret.

var page = {
    "links": {
        "edit":   { "method": "POST", "url": "/view/page-hello/edit"},
        "delete": { "method": "DELETE", "url": "/system/delete/page-hello"}
    }
}

function Edit(page, op){
    request("edit", page, op, function(){
        console.log("done");
    }, function(){
        console.log("error");
    });
}

function Delete(page){
    request("delete", page, null, function(){
        console.log("done");
    }, function(){
        console.log("error");
    });
}

function request(name, page, data, loaded, errored){
    var params = page.links[name];
    if(typeof params === "undefined"){
        throw new Error("Operation " + name + " is not allowed.");
    }

    var xhr = new XMLHttpRequest();
    xhr.addEventListener('load', loaded);
    xhr.addEventListener('error', errored);

    xhr.open(params.method, params.url);

    xhr.setRequestHeader('Accept', 'application/json');
    xhr.setRequestHeader('Content-Type', 'application/json');

    xhr.send(JSON.stringify(data));
}
WardCunningham commented 8 years ago

@egonelbre Some time has passed since we last talked. Can you tell us more about experience with your implementation?

egonelbre commented 8 years ago

Let's start with the issues:

The main thing I see people struggle with is the UI; it's not that they can't eventually get it, but rather that it's difficult to work with. "The Humane Interface" would pretty much predict this. I've been pondering whether http://prosemirror.net/ could be used to fix it - you would still edit each item simultaneously, but the "modalness" of the page drops. Of course, since people can eventually learn it, I haven't had a good reason to spend that much time on it.

People prefer it over the standard help (navigation tree on the left, a single page on the right). Navigating lots of information is much nicer with the side-by-side pages. However, very few people actually update and add content; I'm not quite sure why - maybe it's a training issue (they don't know the system well enough, so they don't want to try) or something else is out of place.

One very nice usage was to track the progress of a project, backlogs, and backlog items. Generally issue trackers are "per person", which means in a group it's easy to get "separated" instead of continuously communicating what we shall do next and how best to do it. Having it in a single group wiki avoided all of that and the development process became much smoother.

The core of the system has been working pretty well (aside from some cross-browser compatibility issues).