fedwiki / wiki-plugin-transport

Federated Wiki Plugin - Transport

Validate and correct json from remote sites. #1

Closed - WardCunningham closed this issue 8 years ago

WardCunningham commented 9 years ago

Wiki tries to be accommodating of whatever json it finds as it browses the federation. In this spirit it makes {} a valid page by providing missing fields. But this may not be helpful for someone writing an external service that is attempting to provide good content. We should write and maintain a wiki json validator similar to html tidy. This could be incorporated into the Transporter (or anywhere json is fetched) as follows.

I will suggest checks to make and sometimes accommodations that might be appropriate. We welcome suggestions in comments to this issue.

Checks

Is the text valid json? It might be sufficient to catch a failed JSON.parse() but a more tolerant parser might return the successfully parsed prefix of the text.
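
A minimal sketch of this first check, catching a failed JSON.parse(); the function name and the shape of the result are illustrative only:

function checkParse (text) {
  try {
    return {ok: true, page: JSON.parse(text)}
  } catch (err) {
    // a more tolerant parser could instead return the parsed prefix
    return {ok: false, error: err.message}
  }
}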

Is the parsed json an object? An array could be treated as a story, a string as a paragraph; numbers, true, false, and null could be treated as strings.

Does the object have a title, story and journal? If not, create empty versions of each.
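
A sketch of that accommodation, with empty defaults much like the ones wiki itself supplies; the placeholder title is an assumption:

function ensureFields (page) {
  if (!page.title) page.title = 'Untitled'   // placeholder title, an assumption
  if (!page.story) page.story = []
  if (!page.journal) page.journal = []
  return page
}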

Do the items of the story have types that name plugins available on this server? The call to the validator could provide the list from /system/plugins.json. If a type is not on the list, passing it through is ok because unexpected types are handled by plugin lookup.
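
One way to report types this server doesn't know, assuming the validator is handed the array fetched from /system/plugins.json:

function unknownTypes (story, plugins) {
  // plugins is the list fetched from /system/plugins.json
  return story
    .map(function (item) { return item.type })
    .filter(function (type) { return plugins.indexOf(type) < 0 })
}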

Do the items of the story have unique ids? If not unique, generate aliases. If not present, generate ids at random.
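
A sketch of both accommodations; 16 random hex digits matches the shape of ids wiki generates, though the alias field is my own naming:

function randomId () {
  var id = ''
  for (var i = 0; i < 16; i++) {
    id += Math.floor(Math.random() * 16).toString(16)
  }
  return id
}

function ensureUniqueIds (story) {
  var seen = {}
  story.forEach(function (item) {
    if (!item.id || seen[item.id]) {
      item.alias = item.id      // keep the duplicate around as an alias
      item.id = randomId()
    }
    seen[item.id] = true
  })
}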

Do items have text that makes sense for the item type?

Do items have text of reasonable length? If not, consider truncating it.

Do items have text with square brackets that balance as links?
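
A rough balance check; it only counts brackets, so stray closers or nesting deeper than the [[link]] markup are rejected:

function bracketsBalance (text) {
  var depth = 0
  for (var i = 0; i < text.length; i++) {
    if (text[i] === '[') depth++
    if (text[i] === ']') depth--
    if (depth < 0 || depth > 2) return false
  }
  return depth === 0
}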

Do html items have tags that won't sanitize?

Do image items have pictures that are of reasonable size?

Do video items have captions that allow them to be edited?

Do items have text with British spellings of familiar words like color or neighbor? (just kidding)

Do the actions of journal have types that can be understood by lib/revision? Do they have expected parameters?

Do the actions have ids that match included items?

Do the actions have dates? If not, infer dates from surrounding actions or use the date that dating actions began.
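
A sketch of the inference, borrowing from the preceding action; the fallback epoch stands in for the date wiki began dating actions, which I haven't looked up:

function inferDates (journal) {
  var fallback = Date.parse('2012-01-01')  // placeholder epoch, an assumption
  journal.forEach(function (action, i) {
    if (!action.date) {
      var prev = journal[i - 1]
      action.date = (prev && prev.date) ? prev.date : fallback
    }
  })
}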

Do add actions that are positioned after other items refer to items that can actually be found in the page?

Do the actions recreate the story?
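
This last check could replay the journal over an empty story and compare the resulting item ids; here apply(page, action) stands in for lib/revision, whose exact interface I'm not assuming:

function recreatesStory (page, apply) {
  var replayed = {title: page.title, story: [], journal: []}
  page.journal.forEach(function (action) {
    apply(replayed, action)      // apply stands in for lib/revision
  })
  var want = page.story.map(function (item) { return item.id }).join()
  var got = replayed.story.map(function (item) { return item.id }).join()
  return want === got
}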

opn commented 8 years ago

Currently the plugin uses client-side REST calls to the external web service. This requires the external service to provide the necessary CORS headers with the result it returns, which is not a standard feature of REST services and places an additional demand on the coder of the external service - http://future.fedwiki.org/transport-plugin.html

Given our requirement to make the creation of services for the federation as easy as possible for anyone with basic coding experience and access to a public server, we ideally don't want them to have to know about things like CORS headers from the get-go.

Proposed Changes

Let's use the server-side component of the plugin to make the REST call, not the client. This would mean that the client receives its data directly from the site of origin, and no CORS headers are required from the provider of the external web service.

WardCunningham commented 8 years ago

The problem with involving the server is that it may be on the wrong side of a firewall to reach the desired service. It would be possible to write a supplemental proxy in a language with good support for CORS headers and let it contact the constrained service. There doesn't seem to be any additional capability gained by proxying through the wiki-server.

opn commented 8 years ago

I probably don't have the same grasp as you regarding firewall issues, but isn't it the case that the client is almost always behind a firewall - or does that not affect things?

What is the case is that the server is usually a public server (for now on DigitalOcean), and I suspect even local servers will most often be calling local RESTful services.

Why not have a "proxy" flag which, if set to "false", would use the current behaviour, with the default being to use your public server of origin as a proxy for the REST calls?

We don't want to make it hard for basic coders to contribute code using their Apache and php cgi setups.

WardCunningham commented 8 years ago

As currently configured, I can run a transporter on my laptop and employ it while authoring content on a wiki in the public internet on the far side of my home/office firewall. The wiki server wouldn't have access to my laptop, but the client can access the transporter so long as the transporter offers CORS headers.

I've written many cgi scripts in perl. They all begin by printing headers.

print "Content-type: text/plain\n\n";

To enable CORS I would extend this with the extra header. (but see below)

print "Access-Control-Allow-Origin: *\n";
print "Content-type: text/plain\n\n";

I might also want to be more specific about the content type but that is another issue.

Search quickly finds similar advice for many kinds of servers. http://enable-cors.org/server.html

WardCunningham commented 8 years ago

I have set out to test the advice I have provided in the previous comment and to document what I learn.

The CORS header is tested by the browser with a "preflight" request. The server must understand that as distinct from a GET or PUT. The cgi approach I recommended results in the browser complaint:

Request header field Content-Type is not allowed by Access-Control-Allow-Headers in preflight response.

I will explore the advice given by enable-cors.org and resume my documentation then.

makevoid commented 8 years ago

I think you need to pass this additional header:

Access-Control-Allow-Headers: Content-Type

to whitelist Content-Type as a header that can be added to the CORS request

If you wish to pass more headers you can separate them by commas:

Access-Control-Allow-Headers: Content-Type, Accept-Encoding
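
For a transporter written in node rather than cgi, the same two headers plus explicit handling of the OPTIONS preflight might look like this; the port and payload are placeholders:

var http = require('http')

http.createServer(function (req, res) {
  res.setHeader('Access-Control-Allow-Origin', '*')
  res.setHeader('Access-Control-Allow-Headers', 'Content-Type')
  if (req.method === 'OPTIONS') {
    res.writeHead(204)           // answer the preflight with no body
    return res.end()
  }
  res.writeHead(200, {'Content-Type': 'application/json'})
  res.end(JSON.stringify({title: 'Example', story: [], journal: []}))
}).listen(4020)                  // placeholder port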

opn commented 8 years ago

I agree that it is not that hard to provide CORS headers with a CGI script - but it is hard enough to stump a few folk who have not had the need before.

The use case you indicate is interesting but pretty marginal, and wouldn't you get most of the same functionality by running a node wiki server behind the firewall / on your laptop and then forking the page?

Having a switch, though, that gives you the option to make browser-based REST calls or to have the server make them would be ideal.

WardCunningham commented 8 years ago

@makevoid This worked. Thanks.

WardCunningham commented 8 years ago

This issue suggested extensive validation of json returned from suspicious transporters. With no action for 11 months I judge this not likely to happen. Perhaps this means our transporters are good enough.

Further discussion revolved around the burden of CORS. I consider this too a fact of life that is not likely to change. For these reasons I will close this issue with no further action.