need simple implementation of OPSN node

oresmus commented 7 years ago

Anyone should be able to run an "OPSN node" (see issue #2) on a free or cheap online server.

oresmus commented 7 years ago

One possibility (partial solution only) is for an OPSN node to be packaged as a Docker image. Then it could be run on any desktop machine (perhaps in some smaller machines too?) or in many clouds.

Either way it could have a URL from which it could be accessed, either through an HTTP API or as a web app. (Maybe a TCP API is also possible, though it may be less useful if proxies or firewalls are involved.)

(This still leaves open exactly what is running inside that Docker image.)

oresmus commented 7 years ago

Another possibility is to run an OPSN node and/or an OPSN web app inside Sandstorm.io. (See a recent Google+ discussion [link needed xxx, initial topic was my post asking about ZeroMQ] for recommendations related to that, and other good things to read.)

Advantages (about using that hosting software, regardless of where it's hosted; and assuming I can trust what I read about it to be accurate):

- Sandstorm framework will take care of everything about secure login and authentication
- framework has its own system to let users share documents (in this case OPSN pools, or perhaps OPSN mutable items)
- already has much of what is needed to support federation of servers, and to let users run their own hosts, or for more than one company to offer the hosting service

oresmus commented 7 years ago

Minimal features needed in an OPSN node:

I'll assume that a node can maintain one or more "pools" as described in my blog posts. Each pool has a single owner who controls all changes to it. It has lots of immutable items, which are chunks of data, some binary, mostly small. Some of them can be indexed by their hash value. Others belong to named sets, which might as well be ordered, i.e. "message queues". There might be a few other mutable variables too. And there might be some queues designated as writable by non-owners.

The API operations by the pool owner are everything you'd expect for modifying that data. And by the node owner, creation/admin/deletion of pools.

The API operations by others: just reading pool contents (if they have read access to it), or adding messages to any message queues they have write access to, or (ideally) being able to subscribe to notifications of changes in certain pools or queues. (We use queues to implement complex mutable items, so this allows notification of changes to those.)
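To make the minimal feature list concrete, here is a rough in-memory sketch of one pool and the owner/non-owner operations described above. Everything in it is an assumption made for illustration: the class name `Pool`, the method names, the use of SHA-256 as the item hash, and the callback-style subscription are not taken from the blog posts or from any existing OPSN code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Callback = Callable[[str, bytes], None]


@dataclass
class Pool:
    """In-memory stand-in for one OPSN pool (all names are hypothetical)."""
    owner: str
    readers: Set[str] = field(default_factory=set)       # non-owners with read access
    open_queues: Set[str] = field(default_factory=set)   # queues non-owners may append to
    items: Dict[str, bytes] = field(default_factory=dict)         # immutable items by hash
    queues: Dict[str, List[bytes]] = field(default_factory=dict)  # named, ordered sets
    subscribers: Dict[str, List[Callback]] = field(default_factory=dict)

    def _check_read(self, user: str) -> None:
        if user != self.owner and user not in self.readers:
            raise PermissionError(f"{user} has no read access to this pool")

    def put_item(self, user: str, data: bytes) -> str:
        """Owner only: store an immutable item, return its SHA-256 hex digest."""
        if user != self.owner:
            raise PermissionError("only the pool owner may add items")
        key = hashlib.sha256(data).hexdigest()
        self.items.setdefault(key, data)  # identical content always gets the same key
        return key

    def get_item(self, user: str, key: str) -> bytes:
        self._check_read(user)
        return self.items[key]

    def append(self, user: str, queue: str, message: bytes) -> None:
        """Owner may append to any queue; others only to queues marked open."""
        if user != self.owner and queue not in self.open_queues:
            raise PermissionError(f"{user} may not write to queue {queue!r}")
        self.queues.setdefault(queue, []).append(message)
        for callback in self.subscribers.get(queue, []):
            callback(queue, message)  # notify subscribers of the change

    def read_queue(self, user: str, queue: str) -> List[bytes]:
        self._check_read(user)
        return list(self.queues.get(queue, []))

    def subscribe(self, user: str, queue: str, callback: Callback) -> None:
        self._check_read(user)
        self.subscribers.setdefault(queue, []).append(callback)
```

For example, `Pool(owner="alice", readers={"bob"}, open_queues={"inbox"})` lets `bob` read items and append to `inbox`, but not add items or write to any other queue. Node-level operations (creating, administering, and deleting pools) are left out of this sketch.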

Various subsets of these ops are possible, in the sense that the others could be implemented on top of them. But any implementation proposal needs to make clear how to do everything listed above, I think.

Optional features (not sure whether they are worth it initially -- according to C4 process this would suggest not implementing them until a clear need comes up): HTTP access to pool contents. Ability for pool contents to represent any static web site in terms of that access.

Access (owner and user) must be possible both from web apps and desktop apps.

oresmus commented 7 years ago

One possible, but probably not ideal, implementation of an OPSN node would just be a shared filesystem, but with only the owner allowed to write to it (except possibly for certain subdirectories). It would have to support certain operations atomically, to avoid corruption from server crashes and/or writes by multiple processes with the same owner.

By convention (optionally enforced by API), that owner would only modify its files in certain ways (eg if they represented message queues, only by atomically appending messages in a format in which their length is unambiguous; for files containing items indexed by their hash, only using the correct name derived from the hash, etc). How hard this is depends on a lot of details, but especially on what atomic operations are actually supported (and how much we trust that in the server).
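As a sketch of what those two conventions could look like on an ordinary local filesystem, assuming Python, SHA-256 for the hash, and a 4-byte big-endian length prefix for queue records (all three are illustrative choices, as are the function names):

```python
import hashlib
import os
import struct
import tempfile
from pathlib import Path
from typing import List


def store_item(root: Path, data: bytes) -> Path:
    """Write data to a file named after its SHA-256 digest.

    The bytes go to a temporary file first and are then renamed into place,
    so a reader never sees a half-written item under its final name.
    """
    root.mkdir(parents=True, exist_ok=True)
    target = root / hashlib.sha256(data).hexdigest()
    if target.exists():
        return target  # same hash means same content; nothing to do
    fd, tmp_name = tempfile.mkstemp(dir=root)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def append_message(queue_file: Path, message: bytes) -> None:
    """Append one length-prefixed record using a single write call."""
    record = struct.pack(">I", len(message)) + message
    fd = os.open(queue_file, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_messages(queue_file: Path) -> List[bytes]:
    """Return every complete record, ignoring a truncated tail left by a crash."""
    raw = queue_file.read_bytes()
    messages, pos = [], 0
    while pos + 4 <= len(raw):
        (length,) = struct.unpack_from(">I", raw, pos)
        if pos + 4 + length > len(raw):
            break  # incomplete final record
        messages.append(raw[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return messages
```

Whether the append is really atomic with respect to other writers still depends on the underlying filesystem and on record size, which is exactly the "what atomic operations are actually supported" question. The length prefix only guarantees that a reader can detect and skip an incomplete final record.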

This would be inefficient in several ways (especially related to lots of small files, or lots of small appends onto large files), but could be made to work, and would be usable for initial experiments (which would not in any way force us to keep using it indefinitely).

The reason it's worth mentioning is in case there is an existing service, or existing software (eg to run inside Sandstorm or a Docker node), which already does this. (It's a generic enough kind of service that there might be.)

Unfortunately I haven't yet found one that does this and can be accessed securely from inside a statically hosted web app.

E.g. to use Dropbox this way, due to its limited API you have to replace an entire file whenever you modify it (unlike from a desktop app, which can just append bytes to a file and let Dropbox sync them efficiently), and the web app also has to contain an "api key" from Dropbox which any technically advanced user could steal, and (under plausible circumstances) use to mess up the data of other users.

So to support storing OPSN files on Dropbox (which might be useful, since it will give anyone 2GB of free online storage), you have to write to those files only from a server (whose code we have to write) or from a desktop app. But we need a web app, so we have no choice but to write a server for it, even if its only essential reason for being is to securely hold a Dropbox API key.

I didn't investigate Dropbox competitors (eg Google Drive, Box, etc) but I'm guessing they're similar.

Sandstorm.io works differently enough that it's conceivable it already has an app which stores files in a way that could be used for this. But I didn't study this in any detail.

The main requirement is a URL which gives anyone read-only access (either without login or at least without being the owner), but for only the owner to be able to write files. Then we could write a web app also running under Sandstorm, which could write files using the Sandstorm authorization API to connect it to the file-server app. This app we wrote would need whatever nice OPSN UI we want to give it, but would not need us to write authorization code (Sandstorm does that) or code to store or serve files (this hypothetical other app does that).

My guess at this point is that the minimal good-enough OPSN node (or pool) API is simple enough that we can more easily implement it in new code (ie make our OPSN-UI web app store the files by connecting to our own OPSN pool app), than find suitable already-existing apps (since I imagine we'd have to study and reject many of them before finding one that worked). But in case someone reading this already knows about an existing app that would work, I mention that here.

(I said "pool" rather than "node" in the prior paragraph, under the assumption of using Sandstorm, since it lets us write our app to work with a single "grain" (document, or more precisely, the unit of access control), so we'd naturally write it to implement one OPSN pool. If we don't use Sandstorm, then we'd want an OPSN node server which ran any number of pools.)

oresmus commented 7 years ago

Considering everything above, and initial research into Sandstorm I did since then, my current belief is that we should find the simplest working Sandstorm app we can (which has clean code we can easily understand and modify, under a suitable open-source license) -- or we should make one, if we can't find a simple enough existing example -- and use it as a base for our own "OPSN node" sandstorm app.

For the example code we start with (or make), I think the only requirements are:

- use Sandstorm, including its model of user authentication and API tokens;
- use some kind of persistent database (probably could just use the underlying Linux filesystem, or could use some simple DB running on top of that);
- provide an HTTP API which we can easily extend to more operations involving that DB;
  - desirable: show how to return or accept a "large binary blob" as a parameter in that API
  - desirable: show how some ops can be HTTP GET (permitting client or proxy caching optimization) or HTTP PUT (permitting other optimizations), though some must be HTTP POST
  - optional: demonstrate how to do atomic writes to multiple objects in that DB
- ideal: have small, clean, well-documented code
- ideal: be in a language we already both know (like Python), or have such clean code we don't mind it being in a new language
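To illustrate the HTTP API items in that list (binary blobs as request and response bodies, and the split between GET, PUT, and POST), here is a minimal sketch using only the Python standard library. It is not a Sandstorm app and does no authentication at all, since that is the part Sandstorm is expected to supply. The URL layout (`/blob/<hash>`, `/queue/<name>`), the in-memory dictionaries standing in for the persistent DB, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, List, Tuple

BLOBS: Dict[str, bytes] = {}         # immutable blobs keyed by SHA-256 hex digest
QUEUES: Dict[str, List[bytes]] = {}  # named append-only message queues


class PoolHandler(BaseHTTPRequestHandler):
    def _route(self) -> Tuple[str, str]:
        """Split '/blob/<name>' or '/queue/<name>' into (kind, name)."""
        kind, _, name = self.path.strip("/").partition("/")
        return kind, name

    def _body(self) -> bytes:
        return self.rfile.read(int(self.headers.get("Content-Length", 0)))

    def _reply(self, status: int, body: bytes = b"", cacheable: bool = False) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        if cacheable:
            # A blob named by its hash never changes, so caches may keep it.
            self.send_header("Cache-Control", "public, max-age=31536000, immutable")
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        """GET /blob/<hash>: return the raw bytes of an immutable blob."""
        kind, name = self._route()
        if kind == "blob" and name in BLOBS:
            self._reply(200, BLOBS[name], cacheable=True)
        else:
            self._reply(404)

    def do_PUT(self) -> None:
        """PUT /blob/<hash>: idempotent; the body must hash to the name in the URL."""
        kind, name = self._route()
        data = self._body()
        if kind == "blob" and hashlib.sha256(data).hexdigest() == name:
            BLOBS[name] = data
            self._reply(201)
        else:
            self._reply(400)

    def do_POST(self) -> None:
        """POST /queue/<name>: append a message; repeating it adds another copy."""
        kind, name = self._route()
        data = self._body()
        if kind == "queue" and name:
            QUEUES.setdefault(name, []).append(data)
            self._reply(201)
        else:
            self._reply(400)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 asks the OS for any free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PoolHandler)
```

Running `make_server(8000).serve_forever()` starts it locally. The point of the split is that storing a blob under its own hash is naturally idempotent (hence PUT) and fetching it is cacheable (hence GET), while appending to a queue changes state on every call and so has to be POST.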

oresmus commented 7 years ago

To help with this, sandstorm.io has lots of documentation, and many existing apps which might be good enough examples. But I have only just started looking into them to try to pick one (and I haven't yet asked anyone in its community for advice).

It also has a "packaging guide" for existing web apps, so another option is just to start with some trivial web server (e.g. in 10 lines of python) and package it. The downside of that is that we'd have to add the HTTP API, and I didn't find their guide about that immediately understandable (I'm guessing it assumes more web developer knowledge than I have).

oresmus commented 7 years ago

It's possible this is the sandstorm web app with the smallest code (also fast startup and small package, they say): https://github.com/sandstorm-io/sandstorm-rawapi-example

(It is small and fast since it doesn't use an HTTP server and their HTTP bridge, but a more direct interface to the Sandstorm framework. Skimming the code, the C++ server appears to allow PUT and GET of individual files under /var, but the HTML/JS client only offers text editing of one file, /var/content. I haven't tried running it; I'm not sure if they provide a demo. As a base for our own app, it doesn't cover offering an HTTP API, but it does demo the most basic other things listed above.)

oresmus commented 7 years ago

I drafted a design doc for a very simple OPSN storage server:

https://github.com/OPSN/MVP-discuss/blob/master/storage-server.md

Followup questions for @jmichelz:

- Do you understand my wording?
- Do my ideas make sense to you?

If we settle on this, then I think we can implement it pretty quickly (in ways I'll document in this issue, if I didn't already) (we would implement it in a new repo).

Then we could start hacking (in yet other new repos) on client programs that can connect to it (which could equally easily be desktop apps or web apps, in any convenient language). They'd cover either or both of

- user UI, for browsing the graph and editing it in various ways
- maintenance parts of a server, such as compacting files, removing old files
- looking on the web (or local disk) for external data to put into new files (could be invoked by user UI or run separately)

Those programs would need to use libraries which understand/implement the format conventions I described in the document above. (My two long blog posts also give essential info about assumed format.)

jmichelz commented 7 years ago

I think the way I would approach it would be: 1) find some idea users would be excited about 2) mock up a UI for it and show it to them 3) then start to put in a server that supports the client

For the server itself, making it filesystem based makes it simple but at the expense of not being able to do much. Maybe it should start as a version control system? I'm guessing merge issues will be fairly central to the system.

oresmus commented 7 years ago

I'm not disagreeing with any of that (as a useful part of the project, to do near the start). But I'm trying to "factor" it in this sense: split (3) into

- the "pure storage server" (the simple filesystem like I described)
- the "library for doing stuff to that storage" (usable both by client apps, and by more advanced "server-like code" which might run on client systems or even as part of client apps).

My reasons are (some repeated from above, not all):

- not wanting the server code to keep changing, as we constantly improve (or just outright change) the library for using the storage
- getting the proof of concept of the central OPSN idea "users routinely make copies of other users' data" working as soon as possible, and included as fundamentally as possible (note that doing this to "data living inside a git repo" would be highly nontrivial)
- I think this is the right factoring even in the long run, for reasons described in my blog posts, including that while merging (and "version control") is central, there is not only one way to do it, so how to do it should not be baked into the protocol, rather each client should support certain ways.

I agree that merging is important and that this does need to be thought through, probably a bit more than I actually have.

Note there is a sandstorm example that wraps a git repo and exposes it over an API, so it's possible we'd decide to extend that one instead... but first I'd want to clearly understand how we planned on using it and why that made sense.

Building in a git repo (for example) feels like a violation of two important goals: making the server extremely simple, and making it agnostic to specific choices of collaboration/merging algorithms.

As an alternative to that:

- Nothing prevents people from making OPSN posts which point to (link to) existing git repos or other places they're going to collaborate (google docs, realtime tools of various kinds, etc).
- Nothing prevents an OPSN client from sometimes copying and archiving data from those places (provided it reads a license file that permits this, or some responsible party tells it to do that on that party's behalf), or from sometimes embedding those places in its UI.
- The initial OPSN use cases I imagine mostly involve pointing to existing data and only editing very small pieces of text which are "native to OPSN", and only a few times (so storing every version outright is not prohibitive, and many users would only store the latest version anyway); and the merging algorithms I know about, for these initial use cases, don't necessarily fit into something like git (as a subset of what it does now) anyway.

If we try out those initial use cases with a naive implementation which stores every version whole and might discard old ones, and find a problem with mutable text being too verbose or too "uncontrolled", at that time we'd have more data to use in deciding what to do about that problem.
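A naive implementation of that kind is small enough to sketch directly. The class below is hypothetical (name, methods, and the `max_versions` knob are made up for illustration); it stores each version whole, optionally discards the oldest ones, and reports how much text it is holding so that "too verbose" can be measured rather than guessed.

```python
from collections import deque
from typing import Deque, List, Optional


class VersionedText:
    """Stores every saved version in full; no diffs, no merging."""

    def __init__(self, max_versions: Optional[int] = None) -> None:
        # With max_versions set, the deque silently drops the oldest version
        # once the limit is reached; with None it keeps everything.
        self._versions: Deque[str] = deque(maxlen=max_versions)

    def save(self, text: str) -> None:
        self._versions.append(text)

    def latest(self) -> str:
        return self._versions[-1]

    def history(self) -> List[str]:
        """All retained versions, oldest first."""
        return list(self._versions)

    def stored_chars(self) -> int:
        """Total characters held across retained versions."""
        return sum(len(version) for version in self._versions)
```

A client that only cares about the current text would use `VersionedText(max_versions=1)`, while an archiving client would leave the limit unset.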

I think we agree in general with the principle "start as simple as possible, then only add complexity if it solves a clear problem, and only after that problem comes up in practice". This problem of "needing version control for text in OPSN" hasn't yet come up, and (I think) might not come up in a way which a built-in git repo server would solve. That makes me interpret the principle as saying "it would be too early to decide to build a git repo into the server".

oresmus commented 7 years ago

About the other actions in your list, I think (1) and then (2) can be usefully done in parallel with making the storage server, and I think we've already started working on (1) in issue #4.

For (2) I propose doing the UI mockup by actually writing code in Elm language which generates the UI from stub data. (I'll say more about that if you want. Last night I read a blog post in which someone did exactly that with good success. I can probably find it again if that would be useful.)
