alaric-dotmesh commented 6 years ago

The dotscience frontend would like to be notified asynchronously when there's a new commit to a workspace or data dot, so that the state can be updated.

We currently poll for this kind of thing, which is really inefficient.

Therefore, we could reduce server load significantly by having a way for Dotmesh RPC clients to subscribe to notifications (through websockets?) whenever commits happen to subscribed dots.

We could then use that in the frontend to refresh the pages that currently poll.

alaric-dotmesh commented 5 years ago

Here's my proposal for an architecture, please comment:

- We add two extra RPCs: DotmeshRPC.SubscribeForCommits(url, username, password, subject) and DotmeshRPC.UnsubscribeForCommits(url, username, subject). They subscribe and unsubscribe, respectively, a given NATS connection for Dotmesh commits. Only the admin user may invoke them.
- Inside Dotmesh, we maintain a map from url/username pairs to a struct containing a hash of the password, a NATS connection object, and a reference count.
- We also maintain a list of commit subscriptions; each element is a url/username pair and a subject string.
- On subscription, we first check to see if the url/username pair already points to a connection with the correct password, and increment the refcount if so; if not, we create one with a refcount of one. We then insert an entry into the commit subscriptions list.
- On unsubscription, we remove a commit subscription from the list, decrement the refcount of the connection, and close the connection if it has no surviving references.
- On every commit (spotted by updateSnapshotsFromKnownState), we publish a NATS message to every subject in the commit subscriptions list.

This approach:

- Paves the way for other async notification types in future, not just commits (new dots/branches, for instance), letting them share the connections map while having their own subscription lists.
- Gives a clean boundary between dotmesh and dotscience: the gateway can pass in its own NATS details to dotmesh, so dotmesh doesn't need to "know" about other parts of the system.
- Doesn't leak NATS connections to other users of the Dotmesh API; if we re-use an existing connection, we still need to provide the correct password. This API call is only available to the admin user, so the risk window is small, but we close it anyway.
- Doesn't leak all the NATS passwords if dotmesh's memory is stolen, as we just store a hash to check other passwords against. I considered salting the hash to make password guessing from stolen hashes harder, but I don't think it would be worth it, as the number of passwords obtained this way would always be small (usually only one) and eminently brute-forceable from scratch. But it's worth using a proper password hash, as we already do for dotmesh authentication.
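
The bookkeeping described in this proposal can be sketched roughly as follows. This is a minimal illustration, not the code that was committed: `Publisher`, `Dialer`, `Notifier`, and `memConn` are hypothetical names, the NATS connection is hidden behind an interface so the sketch needs no external packages, SHA-256 stands in for the proper password hash the proposal calls for, and rejecting a mismatched password with an error is an assumption about the intended behaviour.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"sync"
)

// Publisher is a hypothetical stand-in for a NATS connection object.
type Publisher interface {
	Publish(subject string, data []byte) error
	Close()
}

// Dialer opens a connection; a real one would wrap a NATS client (assumption).
type Dialer func(url, username, password string) (Publisher, error)

type connKey struct{ URL, Username string }

type connEntry struct {
	passwordHash [sha256.Size]byte // stand-in; the proposal wants a proper password hash
	conn         Publisher
	refs         int
}

type subscription struct {
	key     connKey
	subject string
}

// Notifier holds the shared connections map and the commit subscriptions list.
type Notifier struct {
	mu    sync.Mutex
	dial  Dialer
	conns map[connKey]*connEntry
	subs  []subscription
}

var (
	ErrBadPassword    = errors.New("password does not match existing connection")
	ErrNoSubscription = errors.New("no such subscription")
)

func NewNotifier(dial Dialer) *Notifier {
	return &Notifier{dial: dial, conns: map[connKey]*connEntry{}}
}

// Subscribe reuses a connection when the password matches, else dials a new one.
func (n *Notifier) Subscribe(url, username, password, subject string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	key := connKey{url, username}
	hash := sha256.Sum256([]byte(password))
	if entry, ok := n.conns[key]; ok {
		if subtle.ConstantTimeCompare(entry.passwordHash[:], hash[:]) != 1 {
			return ErrBadPassword // assumption: refuse rather than reconnect
		}
		entry.refs++
	} else {
		conn, err := n.dial(url, username, password)
		if err != nil {
			return err
		}
		n.conns[key] = &connEntry{passwordHash: hash, conn: conn, refs: 1}
	}
	n.subs = append(n.subs, subscription{key, subject})
	return nil
}

// Unsubscribe drops one subscription and closes the connection at refcount zero.
func (n *Notifier) Unsubscribe(url, username, subject string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	key := connKey{url, username}
	for i, s := range n.subs {
		if s.key == key && s.subject == subject {
			n.subs = append(n.subs[:i], n.subs[i+1:]...)
			entry := n.conns[key]
			entry.refs--
			if entry.refs == 0 {
				entry.conn.Close()
				delete(n.conns, key)
			}
			return nil
		}
	}
	return ErrNoSubscription
}

// NotifyCommit publishes the payload to every subscribed subject.
func (n *Notifier) NotifyCommit(payload []byte) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, s := range n.subs {
		_ = n.conns[s.key].conn.Publish(s.subject, payload) // errors ignored in this sketch
	}
}

// memConn is an in-memory Publisher for trying the sketch without a NATS server.
type memConn struct {
	sent   []string
	closed bool
}

func (m *memConn) Publish(subject string, data []byte) error {
	m.sent = append(m.sent, subject+": "+string(data))
	return nil
}

func (m *memConn) Close() { m.closed = true }
```

Calling `Subscribe` twice with the same url, username, and password dials once and leaves the refcount at two; the connection is only closed when the second matching `Unsubscribe` arrives.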

Godley commented 5 years ago

If a user tries to use those endpoints without having NATS running, what happens? Since this is in dotmesh, I'm thinking of the implications of someone trying to use this locally. I can't see why they would unless they were building their own UI on top, but that's always a possibility.

alaric-dotmesh commented 5 years ago

In light of our decision to focus Dotmesh on being a component for Dotscience rather than continuing to develop it as a general tool, paving the way for other async notification types in future and worrying about different Dotmesh users having different subscriptions become moot points...

@rusenask suggested just configuring Dotmesh with env variables at startup pointing to a NATS server/subject to send all commit notifications to.

I've already implemented the proposal above, so I'll finish writing tests for it so that https://github.com/dotmesh-io/frontend-ng/issues/187 can proceed on that basis. When we come to scale Dotscience to multiple nodes, though, that will be a lot easier with Karolis' simpler mechanism, so let's replace mine with it when we get to that point!

(For multiple nodes, we need to distribute the NATS connection details to every node - how to do that without needing to put NATS passwords in etcd under the generic thing is an issue I was still mulling - and make the master node for each fsmachine responsible for publishing commit notifications for that filesystem so we don't get duplicates)
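
For illustration, the simpler env-variable mechanism suggested by @rusenask might be read at startup along these lines. This is only a sketch under stated assumptions: the variable names `DOTMESH_NATS_URL` and `DOTMESH_NATS_SUBJECT` and the `NotifyConfig` type are hypothetical, and treating "neither variable set" as "notifications disabled" is one possible answer to what happens when no NATS server is available.

```go
package main

import (
	"fmt"
	"os"
)

// NotifyConfig holds where commit notifications are sent (hypothetical type).
type NotifyConfig struct {
	URL     string
	Subject string
}

// LoadNotifyConfig reads the hypothetical DOTMESH_NATS_URL and
// DOTMESH_NATS_SUBJECT variables through getenv.
// It returns (nil, nil) when neither is set, meaning notifications are off.
func LoadNotifyConfig(getenv func(string) string) (*NotifyConfig, error) {
	url := getenv("DOTMESH_NATS_URL")
	subject := getenv("DOTMESH_NATS_SUBJECT")
	if url == "" && subject == "" {
		return nil, nil
	}
	if url == "" || subject == "" {
		return nil, fmt.Errorf("DOTMESH_NATS_URL and DOTMESH_NATS_SUBJECT must be set together")
	}
	return &NotifyConfig{URL: url, Subject: subject}, nil
}

// LoadNotifyConfigFromEnv reads the same settings from the process environment.
func LoadNotifyConfigFromEnv() (*NotifyConfig, error) {
	return LoadNotifyConfig(os.Getenv)
}
```

Passing the lookup function in makes the parsing testable without touching the real environment. Note that this sketch deliberately carries no password, since how to distribute NATS credentials to every node is the open question raised above.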

alaric-dotmesh commented 5 years ago

Ok, a single-node-safe implementation of this is in place. See https://github.com/dotmesh-io/dotmesh/commit/10bbd10489657e2f9bc5b04e29adca698112762c#diff-d96a15e06e1ae7e43cb2b1d48e38c457R222 for a sample of how to use it.

alaric-dotmesh commented 5 years ago

#587 documents the plan to extend this to multi-node clusters.

Dotmesh server providing notifications of new commits #574