cinnamon-bun closed 4 years ago
Oh right, I'll update the README to say that syncing is triggered through the GraphQL API itself. Once you know that, there are docs in the playground for the sync with all the details.
As for earthstar-graphql instances being able to sync with each other... I've not been sure about what to do here! If that were the case, an instance would be a complete pub with a GraphQL endpoint, right?
I'm a little hesitant about positioning this as something that should be deployed online and publicly accessible (I imagined it being embedded / running locally as the first choice), because it would mean I'd have to focus on hardening the API against certain attacks (e.g. it'd be really easy to make a malicious query complex enough to take the instance down as things stand).
Maybe because of the focus on pubs being personal and non-discoverable, it wouldn't be too much work to make this 'good enough', though?
Hmm, yeah, do you mean GraphQL queries can be complex or earthstar queries can be complex? A worst-case earthstar query might have to scan every document but it shouldn't be O(n^2) or anything. But maybe it's hard to make GraphQL safe against DDoS attacks?
Earthstar peers can be servers or clients or both. Servers respond to queries, clients request queries.
Syncing is a conversation between a client (who drives the conversation and tries to keep the sync efficient) and a server (who just answers questions). This is asymmetrical to make it fit better in an HTTP paradigm (vs. a duplex stream paradigm like SSB and hypercore).
Servers
Clients
For two servers to sync with each other, one of them has to act as a client: it needs some extra code to drive the sync conversation. E.g. you'd somehow ask GraphQL server A to start a long-running background process that talks as a client to GraphQL server B.
Syncing is really basic right now. This happens in earthstar's sync.ts
// client pull
Client: GET all your documents
Server: [doc1, doc2, doc3]
// client push
Client: POST hey, here's all my documents: [doc1, doc2, doc3, doc4]
Server: ok
This also adds the concept of "replication queries", where each side can express what data it wants to have. Maybe a peer only wants wiki documents, or recent documents.
// client pull
Client: GET hashes of all your documents that match my replication query `{pathPrefix: "/chess"}`
Server: [hash1, hash2, hash3]
Client: GET I don't have [hash2, hash3] yet, give me those.
Server: [doc2, doc3]
// client push
Client: GET What do you want?
Server: My replication query is `{pathPrefix: "/wiki"}`
Client: POST I have [hash1, hash2, hash3], what do you need?
Server: I need [hash1, hash2]
Client: POST [doc1, doc2]
Server: ok thanks
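The hash-based exchange above could be sketched roughly like this. It is only an in-memory illustration: the `Doc`, `ReplicationQuery`, `SyncServer` and `MemoryPeer` names, and all method names, are made up for this sketch and are not taken from earthstar's sync.ts. The point it tries to show is the asymmetry: the server only answers questions, while `pull` and `push` on the client side drive the conversation.

```typescript
// Hypothetical shapes for illustration only; not earthstar's real types.
interface Doc { hash: string; path: string; content: string; }
interface ReplicationQuery { pathPrefix: string; }

// Everything a server has to answer. It never starts a conversation.
interface SyncServer {
  getHashes(query: ReplicationQuery): string[];   // "hashes matching my query"
  getDocs(hashes: string[]): Doc[];               // "give me those"
  getReplicationQuery(): ReplicationQuery;        // "what do you want?"
  whichDoYouNeed(offered: string[]): string[];    // "I have these, what do you need?"
  postDocs(docs: Doc[]): void;                    // "here they are"
}

// A peer that keeps documents in memory, keyed by hash.
class MemoryPeer implements SyncServer {
  private docs = new Map<string, Doc>();
  private wants: ReplicationQuery;

  constructor(wants: ReplicationQuery, initial: Doc[]) {
    this.wants = wants;
    this.postDocs(initial);
  }

  has(hash: string): boolean {
    return this.docs.has(hash);
  }

  getHashes(query: ReplicationQuery): string[] {
    return [...this.docs.values()]
      .filter((doc) => doc.path.startsWith(query.pathPrefix))
      .map((doc) => doc.hash);
  }

  getDocs(hashes: string[]): Doc[] {
    return hashes
      .map((hash) => this.docs.get(hash))
      .filter((doc): doc is Doc => doc !== undefined);
  }

  getReplicationQuery(): ReplicationQuery {
    return this.wants;
  }

  whichDoYouNeed(offered: string[]): string[] {
    return offered.filter((hash) => !this.docs.has(hash));
  }

  postDocs(docs: Doc[]): void {
    for (const doc of docs) this.docs.set(doc.hash, doc);
  }
}

// Client pull: ask for hashes, then fetch only the documents we lack.
function pull(client: MemoryPeer, server: SyncServer): number {
  const remoteHashes = server.getHashes(client.getReplicationQuery());
  const missing = remoteHashes.filter((hash) => !client.has(hash));
  const docs = server.getDocs(missing);
  client.postDocs(docs);
  return docs.length;
}

// Client push: ask what the server wants, offer hashes, send what it needs.
function push(client: MemoryPeer, server: SyncServer): number {
  const wanted = server.getReplicationQuery();
  const needed = server.whichDoYouNeed(client.getHashes(wanted));
  const docs = client.getDocs(needed);
  server.postDocs(docs);
  return docs.length;
}
```

In a real deployment each `SyncServer` method would be an HTTP request rather than a direct call, and two servers could sync if one of them ran `pull` and `push` against the other.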
@cinnamon-bun Regarding complex queries, you can write something like this:
{
  workspaces {
    documents {
      authors {
        workspaces {
          documents {
            # you get the idea
          }
        }
      }
    }
  }
}
And unless you have some depth limiting or complexity analysis, the schema will dutifully resolve every item for each level of this query.
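As a rough idea of what depth limiting means, here is a deliberately naive sketch that measures how deeply selection sets are nested in a query string and rejects anything past a limit. `queryDepth` and `isTooDeep` are hypothetical names, and counting braces is a stand-in for the real thing: a production server would more likely inspect the parsed query AST, and this version is not what earthstar-graphql does.

```typescript
// Naive depth measure: the deepest nesting of `{ ... }` selection sets.
// Skips string literals and `#` comments, and ignores braces inside
// parentheses so argument objects like (filter: {a: 1}) are not counted.
function queryDepth(query: string): number {
  let depth = 0;
  let maxDepth = 0;
  let parenDepth = 0;
  let inString = false;
  let inComment = false;

  for (let i = 0; i < query.length; i++) {
    const ch = query[i];
    if (inComment) {
      if (ch === "\n") inComment = false; // comments end at the line break
      continue;
    }
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === "#") inComment = true;
    else if (ch === '"') inString = true;
    else if (ch === "(") parenDepth++;
    else if (ch === ")") parenDepth--;
    else if (parenDepth === 0 && ch === "{") {
      depth++;
      maxDepth = Math.max(maxDepth, depth);
    } else if (parenDepth === 0 && ch === "}") {
      depth--;
    }
  }
  return maxDepth;
}

// Reject a query before resolving it if it nests deeper than `limit`.
function isTooDeep(query: string, limit: number): boolean {
  return queryDepth(query) > limit;
}
```

Depth alone does not bound cost (a shallow query over a huge list is still expensive), which is why complexity analysis is usually mentioned alongside it.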
@cinnamon-bun It's true that earthstar-graphql lets you query for Earthstar data and returns a response, but I think because it doesn't do that within the context of a sync operation, this package could be considered a client within the Earthstar ecosystem?
Even though one of the main exports for this is an HTTP server, the intention for this package is that it's deployed locally and acts as a client's Earthstar 'engine': get me a list of my workspaces, the latest documents, set some data, kick off a sync, etc.
You could easily deploy the HTTP server online, but that seems to take away a lot of the benefits you get from earthstar: clients would need an internet connection to get data for their UIs, it's a single point of failure for many clients, and someone malicious could easily bring it down. And because this server only understands GraphQL queries, it wouldn't make a good pub as clients are expecting this conversation to happen in a certain way.
What do you think?
@sgwilym
Yeah, it's a lot of work to harden something for being exposed to the internet!
I think there's two ideas in play here:
Anyway, 👍 on designing this project for localhost if that's what you want to do. It sounds like HTTP would be a better choice than GraphQL for internet-hardened "pubs".
@cinnamon-bun Well… I think I'm changing my mind on this. My next plan for earthstar-graphql was to add filters to many of the fields, e.g.
{
  workspace(address: "+gardening.123") {
    documents(author: "toot.abc123", pathPrefix: "/diaries", after: 138238742987) {
      ... on ES3Document {
        # selections…
      }
    }
  }
}
While I originally planned this for client convenience, I now see that it's a great fit for the kind of "efficient sync" you describe above: a client can send a specific query to earthstar-graphql and get everything it needs to ingest a bunch of documents into an IStorage.
I can now see a path to having something like a syncGraphQL(storage: IStorage, graphqlUrl: string) export in this package.
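To make that idea a bit more concrete, here is a very rough sketch of what such a function might do. Almost everything in it is an assumption: `MinimalStorage` is a stand-in for earthstar's IStorage with a single made-up `ingest` method, the `path`, `content` and `timestamp` selections are guesses at what an ES3Document exposes, and the transport is injected as a `runQuery` function instead of taking a `graphqlUrl` so the sketch does not depend on a live server.

```typescript
// Hypothetical document shape; the real ES3Document fields may differ.
interface SyncedDoc { path: string; content: string; timestamp: number; }

// Stand-in for earthstar's IStorage with only what this sketch needs.
interface MinimalStorage {
  workspace: string;
  ingest(doc: SyncedDoc): boolean; // true if the document was accepted
}

// Injected transport: sends a GraphQL query and resolves to the JSON response.
type RunQuery = (query: string) => Promise<unknown>;

// Build a filtered query in the style of the example above.
// JSON.stringify is used to quote and escape the string arguments.
function buildDocumentsQuery(workspace: string, pathPrefix: string, after: number): string {
  return `{
  workspace(address: ${JSON.stringify(workspace)}) {
    documents(pathPrefix: ${JSON.stringify(pathPrefix)}, after: ${after}) {
      ... on ES3Document { path content timestamp }
    }
  }
}`;
}

// Pull matching documents from a remote instance and ingest them locally.
// Resolves to the number of documents the storage accepted.
async function syncGraphQLSketch(
  storage: MinimalStorage,
  runQuery: RunQuery,
  pathPrefix: string,
  after: number,
): Promise<number> {
  const query = buildDocumentsQuery(storage.workspace, pathPrefix, after);
  const result = (await runQuery(query)) as {
    data?: { workspace?: { documents?: SyncedDoc[] } };
  };
  const docs = result.data?.workspace?.documents ?? [];
  let accepted = 0;
  for (const doc of docs) {
    if (storage.ingest(doc)) accepted++;
  }
  return accepted;
}
```

This only covers the pull half; pushing local documents back would need a mutation on the remote side, which is left out here.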
(I've also been doing my homework on making the server better at handling deep queries, and feel better about this too... basically I spoke too soon!)
The docs don't describe how to sync, specify pub URLs, etc.
(Hm, will an earthstar-graphql instance be able to sync with another earthstar-graphql instance?)