bcspragu / logseq-sync

An open-source Logseq Sync backend implementation
MIT License

Logseq Sync

An attempt at an open-source version of the Logseq Sync service, intended for individual, self-hosted use.

It's vaguely functional (see What Works? below), but decidedly pre-alpha software. Definitely don't point a real, populated Logseq client at it; I have no idea what will happen.

What's Done/Exists?

Right now, the repo contains (in cmd/server) a mostly complete implementation of the Logseq Sync API. It includes credentialed blob uploads, signed blob downloads, and SQLite-backed persistence, and most of the remaining API surface is at least partially implemented.

Currently, running any of this requires a modified version of the Logseq codebase (here) and the @logseq/rsapi package (here).

On that note, many thanks to the Logseq team for recently open-sourcing rsapi; it made this project significantly easier to work on.

What Works?

With a modified Logseq, you can use the local server to:

  1. Create a graph
  2. Upload (passphrase-encrypted) encryption keys
  3. Get temporary AWS credentials to upload your encrypted files to your private S3 bucket
  4. Upload your encrypted files

And that's basically the full end-to-end flow! The big remaining things are:

API Documentation

There's some documentation for the API in docs/API.md. This is the area where I'd benefit most from more information or help; see Contributing below.

Open Questions

S3 API

The real Logseq Sync API obtains temporary S3 credentials and uploads files directly to S3. I haven't looked closely enough to know whether we can swap this out for something S3-compatible like s3proxy or MinIO; see #2 for a bit more discussion.

Currently, amazonaws.com is hardcoded in the client, so replacing it will be part of a larger discussion about how to make all of this configurable in the long run.
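One way to avoid hardcoding amazonaws.com is to build object URLs from a configurable endpoint and fall back to AWS only when none is set. The Go sketch below assumes a hypothetical StorageConfig with an optional Endpoint and a PathStyle flag. AWS's virtual-hosted-style URLs put the bucket in the hostname, while self-hosted S3-compatible servers such as MinIO are commonly addressed path-style, with the bucket as the first path segment. Nothing here reflects how the Logseq client builds URLs today, and the sketch assumes object keys contain only URL-safe characters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// StorageConfig is a hypothetical client setting; Logseq has no such option today.
type StorageConfig struct {
	Endpoint  string // e.g. "http://localhost:9000"; empty means AWS S3
	Region    string // AWS region, used only when Endpoint is empty
	Bucket    string
	PathStyle bool // put the bucket in the path instead of the hostname
}

// ObjectURL builds the URL for an object key. Keys are assumed to be
// URL-safe already, so no percent-encoding is applied.
func (c StorageConfig) ObjectURL(key string) (string, error) {
	key = strings.TrimPrefix(key, "/")
	if c.Endpoint == "" {
		// Default: AWS virtual-hosted style, the bucket is part of the hostname.
		return fmt.Sprintf("https://%s.s3.%s.amazonaws.com/%s", c.Bucket, c.Region, key), nil
	}
	u, err := url.Parse(c.Endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("endpoint %q must include a scheme and host", c.Endpoint)
	}
	if c.PathStyle {
		// Path style: <endpoint>/<bucket>/<key>
		return strings.TrimSuffix(c.Endpoint, "/") + "/" + c.Bucket + "/" + key, nil
	}
	// Virtual-hosted style against a custom endpoint: <bucket>.<host>/<key>
	u.Host = c.Bucket + "." + u.Host
	u.Path = "/" + key
	return u.String(), nil
}

func main() {
	aws := StorageConfig{Region: "us-east-1", Bucket: "my-logseq"}
	minio := StorageConfig{Endpoint: "http://localhost:9000", Bucket: "my-logseq", PathStyle: true}
	for _, c := range []StorageConfig{aws, minio} {
		u, err := c.ObjectURL("user-id/graph-id/file")
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```

Keeping the AWS URL as the zero-value default means existing behavior would be unchanged for anyone who never sets an endpoint.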

Associated Changes to Logseq

Being able to connect to a self-hosted sync server requires some changes to Logseq as well, namely to specify where your sync server can be accessed. Those changes are in a rough, non-functional state here: https://github.com/logseq/logseq/compare/master...bcspragu:logseq:brandon/settings-hack

Adding a database migration

The self-hosted sync backend has rudimentary support for persistence in a SQLite database. We use sqlc to generate Go code from SQL queries, and Atlas to generate schema diffs.

The process for changing the database schema looks like:

  1. Update db/sqlite/schema.sql with your desired changes
  2. Run ./scripts/add_migration.sh <name of migration> to generate the relevant migration
  3. Run ./scripts/apply_migrations.sh to apply the migrations to your SQLite database

Why do it this way?

With this workflow, the db/sqlite/migrations/ directory is more or less unused by both sqlc and the server itself. It exists to keep a more reviewable audit log of changes to the database schema, which a single schema.sql doesn't give you.

Contributing

If you're interested in contributing, thanks! I sincerely appreciate it. There are a few main avenues for contributions:

Getting official buy-in from Logseq

The main blocker right now is getting buy-in from the Logseq team, as I don't want to do the work to add self-hosting settings to the Logseq codebase if they won't be accepted upstream. I've raised the question on the Logseq forums, as well as in a GitHub Discussion on the Logseq repo, but have received no official response.

Understanding/documenting the API

One area where I would love help is specifying the official API more accurately. My API docs are based on a dataset of one: my own account. As a result, some areas are underspecified or unknown, and in places I just don't understand the flow. Any help there would be great!

Specifically, I'd like to understand:

  1. The details of the WebSocket protocol (doc started here), and
  2. How and when to update the transaction counter (tx in the API)

Debugging S3 signature issues

I believe there's a bug (filed upstream, initially here) in the s3-presign crate used by Logseq's rsapi component, which handles the actual sync protocol bits (encryption, key generation, S3 upload, etc.).

The bug causes flaky uploads with self-hosted, AWS-backed (i.e. S3 + STS) servers, but I haven't had time to investigate the exact root cause. The source code for the s3-presign crate is available here; the GitHub repo itself doesn't appear to be public.