YousefED / SyncedStore

SyncedStore CRDT is an easy-to-use library for building live, collaborative applications that sync automatically.
https://syncedstore.org
MIT License

How to initialize value or use existing Yjs doc stored in server #29

Closed milesingrams closed 2 years ago

milesingrams commented 2 years ago

First off, this is an amazing project and I'm incredibly impressed with how wide a range of reactive bindings and library integrations you were able to provide. Thank you for your work on this!

If it is important to the solution, I am using the Vue3 bindings.

The problem I'm running into is initializing a document from a value stored on the server. When I try to provide a document with initial values to syncedStore like so:

const doc = syncedStore({
  items: ['a', 'b', 'c'],
})

I get the error: Root Array initializer must always be empty array.

I can technically initialize everything to an empty state and then add the data, but if there is a race condition, or users have slow connections or are offline, this will duplicate data when they re-sync. Furthermore, I want the flow of my document to be as follows:

New document creation:

  1. User navigates to /list/:listID
  2. If database has a list by that ID load and initialize syncedStore to existing list.
  3. If no list exists by that ID then initialize to empty list.
  4. Any users viewing the same list can make edits over WebRTC provider
  5. Document is autosaved to database periodically by the last user to make an update
  6. All users eventually leave but may come back periodically to make edits, sometimes collaboratively

In your opinion, what is the best way to persist a document and pick it up again later, allowing initialization from either a template document or an existing database-stored document?

Any help greatly appreciated!!!

milesingrams commented 2 years ago

I found this thread which you participated in: https://discuss.yjs.dev/t/initial-offline-value-of-a-shared-document/465/4 Sounds like you have solved this before! I still don't quite understand the best way to solve this even after reading the thread.

YousefED commented 2 years ago

Hi @milesingrams !

Great question, and I had to wrap my head around this topic as well as you can see in that thread.

The gist is this:

You should initialize the document / store right after you consider the object "created". There is probably a moment when a user presses "Create List" and you want to create a new listID; this is the moment where you can also initialize the document.

What does this mean in your scenario? I think it should be split into two different scenarios: (A) viewing / editing an existing list and (B) creating a new list. This is what I recommend:

A: Viewing / editing an existing list

  1. User navigates to /list/:listID
  2a. If database has a list by that ID load and initialize syncedStore to existing list.
  2b. If database doesn't have a list by that ID, show a 404

(And your previous points are still valid after 2a.)
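
Scenario A can be sketched roughly as follows. This is only an illustration under assumptions: `database` is a hypothetical in-memory map standing in for the real database, `loadList` is a made-up helper name, and the stored value is assumed to be the binary output of `Y.encodeStateAsUpdate`.

```typescript
import * as Y from "yjs";

// Hypothetical stand-in for the real database: listID -> persisted Yjs state.
const database = new Map<string, Uint8Array>();

// Returns a Y.Doc populated from the stored state, or null when the list
// does not exist (the caller would then show a 404).
function loadList(listID: string): Y.Doc | null {
  const persisted = database.get(listID);
  if (persisted === undefined) {
    return null;
  }
  const doc = new Y.Doc();
  Y.applyUpdate(doc, persisted);
  return doc;
}

// Seed one list so the example has something to load.
const seed = new Y.Doc();
seed.getArray<string>("items").push(["a", "b", "c"]);
database.set("list-1", Y.encodeStateAsUpdate(seed));

const existing = loadList("list-1");
const missing = loadList("does-not-exist");
const existingItems = existing ? existing.getArray<string>("items").toArray() : [];
```

As far as I know, `syncedStore` accepts an existing Y.Doc as an optional second argument, so the loaded doc could then be wrapped with something like `syncedStore({ items: [] }, existing)`; treat that as an assumption to verify against the SyncedStore version in use.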

B: Creating a new List

  1. User clicks "Create new List"
  2. Server or client generates a new ID
  3. Initialise syncedStore to that ID
  4. Write the initial values (e.g.: store.items.push('A')) (this can happen on client or server)
  5. Redirect (or change URL dynamically) to /list/:listID

Does this make sense? The reason you can't simply "create a store with initial values", is because in a distributed, offline-first environment, you can't know for sure whether a store "has been initialized or not", because you don't know yet whether you have received all updates from peers that exist "in the universe". Maybe another peer has initialized the document already? The only way to do this is when a central authority (server or database) signals to you that the id doesn't exist yet (or alternatively, if you rely on UUIDs you can do this client side as well), and you're sure you're the first one to write to the document by that id.
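
A minimal sketch of scenario B, assuming the ID is a client-generated UUID and again using a hypothetical in-memory `database` map in place of the real one. `createList` is an illustrative name, not part of SyncedStore. The shape passed to `syncedStore` stays empty, and the initial values are written afterwards as ordinary edits.

```typescript
import { randomUUID } from "node:crypto";
import { syncedStore, getYjsDoc } from "@syncedstore/core";
import * as Y from "yjs";

// Hypothetical stand-in for the real database: listID -> persisted Yjs state.
const database = new Map<string, Uint8Array>();

// Creates a list exactly once, at the moment the user asks for it.
function createList(initialItems: string[]): string {
  // A fresh UUID makes it very unlikely that another peer has already
  // written to a document with this ID.
  const listID = randomUUID();

  // The shape may only contain empty initializers such as [] or {}.
  const store = syncedStore({ items: [] as string[] });

  // Write the initial values as normal edits on the new store.
  for (const item of initialItems) {
    store.items.push(item);
  }

  // Persist the full document state so later visitors load this document
  // instead of initializing their own copy.
  database.set(listID, Y.encodeStateAsUpdate(getYjsDoc(store)));
  return listID;
}

const newListID = createList(["A"]);
```

Because the initial values are written only once, by the creator, every later visitor loads them as part of the persisted state rather than pushing them again, which is what avoids the duplication described in the question.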

milesingrams commented 2 years ago

Hi Yousef, Thanks for the detailed response! I definitely follow why you need one source of truth for the Yjs object and therefore it must be created when the list is created. I gave it a try and am running into an interesting issue related to serializing and deserializing the entire document that I will also crosspost to YJS since I'm guessing it's from their end.

My plan is as such:

  1. Create new list and Yjs doc on server
  2. Persist Yjs doc to database in base64 since I'm using postgres:
    persistedYDoc = byteArrayToBase64(Y.encodeStateAsUpdate(yDoc))
  3. When any user goes to list page /list/:listID then load Yjs doc from DB and deserialize from base64:
    const loadedYDoc = new Y.Doc()
    Y.applyUpdate(loadedYDoc, base64ToByteArray(persistedYDoc))
  4. Connect Y doc to syncedStore over WebRTC
  5. Document is autosaved to database periodically by the last user to make an update

However, when I try this, persisting and loading the doc fails to yield identical docs. For example, when I run the following code:

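// Assumption: fromUint8Array / toUint8Array are the base64 helpers from js-base64.
import { fromUint8Array, toUint8Array } from 'js-base64'
import * as Y from 'yjs'

// Encode the full document state as a base64 string for storage.
function serializeYDoc(yDoc: Y.Doc) {
  const documentState = Y.encodeStateAsUpdate(yDoc)
  const base64Encoded = fromUint8Array(documentState)
  return base64Encoded
}

// Rebuild a Y.Doc by applying the stored state to a fresh document.
function deserializeYDoc(base64YDoc: string) {
  const binaryEncoded = toUint8Array(base64YDoc)
  const deserializedYDoc = new Y.Doc()
  Y.applyUpdate(deserializedYDoc, binaryEncoded)
  return deserializedYDoc
}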

/*
yDoc is a Y.Doc initialized as such:
{
  value: {
    test: 'this is a test'
  }
}
*/
const serialized1 = serializeYDoc(yDoc) // AQHT/L7sAwAoAQV2YWx1ZQR0ZXN0AXcOdGhpcyBpcyBhIHRlc3QA
const deserialized1 = deserializeYDoc(serialized1) // Doc { ..... }
const serialized2 = serializeYDoc(deserialized1) // AQHT/L7sAwAoAQV2YWx1ZQR0ZXN0AXcOdGhpcyBpcyBhIHRlc3QA

console.log(yDoc.toJSON()) // {"value":{"test":"this is a test"}}
console.log(deserialized1.toJSON()) // {} // This should be the same but is empty
console.log(serialized1 === serialized2) // true // Somehow when reserialized to base64 it still is the same

Note that the serialized and then deserialized doc outputs empty JSON despite being identical in base64 form.

Perhaps I am missing something. Is there a way to persist and load a Y.Doc to a database in base64 without having access to the original JavaScript object? Perhaps the problem comes from the fact that I am applying the state update to a new Y.Doc, but I don't know of any Yjs function that lets you serialize and deserialize the entire document.

Any help much appreciated but I understand you are doing this all for free so no pressure! (both the best and hardest thing about being part of the open source world).

Thanks!

YousefED commented 2 years ago

Hi @milesingrams .

First of all, in your example, indeed it surprises me that deserialized1 is empty. Could you maybe share a CodeSandbox where you can reproduce the issue?

> Document is autosaved to database periodically by the last user to make an update

You might want to make sure to atomically apply the document state as an update to the state currently in the database. This should cover the following scenario, but make sure to test it:

User A: makes a change and syncs it to the database
User B: simultaneously makes a change and syncs it to the database

User A and User B are somehow not connected p2p (perhaps the webrtc connection has failed, or there simply has been a delay / client side error in p2p syncing). If you simply overwrite the database state with either User A or User B state, you'll lose the update from the other user. If you use applyUpdate, both updates will be reflected in the final document.
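
The difference between overwriting and merging can be sketched like this. It is an illustration only: `stored` is a plain variable standing in for the database row, `openClient` and `save` are hypothetical helper names, and a real implementation would also need a transaction or row lock so the read-merge-write step is atomic.

```typescript
import * as Y from "yjs";

// State currently stored in the database (hypothetical starting point).
const base = new Y.Doc();
base.getArray<string>("items").push(["shared"]);
let stored: Uint8Array = Y.encodeStateAsUpdate(base);

// Two clients load the same stored state but never sync with each other.
function openClient(state: Uint8Array): Y.Doc {
  const doc = new Y.Doc();
  Y.applyUpdate(doc, state);
  return doc;
}
const userA = openClient(stored);
const userB = openClient(stored);
userA.getArray<string>("items").push(["from A"]);
userB.getArray<string>("items").push(["from B"]);

// Saving applies the incoming state on top of the stored state instead of
// replacing it, so edits from both clients survive.
function save(current: Uint8Array, incoming: Uint8Array): Uint8Array {
  const doc = new Y.Doc();
  Y.applyUpdate(doc, current);
  Y.applyUpdate(doc, incoming);
  return Y.encodeStateAsUpdate(doc);
}
stored = save(stored, Y.encodeStateAsUpdate(userA));
stored = save(stored, Y.encodeStateAsUpdate(userB));

const merged = openClient(stored);
const mergedItems = merged.getArray<string>("items").toArray();
```

After both saves, `mergedItems` contains "shared", "from A" and "from B". Had the second save simply replaced `stored` with User B's state, the edit from User A would have been lost.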

milesingrams commented 2 years ago

Hi Yousef, Thank you so much for the thorough response! I used your suggestions and found a way to get the toJSON to work by calling it on the 'value' map instead of the root doc. Overall SyncedStore is now working great for me!
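
For readers hitting the same issue, here is a small sketch of that fix using plain Yjs. The likely explanation, as I understand it, is that a freshly created Y.Doc does not yet know that the root entry "value" is a map until it is requested with `getMap`, so the deprecated `doc.toJSON()` has nothing typed to print even though the data is there. The base64 step is left out because it does not affect the result.

```typescript
import * as Y from "yjs";

// Build a document equivalent to the one in the question.
const original = new Y.Doc();
original.getMap<string>("value").set("test", "this is a test");

// Round-trip through the binary update format.
const restored = new Y.Doc();
Y.applyUpdate(restored, Y.encodeStateAsUpdate(original));

// Requesting the root type by name and kind makes its content readable.
const restoredValue = restored.getMap<string>("value").toJSON();
console.log(restoredValue); // { test: 'this is a test' }
```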

YousefED commented 2 years ago

Happy to help! Keep me posted of your progress, great to learn how people are using the library