earthstar-project / earthstar

Storage for private, distributed, offline-first applications.
https://earthstar-project.org
GNU Lesser General Public License v3.0

ReplicaCache#onCacheUpdated not triggered until you call cache.queryDocs()? #329

Closed by michielbdejong 1 month ago

michielbdejong commented 1 month ago

Looking at the code of the chat app tutorial, and adding two console.log statements to it:

[...]

// Read messages from chat.
const messages = document.getElementById("messages");

const cache = new Earthstar.ReplicaCache(replica);

function renderMessages() {
    console.log('renderMessages called');
    messages.innerHTML = "";

    const chatDocs = cache.queryDocs({
        filter: { pathStartsWith: "/chat" },
    });

    for (const doc of chatDocs) {
        const message = document.createElement("li");

        message.textContent = doc.text;

        messages.append(message);
    }
}

cache.onCacheUpdated(() => {
    console.log('onCacheUpdated called');
    renderMessages();
});

renderMessages();

const peer = new Earthstar.Peer();
peer.addReplica(replica);
peer.sync("http://localhost:8000", true);

You will see:

renderMessages called
onCacheUpdated called
renderMessages called

in the browser console. It looks like renderMessages is first called from the top level on page load and then, once some data has been loaded into the cache from the replica, is called a second time.

So you would think the renderMessages(); call on page load can safely be removed, and messages would still be rendered since renderMessages will be called a few milliseconds later anyway, once the cache is loaded from the replica from IndexedDB.

But this doesn't seem to be true. Something unexpected happens if you comment out the renderMessages(); call: all three debug statements disappear, and no messages are rendered.

Also unexpected is what happens when you replace that call with just a call to cache.queryDocs, rather than the whole rendering code:

[...]

// Read messages from chat.
const messages = document.getElementById("messages");

const cache = new Earthstar.ReplicaCache(replica);

function renderMessages() {
    console.log('renderMessages called');
    messages.innerHTML = "";

    const chatDocs = cache.queryDocs({
        filter: { pathStartsWith: "/chat" },
    });

    for (const doc of chatDocs) {
        const message = document.createElement("li");

        message.textContent = doc.text;

        messages.append(message);
    }
}

cache.onCacheUpdated(() => {
    console.log('onCacheUpdated called');
    renderMessages();
});

// renderMessages();
cache.queryDocs({
    filter: { pathStartsWith: "/chat" },
});

const peer = new Earthstar.Peer();
peer.addReplica(replica);
peer.sync("http://localhost:8000", true);

You will see:

onCacheUpdated called
renderMessages called

and messages are rendered successfully. Why is this? Is it true that ReplicaCache#onCacheUpdated is not triggered until you call cache.queryDocs()?

Bonus question: it gets even weirder if you remove the filter option and just call cache.queryDocs(). Then you will see:

onCacheUpdated called
renderMessages called
onCacheUpdated called
renderMessages called

Why is onCacheUpdated triggered twice instead of once if you leave out the filter option?

michielbdejong commented 1 month ago

This is not a blocker for me, I can work around it now that I know about it, but it felt quirky so I'm simply curious why this works this way.

And thanks for this awesome project btw!

sgwilym commented 1 month ago

Hi @michielbdejong. The goal of ReplicaCache is to provide a synchronous API to the (asynchronous) Replica, so that it can be used easily by certain libraries or frameworks. It does this by watching for events from Replica and checking whether they contain any documents that match a query made previously. If a document satisfies these criteria, it is added to an in-memory cache of documents.

The page renders before any data is available because ReplicaCache must synchronously return a result when it is queried.

  1. On the first call, it does not have anything in its cache, so it returns an empty result.
  2. Then, in the background, it works to (asynchronously) retrieve the result, and fires an update when it has it.
  3. This result is then added to ReplicaCache's in-memory cache of documents, so that it can be retrieved synchronously thereafter.

In short, you do not see any results until you call cache.queryDocs because the ReplicaCache does not know what you are interested in until you do.
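To make the three steps above concrete, here is a minimal sketch of a synchronous cache sitting in front of an asynchronous store. It is not Earthstar's implementation: FakeAsyncReplica and SketchCache are hypothetical names, the query is reduced to a plain path prefix string, and the part where the real ReplicaCache watches Replica events is left out. It only illustrates why no update can fire before the first query.

```javascript
// Hypothetical stand-in for an asynchronous replica (not Earthstar's API).
class FakeAsyncReplica {
  constructor(docs) {
    this.docs = docs;
  }

  // Resolves asynchronously, as a read from IndexedDB would.
  async queryDocs(pathPrefix) {
    return this.docs.filter((doc) => doc.path.startsWith(pathPrefix));
  }
}

// Simplified model of a synchronous cache over an async store.
class SketchCache {
  constructor(replica) {
    this.replica = replica;
    this.results = new Map(); // pathPrefix -> docs cached for that query
    this.callbacks = new Set();
  }

  onCacheUpdated(callback) {
    this.callbacks.add(callback);
  }

  queryDocs(pathPrefix) {
    // Step 3: a query seen before is answered synchronously from memory.
    if (this.results.has(pathPrefix)) {
      return this.results.get(pathPrefix);
    }

    // Step 1: first time this query is seen, so there is nothing to return yet.
    this.results.set(pathPrefix, []);

    // Step 2: fetch in the background, then notify subscribers.
    this.replica.queryDocs(pathPrefix).then((docs) => {
      this.results.set(pathPrefix, docs);
      for (const callback of this.callbacks) callback();
    });

    return [];
  }
}

const cache = new SketchCache(
  new FakeAsyncReplica([{ path: "/chat/1", text: "hello" }]),
);

cache.onCacheUpdated(() => {
  // By now the result is cached, so this call returns the documents.
  console.log("updated:", cache.queryDocs("/chat").map((doc) => doc.text));
});

// Without this first call the cache never learns about "/chat",
// so nothing is fetched and the callback above never runs.
console.log("first call:", cache.queryDocs("/chat")); // empty: nothing cached yet
```

In this model, registering a callback does nothing on its own. Only the first queryDocs call starts the background fetch, which matches the behaviour described in the issue when the top-level renderMessages() call is removed.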

michielbdejong commented 1 month ago

Thanks, that makes a lot of sense now!