amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.
https://gun.eco/docs

.once not triggering consistently when Gun is used inside a ServiceWorker #681

Open davide opened 5 years ago

davide commented 5 years ago

I've created a library called p2p-fetch that under the hood uses Gun (currently version 0.9.999995).

The library employs a ServiceWorker to catch web requests in flight. If a request matches a configured FETCH_PATTERN, then instead of loading the resource from the web, a lookup is done to see if that asset is already cached in the [distributed] Gun database. If it is already there, the content stored in Gun is used; otherwise it falls back to a regular web request (storing the content of the web response in the Gun database for reuse by others).
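The flow described above can be sketched roughly as follows. This is not the actual p2p-fetch source: `handleFetch`, `lookup`, `store`, and `fetchFromWeb` are hypothetical names, and the Gun read/write and the network call are injected as plain functions so the sketch does not depend on any particular Gun API.

```javascript
// Hypothetical sketch of the lookup-then-fallback flow; not the p2p-fetch source.
// deps.lookup(url)       -> Promise resolving to cached content, or undefined
// deps.store(url, body)  -> Promise, saves content for reuse by other peers
// deps.fetchFromWeb(url) -> Promise resolving to the content from the network
// fetchPattern is assumed to be a RegExp without the "g" flag.
async function handleFetch(url, fetchPattern, deps) {
  // Requests that do not match the pattern go straight to the web, uncached.
  if (!fetchPattern.test(url)) {
    return { source: 'web', body: await deps.fetchFromWeb(url) };
  }
  // Matching requests are looked up in the shared store first.
  const cached = await deps.lookup(url);
  if (cached !== undefined) {
    return { source: 'gun', body: cached };
  }
  // Nothing stored yet: fall back to the web and save the result.
  const body = await deps.fetchFromWeb(url);
  await deps.store(url, body);
  return { source: 'web', body };
}
```

Note that this only works if `lookup` always settles, which is exactly what the rest of this report is about.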

To stress test the Gun logic I included an additional parameter, FORCE_P2P, which passes a very large timeout value to the get(...).once(...) call to make sure that it never falls back to the regular web request.

During those tests I started noticing that sometimes the test page would have requests hanging, never triggering the .once callback. For the p2p-fetch library having the .once callback called is critical, even if it's to inform that there's no data to start the web fallback.

After some debugging it seems that all those failed requests go through this line: https://github.com/amark/gun/blob/9f9b3e557c1b9f9ff3d774edcec30271a596d19a/src/chain.js#L91 which, funnily enough, has a TODO and a reference to a possible bug. 😄

Here are the steps to replicate the issue:

The source code for that example page lives here: https://github.com/davide/p2p-fetch/tree/master/examples/lego-star-wars

I haven't excluded the possibility of a messed up interaction / lack of readiness from the ServiceWorker but wanted to report this issue so that anyone more experienced with the Gun API can take a peek.

Thanks in advance for any help or tips on how to further debug this issue.

davide commented 5 years ago

Removed some code that was calling .once inside another .once callback.

New demo with that new code: https://lego-star-wars-njfrsbrzyd.now.sh/ https://lego-star-wars-njfrsbrzyd.now.sh/?FORCE_P2P=true

Noticed that during the first page load after closing and reopening the browser no .once callbacks are triggered, but after waiting around 30s and then refreshing the page, the requests are served with data from the Gun database.

davide commented 5 years ago

Here's what you should see during that first load (it won't budge from this): [screenshot: p2p-fetch_first_load]

bugs181 commented 5 years ago

Hey, sorry you're running into this bug! Just curious if you could try to do the logic in the callback of the gun.put instead? That may solve some issues.

gun.get(encodeURI(url)).put(..., function(ack) {
  resolve(
    new Response(blob, init)
  );
})

The reason I believe the above solution would work is that Gun has an internal timeout which should call the ack callback eventually. Inside the callback, you can check whether ack.err is set and reject the promise if it is. Note:

The same is true for gun.get()

The callback is a listener for read errors, not found, and updates. It may be called multiple times for a single request, since gun uses a reactive streaming architecture.
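A minimal sketch of that suggestion, wrapping the put callback in a Promise and rejecting on ack.err. `putWithAck` is a hypothetical helper name, and `gun` is assumed to be any object exposing the get(key).put(data, callback) chain used above.

```javascript
// Sketch only: wraps gun.get(key).put(data, callback) in a Promise.
// `putWithAck` is a hypothetical helper, not part of Gun or p2p-fetch.
function putWithAck(gun, key, data) {
  return new Promise((resolve, reject) => {
    gun.get(key).put(data, (ack) => {
      // A Promise settles only once, so if the callback is invoked
      // several times, every call after the first is ignored.
      if (ack && ack.err) {
        reject(new Error(String(ack.err)));
      } else {
        resolve(ack);
      }
    });
  });
}
```

The caller can then build the Response in the `.then(...)` branch and start the web fallback in `.catch(...)`.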

davide commented 5 years ago

Hi Levi! Just applied that patch but no changes: [screenshot: p2p-fetch_first_load-v2]

davide commented 5 years ago

After a bit the 3 requests fail with this error: "Failed to load resource: net::ERR_FAILED"

bugs181 commented 5 years ago

I don't seem to be receiving any console messages about Gun loading on Safari 12.

Here's what I'm receiving on my end:

[Log] P2P FETCH: ServiceWorker registration successful with scope:  – "https://lego-star-wars-njfrsbrzyd.now.sh/" (p2p-fetch.js, line 58)
[Log] P2P FETCH: ServiceWorker active (p2p-fetch.js, line 65)

amark commented 5 years ago

Hmm, I wonder if this also relates to an issue @mmalmi found... I think his might have to do with .once getting the metadata / empty object and then later (but not within the .once debounce timeout) getting the rest of the data. This note is mostly just a reminder to myself. I still need to dig into this thread more: I was dealing with the security upgrades earlier this week and am just now starting to have more time, but I'm still trying to process things in various priority orders, so I apologize for the delay!

davide commented 5 years ago

Sorry for the delay @bugs181. RL...

When using service workers there are multiple parallel threads in place (1 for the ServiceWorker and 1 for each tab). The first tab is the one that kicks off the ServiceWorker thread - which is where Gun is loaded (only once). Loading a new tab doesn't get Gun to load again. If you see the "P2P FETCH: ServiceWorker active" message then the ServiceWorker was correctly installed and inside it there should already be a working Gun instance. Unless there were any disconnects (not yet sure how to tackle that).

It took quite a bit of trial and error to get Gun to work inside the ServiceWorker and there are still some hacks left around... I'll look into the Service Worker lifecycle documentation to make sure Gun is properly set up (minus the hacks).

Thanks for chipping in @amark!

davide commented 5 years ago

@bugs181 I misled you in my previous comment!

Changed the p2p-fetch logging to better reflect the sequence of events and removed the hacks I mentioned earlier. I was relying on the Gun 'hi' event to resolve the "connect" promise but never saw it trigger (any idea why?). That (and frustration) got me to add the dumb 4-second resolve timeout, which I have now dropped with no negative consequences.

New demo URLs: https://lego-star-wars-fvenmeeizt.now.sh/ https://lego-star-wars-fvenmeeizt.now.sh/?FORCE_P2P=true

The replication steps remain the same, but the first load time is now [4s] faster.

davide commented 5 years ago

Well... that was dumb. I was misusing the "connect" promise. Code updated again so that the gun instance is created during:

New demo URLs: https://lego-star-wars-amofuziquf.now.sh/ https://lego-star-wars-amofuziquf.now.sh/?FORCE_P2P=true

@amark is there a GitHub issue for @mmalmi's issue? This might just be a duplicate of that.

davide commented 5 years ago

Facepalming like a pro!

The SERVER parameter I had been using was incorrect - hence the lack of 'hi' events from Gun. This is actually very informative: the issue of .once not triggering happens even when there is NO SERVER CONNECTIVITY.

Updated the p2p-fetch code and demo to actually exercise Gun, giving it 30 seconds to load data from other peers before falling back to the web. New demo URLs: https://lego-star-wars-ublgmdzsfr.now.sh https://lego-star-wars-ublgmdzsfr.now.sh/?FORCE_P2P=true

davide commented 5 years ago

Another p2p-fetch update, this time to ensure that all page requests are caught -- which needs some hacks of its own to get working.

The current version (demo link) continues with a 30s Gun timeout. This means that the first load (when there's no data) will take those 30s.

bugs181 commented 5 years ago

Another solution would be to introduce a cache-like URL. I'm not familiar with ServiceWorkers, but if you can inspect the request, then on first load you could assume it's not in Gun and load it immediately. Every time thereafter, you redirect that query to Gun. If you need confirmation of whether it's really in Gun, you could check the state of the property for the image URL. If the state is newer than local (which won't exist on first load), then you can fall back.

davide commented 5 years ago

@bugs181 thanks for that last comment! This sentence "assume it's not in Gun and load it immediately" unblocked a partial-solution! :)

We can rely on the synchronous behavior of #once to check if the data is available locally and, if it is, serve it right away. Got a new commit in with this!
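A rough sketch of that check, under the assumption reported here that the callback runs before the call returns when the data is already held locally. `readIfLocal` and `subscribe` are hypothetical names; `subscribe` stands in for something like `(cb) => gun.get(key).once(cb)`.

```javascript
// Sketch: detect whether a callback-style read answered synchronously.
// Assumes (as observed in this thread, not guaranteed by any API contract)
// that locally available data is delivered before subscribe() returns.
function readIfLocal(subscribe) {
  let fired = false;
  let value;
  subscribe((data) => {
    if (!fired) {
      fired = true;
      value = data;
    }
  });
  // Anything delivered later (asynchronously) is not reflected in the result.
  return fired ? { local: true, value } : { local: false, value: undefined };
}
```

When `local` is false the ServiceWorker can go to the web immediately instead of waiting on the network of peers.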

With that change things look more deterministic! Here are the new test URLs: https://lego-star-wars-nlablyatnm.now.sh/ https://lego-star-wars-nlablyatnm.now.sh/?FORCE_P2P=true

The key issue now is that #once

The extreme case is when you open a new browser and pass in ?FORCE_P2P=true (which sets a timeout of 9999999). In that case gun db will just hang in there forever.

While exploring this issue I realized that passing parameters to a ServiceWorker through its URL path is actually a VERY BAD IDEA: any change in the parameters actually sets up a new ServiceWorker. I'll have to change the p2p-fetch API to fix this (using message passing to pass in the ServiceWorker configuration).
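One way the message-passing approach could look, as a sketch only. `createConfigStore` and the message shape `{ type: 'p2p-fetch-config', config }` are assumptions made for illustration, not the p2p-fetch API.

```javascript
// Sketch: keep ServiceWorker settings in memory and update them from
// "message" events instead of encoding them in the worker's script URL.
// The helper name and the message shape are hypothetical.
function createConfigStore(defaults) {
  let current = { ...defaults };
  return {
    get: () => current,
    // Inside the worker: self.addEventListener('message', store.onMessage)
    onMessage(event) {
      const msg = event.data;
      if (msg && msg.type === 'p2p-fetch-config' && msg.config) {
        current = { ...current, ...msg.config };
      }
    },
  };
}

// Page side (browser only), once a worker controls the page:
// navigator.serviceWorker.controller.postMessage({
//   type: 'p2p-fetch-config',
//   config: { FORCE_P2P: true },
// });
```

Because the browser can stop and restart a ServiceWorker, in-memory settings like these may need to be sent again or persisted.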

bugs181 commented 5 years ago

"We can rely on the synchronous behavior of #once to check if the data is available locally"

Oh. That's even better! Awesome work here. Very exciting stuff.

Instead of using .once you may try the .get(callback) workaround. It's a higher level function and has a few gotchas, but for realtime it makes sense to use it in this case.

bugs181 commented 5 years ago

@amark it sounds like @mmalmi uncovered something eerily similar to using .off() and .once() chaining. See https://github.com/amark/gun/issues/685

davide commented 5 years ago

@bugs181 thanks again for your help on this issue!

I abandoned the usage of gun.once() with a timeout and went ahead with a custom "once" implementation using gun.get(key, callback).
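For readers landing here, a sketch of what such a custom "once" might look like. This is not the code that went into p2p-fetch: `getOnce` is a hypothetical name, and reading the data from `ack.put` is an assumption about the argument passed to the gun.get(key, callback) callback.

```javascript
// Sketch of a "once"-style read built on gun.get(key, callback).
// Hypothetical helper; the ack.put / ack.err fields are assumptions.
function getOnce(gun, key, timeoutMs) {
  return new Promise((resolve) => {
    let done = false;
    const finish = (value) => {
      if (done) return; // the callback may fire several times
      done = true;
      clearTimeout(timer);
      resolve(value);
    };
    // Guarantees that the caller hears back even if no ack ever arrives.
    const timer = setTimeout(() => finish(undefined), timeoutMs);
    gun.get(key, (ack) => {
      if (ack && ack.err) return finish(undefined);
      // Empty acks are skipped so that data arriving later can still win.
      if (ack && ack.put !== undefined && ack.put !== null) finish(ack.put);
    });
  });
}
```

In this sketch a resolved value of `undefined` means "start the web fallback". Since empty acks are skipped, a key with no data waits for the full timeout.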

With that change p2p-fetch is now working as expected. See it in action here: https://lego-star-wars-rjggyxohoe.now.sh?FORCE_P2P=true

bugs181 commented 5 years ago

Awesome! Yes, sometimes the best course of action is to use the lower level API. I'm glad you were able to come to a solution. Apologies that the API didn't behave as you expected, and that I haven't helped as much as I would have liked. We've all been pretty busy in the Gun community!