beakerbrowser / beaker

An experimental peer-to-peer Web browser
https://beakerbrowser.com/
MIT License
6.75k stars · 545 forks

Question: How does Beaker handle caching dependencies? #752

Open webdesserts opened 6 years ago

webdesserts commented 6 years ago

For example: if dat://site.com has <link> and <script> elements that refer to an external dat:// (let's say dat://fonts.com), and I add dat://site.com to my library, can I expect the dat://fonts.com files I'm referencing to be permanently cached for as long as dat://site.com is? Also, would those files be seeded to anyone else trying to access that site? As an extension, what about hosting something on hashbase.io? Would that rehost only dat://site.com and none of its dependencies?
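To make the scenario concrete, such a page might contain markup like the following (the file names are invented for illustration):

```html
<!-- A page on dat://site.com referencing assets from a separate
     archive, dat://fonts.com (both file names are hypothetical) -->
<link rel="stylesheet" href="dat://fonts.com/opensans.css">
<script src="dat://fonts.com/font-loader.js"></script>
```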

pfrazee commented 6 years ago

can I expect the dat://fonts.com files that I'm referencing to be permanently cached as long as dat://site.com is

Not currently. I'd need to think about the mechanisms involved for enabling that.

Also would those files be seeded to anyone else trying to access that site?

Any file you download from dat will be seeded, so yes.

As an extension how about if I host something on hashbase.io? Would that only rehost the dat://site.com and none of its dependencies?

Correct. Beaker and hashbase only think in terms of the domain you're interacting with currently.

webdesserts commented 6 years ago

So here are my thoughts:

If dependencies aren't permanently cached when you "install" for offline use, then the app still (eventually) requires the network and isn't truly offline. You could build your site to avoid this issue by including all of your app's dependencies in the app's dat, but I feel that this conflicts with one of the main things that attracts me to a dat-based web: the sharing and rehosting of dependencies. Essentially, as a library developer, I want to host my JavaScript library in a dat where everyone can use it with no benefits lost compared to manually installing the dependency in their app via npm or some other means. The impermanence of cached dependencies is the main benefit lost for me right now.

Side note: I think it would be cool if you could ask hashbase to rehost a slew of dats without giving them a short name. It would be even cooooler if hashbase could parse your html, derive what dats you depend on and suggest you rehost them. But that's more of a dream feature.

(I'll create an issue on hashbase for this later)

pfrazee commented 6 years ago

I agree this is something that ought to happen (EDIT: or rather a problem that should be fixed) but there's a tradeoff regarding disk usage and predictable effects, and also rule complexity. I'm hesitant to add mechanisms without knowing all of the related pieces, because I don't want to have to change the rules on app/site devs later.

Here are some questions we need to address:

  1. Is it feasible to automatically detect dependencies based on the runtime behavior and permacache them?
  2. If not, is there a manifest field we can add to describe which sites ought to be saved with the current site?
  3. Is it possible that manifest field would serve additional purposes, such as specifying which version of the other sites we want? (Could it end up being similar to package.json's dependencies field?)
  4. Will the mechanism make it difficult to predict or calculate the size of an installed application? (This would be a problem for users deciding whether to save for offline.)
  5. Is it possible the mechanism could be used to attack users or do something counter to the user's wishes?
  6. Are we sure that we're going to want to do runtime imports? Is it possible that the first-load performance of multiple dats will be so bad (and perhaps unreliable) that devs create bundles instead, and simply reference the origins in their manifests for future builds? (One idea I think is worth considering: whether the browser ought to provide tooling for running build scripts!)

Yeah, feel free to make an issue on hashbase. I'm hoping we can reduce hashbase to a web API in the near future, so that you don't have to wait on us for a change like that.

webdesserts commented 6 years ago

Runtime Checks

Yeah, I can't think of a way that runtime checking would be feasible. If runtime checking were involved, I think it would have to be combined with a manifest file. The manifest file could say "please permacache anything that I request from these blessed dat sites". However, I think any runtime checking is going to result in a "we'll install this app as we go" setup rather than a "we'll install this entire app right now" setup, which is what I would expect for an offline app.

Manifest File

That said, I think a manifest file like you're suggesting might be the best route. I think we would want to provide a means to declare a dependency on individual files or folders within a dat. I imagine that in the future, organizations might want to host large dats with multiple resources in them. For example, there might be a dat with hundreds of fonts, or a dat with every version of a library that has ever been published. You wouldn't want the whole dat, just the file you need. At the same time, I could see how you might want to cache every file within a feed/ folder as it comes in.
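A strawman for what such a field might look like, shown as a JS object (every field name, URL, and path below is invented for illustration; nothing like this exists in Beaker today):

```javascript
// Sketch of a hypothetical dat.json "dependencies" section.
// "datDependencies" and all keys/values here are made up.
const manifest = {
  title: 'My App',
  // Archives (and specific files within them) to permacache alongside this site
  datDependencies: {
    'dat://fonts.com': ['/opensans/opensans.woff2'], // a single file
    'dat://library.com': ['/v2.1.0/'],               // an entire folder
    'dat://feeds.com': ['/feed/*']                   // cache matching files as they arrive
  }
}
```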

Future of Bundling

As for thoughts on bundling: in my experience, the main reason I bundle applications is not performance or reliability, but modules (don't get me wrong, performance is a good reason). ES6 modules are still a short way off, and until they're here, JavaScript doesn't have a native module system. To my understanding, that was one of the main selling points of Browserify and Webpack: they brought the npm module ecosystem to the web.

When ES6 modules do start to gain support, that's going to remove one of the incentives to immediately start bundling, and at the very least I would imagine more developers will start developing their apps with raw ES6 modules. That's not to say I think bundling will go away; I just think it might get pushed down the priority list.

Even with bundles you still might have externalized dependencies. For example, rotonde-client could be distributed as a bundle rather than multiple scripts, but rotonde-client would still need to be permacached if a user were to install their portal for offline use.
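As a sketch of what that raw ES-module workflow could look like: dynamic `import()` accepts any URL the host environment can resolve, so in Beaker the specifier could presumably be a `dat://` URL. The `data:` URL below is only there to keep the example runnable outside Beaker; the `dat://` line is hypothetical.

```javascript
// Dynamic import of an ES module from a URL. In Beaker, the specifier
// could be a dat:// URL (hypothetical, commented out below); a data: URL
// is used here so the snippet runs in any modern Node/browser environment.
async function demo () {
  // const mod = await import('dat://library.com/index.js') // hypothetical
  const mod = await import('data:text/javascript,export const answer=42')
  return mod.answer
}
```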

Partial Upgrades

One thing that I do think we should be concerned about with an in-browser package manifest is partial upgrades (this gets into the reliability issues you mentioned). Normally, in the bundle world, when you release an update to your app you are guaranteed that the user either gets the entire bundle or the app breaks. With an in-browser package manifest, you would have the chance of partial upgrades. For example: your app is installed for offline use and currently depends on dep a and dep b. Let's say the dev publishes a new version of the app that now relies on dep c, but our user is having trouble downloading dep c. Do we hold back the dat update until all deps are downloaded? Do we just let the app crash and hope the user finds a better connection soon? Or is this the app developer's responsibility to resolve? After all, they do have access to the dat history; maybe they could include a library that handles those types of fallbacks on their own?
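One userland answer to that last question could be sketched like this: the app ships its own fallback logic, trying the newly-required dependency first and pinning back to a version it knows it already has. Both loader functions here are invented stand-ins for real fetch logic, not any existing Beaker API.

```javascript
// Sketch: app-level handling of a partial upgrade. `loadCurrent` tries
// the dependency the new app version wants (e.g. dep c); if it can't be
// fetched, `loadPinned` loads a version pinned from the dat's history.
// Both loaders are hypothetical stand-ins supplied by the app developer.
async function loadWithFallback (loadCurrent, loadPinned) {
  try {
    return await loadCurrent()
  } catch (err) {
    // dep c is unreachable -- fall back rather than letting the app crash
    return await loadPinned()
  }
}
```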

Malicious Concerns

I think the main malicious use I can think of would be unwelcome seeding of other sites. Imagine a developer who dumps the same dependency manifest into all of their side projects so that they can quickly increase the availability of all of their sites. This isn't specific to the permacache, though; the permacache just exacerbates the issue. Correct me if I'm wrong, but as far as I know I could drop 100+ (new DatArchive(url)).download("/") statements into one of my sites and at least max out any cache limits y'all have set. Not sure how defensive y'all are on that front at the moment or what types of rules are in place.

pfrazee commented 6 years ago

Good observations. The rotonde-client is already being used as an external dependency, right?

Let's keep this issue open and add to it as we gather experience. I'll be especially interested to see how things work once ES modules land (which can't be far off, see https://github.com/electron/electron/pull/10213). You're right that the DX of ES modules and dat will probably be pretty good. We'll need to see what the load performance is actually like for external dats.

webdesserts commented 6 years ago

@pfrazee Right, rotonde-client is currently an external dependency of the user's portal.

Sounds good, I'll play around with userland dependency management for a bit and see where that goes.

rhythnic commented 4 years ago

I'm late to the conversation, but I was thinking about this today too. I tried out a little demo, passing a dat URL to the dynamic import() API, and it works beautifully so far. I can't say anything about the caching, but the fetching seems to be working well.

dat://dat-as-a-dynamic-js-module.hashbase.io/