WICG / proposals

A home for well-formed proposed incubations for the web platform. All proposals welcome.
https://wicg.io/

IPFS features in web browsers #143

Open autonome opened 3 months ago

autonome commented 3 months ago

Introduction

The web today uses locations (URLs) almost exclusively as the resolver for content. This fails to meet user needs when content changes, websites are down, or companies close. While these constraints are features for many use cases, other use cases require content to be accessed independently of a single and/or original location. Examples are web archives for mitigating mis/disinformation, content shared across/between websites, and web applications which need to work on networks where DNS resolution is not possible.

The InterPlanetary File System (IPFS) meets these user needs in various ways depending on how and where it is implemented.

However, a common pattern is to access IPFS-addressed data via a single HTTP gateway, without verifying that the content is unmodified. This is problematic for many reasons, from privacy to data manipulation to scaling.

This is a proposal to discuss in the WICG how web browsers might integrate a subset of IPFS such that it is safe for users, aligns with the web's origin security model, and does not require paradigmatic changes to browsers' HTTP-centric codebases. Examples include IPFS CIDs for SRI, Fetch integration, native addressing support, and verifiable retrieval of content-addressed resources from IPFS through multiple HTTP gateways, as has been prototyped so far and detailed in this explainer.
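To make "verifiable retrieval through multiple HTTP gateways" concrete, here is a minimal sketch, not the prototype's actual code. It assumes the expected SHA-256 digest has already been extracted from the CID, that gateways serve raw blocks at `/ipfs/{cid}?format=raw`, and that the caller supplies the HTTP client. The names `fetch_verified`, `raw_block_url`, and `Fetcher` are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical fetcher type: takes a URL, returns the response body,
# and raises OSError when the gateway cannot be reached.
Fetcher = Callable[[str], bytes]


def raw_block_url(gateway: str, cid: str) -> str:
    """Build a raw-block request URL (assumed gateway URL layout)."""
    return f"https://{gateway}/ipfs/{cid}?format=raw"


def fetch_verified(
    cid: str,
    expected_sha256: bytes,
    gateways: Iterable[str],
    fetch: Fetcher,
) -> Optional[bytes]:
    """Try each gateway in turn; return the first body whose hash matches."""
    for gateway in gateways:
        try:
            body = fetch(raw_block_url(gateway, cid))
        except OSError:
            continue  # gateway unreachable, try the next one
        if hashlib.sha256(body).digest() == expected_sha256:
            return body  # content matches its address
        # Otherwise the gateway returned modified content; ignore it.
    return None
```

Because every response is checked against the hash, no single gateway has to be trusted: a tampered or unavailable gateway only costs a retry.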

The goals of incubation discussion at the WICG are:

IPFS Use Today

The IPFS DHT has ~200,000 unique peers on average at the time of this writing. There are many client-only IPFS implementations which are not represented in this number.

HTTP use of the IPFS network through gateways is not represented in the DHT. There are many IPFS gateways, large and small. Some are listed at the IPFS Public Gateway Checker. The ipfs.io and dweb.link gateways are operated by Protocol Labs, and are averaging ~1.25 million requests per day.

There are many implementations and integrations of IPFS in non-browser contexts, from support in tools like curl, ffmpeg and mpv to implementations in various programming languages and platforms.

The metrics above are all available on an ongoing basis at https://probelab.io.

Browsers and Extensions

Learn More

In the spirit of not megadumping the entire IPFS universe here, I've tried to keep this proposal concise until there's a decision about whether WICG is an appropriate venue for discussing IPFS features in web browsers or not.

If you'd like to read more about the Chromium multi-gateway verified client prototype, please read the explainer here.

If you'd like to learn more about IPFS generally, here are some places to start:

Feedback

Please provide all feedback below.

backkem commented 3 months ago

I certainly think this is worth exploring.

My main concern is around introducing full-stack vs narrow protocols/features. I would consider IPFS in its entirety as 'full stack'. What if I want to connect to other (decentralized) protocols? (Purely illustrative: Arweave or POKT come to mind) I would prefer, where possible, to identify narrow solutions to problems like service discovery, registering protocol handlers, etc. If we can find those, it puts the tools back in the hands of developers, à la "innovation at the edge".

From the ipfs-chromium explainer:

It also provides for opportunities down the road for natural synergies, for example using Chromium's mDNS abilities to discover nearby Kubo gateways.

This sounds similar to what we're doing in the Local Peer-to-Peer API. With this API you could connect securely to an IPFS gateway running on your LAN.

autonome commented 3 months ago

My main concern is around introducing full-stack vs narrow protocols/features.

The approach in the Chromium integration experiment has been primarily at the addressing and verification layers, so it is very thin. It is all HTTP under the hood.

We likely need better language for how we talk about "protocols" vs "features".

What if I want to connect to other (decentralized) protocols? (Purely illustrative: Arweave or POKT come to mind)

This proposal is to talk about IPFS here at WICG, not because IPFS is decentralized, but in the context of specific end user needs that it meets, and vendor activity around it.

I would prefer, where possible, to identify narrow solutions to problems like service discovery, registering protocol handlers, etc. If we can find those, it puts the tools back in the hands of developers, à la "innovation at the edge".

+1

autonome commented 3 months ago

This sounds similar to what we're doing in the Local Peer-to-Peer API. With this API you could connect securely to an IPFS gateway running on your LAN.

Fantastic, a great example of the narrow/decoupled approach you described above.

backkem commented 3 months ago

Looks like I didn't fully internalize the scoping of the proposal. My apologies.

A predominant goal would likely be to allow the user agent to validate that the received content matches its content-address, preferably anywhere that content can be fetched by a (content-addressed) URI. The agent would have to:

  1. (optional) Resolve the content-address, using IPNS or DNSLink.
  2. Extract the content hash from the URI
  3. Verify the hash against the fetched content

I guess it would be possible to loosely couple each step. But each step, especially step 2, would require protocol-specific handling. The protocol-specific logic would have to be shipped with the browser, or maybe registerProtocolHandler could be extended to inject such logic.
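Steps 2 and 3 can be sketched as follows for the simplest case. This is a minimal illustration, assuming a base32-encoded CIDv1 (multibase prefix `b`) with the raw codec and a sha2-256 multihash; any other encoding, codec, or hash function is rejected. The function names are hypothetical, and `make_raw_cid` exists only to build example input.

```python
import base64
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for raw bytes


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_cid_v1(cid: str) -> tuple[int, int, bytes]:
    """Step 2: extract (codec, hash code, digest) from a base32 CIDv1."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 ('b') CIDs")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    digest = raw[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated digest")
    return codec, hash_code, digest


def verify_raw_block(cid: str, content: bytes) -> bool:
    """Step 3: check fetched bytes against the hash carried in the CID."""
    codec, hash_code, digest = parse_cid_v1(cid)
    if codec != RAW or hash_code != SHA2_256:
        raise ValueError("unsupported codec or hash in this sketch")
    return hashlib.sha256(content).digest() == digest


def make_raw_cid(content: bytes) -> str:
    """Helper for the example only: build a raw sha2-256 CIDv1."""
    raw = bytes([1, RAW, SHA2_256, 32]) + hashlib.sha256(content).digest()
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")
```

For example, `verify_raw_block(make_raw_cid(b"hello world"), b"hello world")` holds, while the same CID checked against altered bytes does not. The multibase, multicodec, and multihash lookups are where the protocol-specific handling mentioned above comes in.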

John-LittleBearLabs commented 3 months ago

  1. (optional) Resolve the content-address, using IPNS or DNSLink.
  2. Extract the content hash from the URI
  3. Verify the hash against the fetched content

If I may add steps 0 and 4:

Step 0: Have some control over which HTTP(S) endpoints are used for this protocol, and (if discovery is implemented) how additional endpoints are discovered.

Step 4: From the fetched and verified content, determine which additional blocks of data (identified by their hashes) are necessary, and fetch them too. Once all necessary blocks are loaded, reassemble them into the top-level requested document.

I think this is really important to call out, since you brought up registerProtocolHandler, which really seems to assume a 1:1 mapping of requests. With IPFS's trustless gateways, how many blocks you get (and which ones) depends on the precise kind of gateway request you're making. You could always insist on downloading every necessary block each time, but there are reasons an implementation might not want to do that.
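A minimal sketch of step 4, showing why one request can fan out into many block fetches. It uses a toy block format invented for this example, not UnixFS or dag-pb: a block starting with `L` is a leaf carrying data, and a block starting with `N` is a node carrying the concatenated 32-byte SHA-256 digests of its children. The names `put` and `assemble` are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def put(store: Dict[bytes, bytes], block: bytes) -> bytes:
    """Store a block under its SHA-256 digest and return that digest."""
    key = hashlib.sha256(block).digest()
    store[key] = block
    return key


def assemble(root: bytes, fetch_block: Callable[[bytes], bytes]) -> bytes:
    """Fetch, verify, and reassemble the document rooted at `root`."""
    block = fetch_block(root)
    # Verify every block before trusting the links inside it.
    if hashlib.sha256(block).digest() != root:
        raise ValueError("block does not match its hash")
    kind, payload = block[:1], block[1:]
    if kind == b"L":  # leaf: payload is document data
        return payload
    if kind == b"N":  # node: payload is a list of child digests
        if len(payload) % 32 != 0:
            raise ValueError("malformed node block")
        children = [payload[i:i + 32] for i in range(0, len(payload), 32)]
        return b"".join(assemble(child, fetch_block) for child in children)
    raise ValueError("unknown block kind")
```

Usage, with an in-memory dictionary standing in for the gateways:

```python
store = {}
first = put(store, b"L" + b"hello, ")
second = put(store, b"L" + b"world")
root = put(store, b"N" + first + second)
document = assemble(root, store.__getitem__)  # three block fetches, one document
```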

backkem commented 3 months ago

Thanks for the additional insight John. That does make things more complex.

I was wondering whether there is any precedent for intercepting requests and performing actions on them. Service Workers come to mind. Maybe there is a possible combination of the ServiceWorker and registerProtocolHandler concepts. That would allow a lot of flexibility for protocol-specific client-side validation. The question is how this type of ProtocolHandler ServiceWorker would inform the user that validation succeeded.

backkem commented 2 months ago

I saw there is an existing discussion about ServiceWorker-like protocol handlers in ipfs/in-web-browsers#212.

A potential way that the content verification could be bridged is as follows:

While this would be a way to separate the concerns of the protocol and the user agent, it comes with the trade-off of somewhat indirect validation.