IPFS features in web browsers

autonome commented 8 months ago

Introduction

The web today uses locations (URLs) almost exclusively as the resolver for content. This fails to meet user needs when content changes, websites are down, or companies close. While these constraints are features for many use cases, other use cases require content to be accessed independently of a single and/or original location. Examples are web archives for mitigating mis/disinformation, content shared across/between websites, and web applications which need to work on networks where DNS resolution is not possible.

The InterPlanetary File System (IPFS) meets these user needs in various ways depending on how and where it is implemented.

However, a common pattern is to access IPFS-addressed data via a single HTTP gateway, without verifying that the content is unmodified. This is problematic for many reasons, from privacy to data manipulation to scaling.

This is a proposal to discuss in the WICG how web browsers might integrate a subset of IPFS such that it is safe for users, aligns with the web's origin security model and does not require paradigmatic changes to browsers' HTTP-centric code base. Examples could be IPFS CIDs for SRI, Fetch integration, native addressing support, and verifiable retrieval of content-addressed resources from IPFS through multiple HTTP gateways as has been prototyped thus far, and detailed in this explainer.

The goals of incubation discussion at the WICG are:

Identify and specify feature integration points
Review privacy threats and solutions / mitigations
Review security threats and solutions / mitigations
Draft recommendations for user agent communication in user interfaces for end users
Scope and prioritize how future IPFS features could work on the web, using more of its transport-agnostic nature for user benefit

IPFS Use Today

The IPFS DHT has ~200,000 unique peers on average at the time of this writing. There are many client-only IPFS implementations which are not represented in this number.

HTTP use of the IPFS network through gateways is not represented in the DHT. There are many IPFS gateways, large and small. Some are listed at the IPFS Public Gateway Checker. The ipfs.io and dweb.link gateways are operated by Protocol Labs, and are averaging ~1.25 million requests per day.

There are many implementations and integrations of IPFS in non-browser contexts, from support in tools like curl, ffmpeg and mpv to implementations in various programming languages and platforms.

The metrics above are all available on an ongoing basis at https://probelab.io.

Browsers and Extensions

Chromium: Schemes are allow-listed. Deeper support in Chromium is an ongoing project by Protocol Labs, Little Bear Labs, and Igalia on everything from better non-HTTP address handling to experimental IPFS client support. Various issues filed for Chromium, blog post.
Brave Browser: Brave has had IPFS features for many years, from bundling IPFS Companion to native address support to running a full Kubo node. See their product documentation for more on their IPFS features.
Opera: Native IPFS address support in desktop, iOS and Android, redirected to an HTTP gateway as announced here.
Firefox: Has allow-listed the protocol schemes for browser extensions.
IPFS Companion: Browser extension available for Chrome (and Chromium based browsers) and Firefox, adds a suite of features to pair your browser with a locally-running IPFS node. Available in Chrome web store and Mozilla addons for Firefox.

Learn More

In the spirit of not megadumping the entire IPFS universe here, I've tried to keep this proposal concise until there's a decision about whether WICG is an appropriate venue for discussing IPFS features in web browsers or not.

If you'd like to read more about the Chromium multi-gateway verified client prototype, please read the explainer here.

If you'd like to learn more about IPFS generally, here are some places to start:

Feedback

Please provide all feedback below.

backkem commented 8 months ago

I certainly think this is worth exploring.

My main concern is around introducing full-stack vs narrow protocols/features. I would consider IPFS in its entirety as 'full stack'. What if I want to connect to other (decentralized) protocols? (Purely illustrative: Arweave or POKT come to mind) I would prefer, where possible, to identify narrow solutions to problems like service discovery, registering protocol handlers, etc. If we can find those, it puts the tools back in the hands of developers, à la "innovation at the edge".

From the ipfs-chromium explainer:

It also provides for opportunities down the road for natural synergies, for example using Chromium's mDNS abilities to discover nearby Kubo gateways.

This sounds similar to what we're doing in the Local Peer-to-Peer API. With this API you could connect securely to an IPFS gateway running on your LAN.

autonome commented 8 months ago

My main concern is around introducing full-stack vs narrow protocols/features.

The approach in the Chromium integration experiment has been primarily at the addressing and verification layers, so very very thin. All HTTP under the hood.

We likely need better language for how we talk about "protocols" vs "features".

What if I want to connect to other (decentralized) protocols? (Purely illustrative: Arweave or POKT come to mind)

This proposal is to talk about IPFS here at WICG, not because IPFS is decentralized, but in the context of specific end user needs that it meets, and vendor activity around it.

I would prefer, where possible, to identify narrow solutions to problems like service discovery, registering protocol handlers, etc. If we can find those, it puts the tools back in the hands of developers, à la "innovation at the edge".

+1

autonome commented 8 months ago

This sounds similar to what we're doing in the Local Peer-to-Peer API. With this API you could connect securely to an IPFS gateway running on your LAN.

Fantastic, a great example of the narrow/decoupled approach you described above.

backkem commented 7 months ago

Looks like I didn't fully internalize the scoping of the proposal. My apologies.

A predominant goal would likely be to allow the user agent to validate that the received content matches its content-address, preferably anywhere that content can be fetched by a (content-addressed) URI. The agent would have to:

Step 1: (optional) Resolve the content-address, using IPNS or DNSLink.
Step 2: Extract the content hash from the URI.
Step 3: Verify the hash against the fetched content.

I guess it would be possible to loosely couple these steps. But each step, especially step 2, would require protocol-specific handling. The protocol-specific logic would have to be shipped with the browser, or maybe registerProtocolHandler could be extended to inject such logic.
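To make the extract and verify steps above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any browser or the Chromium prototype does it: it only handles base32 CIDv1 values that carry a SHA-256 multihash, it checks a single block, and the function names (cid_from_uri, digest_from_cid, verify_block, make_raw_cid) are made up for this example.

```python
import base64
import hashlib
from urllib.parse import urlsplit

SHA2_256 = 0x12  # multihash code for sha2-256


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, position after it)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_from_uri(uri: str) -> str:
    """Pull the CID out of an ipfs://<cid>/<path> URI."""
    parts = urlsplit(uri)
    if parts.scheme != "ipfs" or not parts.netloc:
        raise ValueError("not an ipfs:// URI")
    return parts.netloc


def digest_from_cid(cid: str) -> bytes:
    """Return the SHA-256 digest embedded in a base32 CIDv1."""
    if not cid.startswith("b"):  # 'b' is the multibase prefix for base32
        raise ValueError("only base32 CIDv1 is handled in this sketch")
    text = cid[1:].upper()
    raw = base64.b32decode(text + "=" * (-len(text) % 8))
    version, pos = read_varint(raw, 0)
    _codec, pos = read_varint(raw, pos)  # codec does not affect block hashing
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    if version != 1 or hash_code != SHA2_256 or len(raw) - pos != length:
        raise ValueError("unsupported or malformed CID")
    return raw[pos:]


def verify_block(uri: str, block: bytes) -> bool:
    """True if the block bytes hash to the digest named by the URI's CID."""
    return hashlib.sha256(block).digest() == digest_from_cid(cid_from_uri(uri))


def make_raw_cid(block: bytes) -> str:
    """Hypothetical helper: build a CIDv1 (raw codec 0x55, sha2-256)."""
    raw = bytes([0x01, 0x55, SHA2_256, 0x20]) + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


block = b"hello"
uri = "ipfs://" + make_raw_cid(block) + "/"
print(verify_block(uri, block))        # True
print(verify_block(uri, b"tampered"))  # False
```

Note that the hash covers the bytes of one block, so this only verifies a whole document when the document fits in a single raw block.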

John-LittleBearLabs commented 7 months ago

(optional) Resolve the content-address, using IPNS or DNSLink.

Extract the content hash from the URI

Verify the hash against the fetched content

If I may add steps 0 and 4:

Step 0: Have some control over which HTTP(s) endpoints are used for this protocol, and (if discovery is implemented) how additional endpoints are discovered.

Step 4: From the fetched & verified content, determine what additional content (blocks of data identified by their hashes) is necessary, and fetch it too. Once all necessary blocks are loaded, reassemble them into the top-level requested document.

I think this is really important to call out, since you brought up registerProtocolHandler which really seems to assume a 1:1 mapping of requests. With IPFS's trustless gateways, how many blocks you get (and which ones) depends on the precise kind of gateway request you're making. You could always insist on downloading every necessary block each time, but there are reasons an implementation might not want to do that.
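A rough sketch of why steps 0 and 4 break the 1:1 request mapping, in Python. Everything here is a stand-in chosen for illustration: the "address" is a plain hex SHA-256 digest rather than a CID, the block format is a toy JSON encoding rather than dag-pb/UnixFS, and a "gateway" is any callable instead of an HTTP endpoint. The names fetch_verified, load and put are hypothetical.

```python
import hashlib
import json
from typing import Callable, Optional, Sequence

# Assumption: a gateway is modelled as a callable from address to block bytes.
Gateway = Callable[[str], Optional[bytes]]


def address_of(block: bytes) -> str:
    """Toy content address: hex SHA-256 of the block (stand-in for a CID)."""
    return hashlib.sha256(block).hexdigest()


def fetch_verified(address: str, gateways: Sequence[Gateway]) -> bytes:
    """Step 0 plus verification: try each configured gateway in order and
    accept the first block whose hash matches the requested address."""
    for gateway in gateways:
        block = gateway(address)
        if block is not None and address_of(block) == address:
            return block
    raise LookupError(f"no gateway returned a valid block for {address}")


def load(address: str, gateways: Sequence[Gateway]) -> bytes:
    """Step 4: follow links to further blocks and reassemble the document."""
    node = json.loads(fetch_verified(address, gateways))
    if "links" in node:  # toy branch node: an ordered list of child addresses
        return b"".join(load(child, gateways) for child in node["links"])
    return bytes.fromhex(node["data"])  # toy leaf node: hex-encoded payload


def put(store: dict, node: dict) -> str:
    """Hypothetical helper: encode a toy node and file it under its address."""
    block = json.dumps(node).encode()
    store[address_of(block)] = block
    return address_of(block)


store: dict = {}
first = put(store, {"data": b"hello ".hex()})
second = put(store, {"data": b"world".hex()})
root = put(store, {"links": [first, second]})

honest = store.get                          # serves the real blocks
tampering = lambda addr: b'{"data": "00"}'  # always returns wrong bytes

print(load(root, [tampering, honest]))  # b'hello world'
```

One request for the root turns into three verified block fetches here, and the tampering gateway's responses are discarded because they do not hash to the requested address.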

backkem commented 7 months ago

Thanks for the additional insight John. That does make things more complex.

I was wondering whether there is any precedent for intercepting requests and performing actions on them. Service Workers come to mind. Maybe there is a possible combination of the ServiceWorker and registerProtocolHandler concepts. That would allow a lot of flexibility for protocol-specific client-side validation. The question is how this type of ProtocolHandler ServiceWorker would inform the user that validation succeeded.

backkem commented 7 months ago

I saw that there is an existing discussion about ServiceWorker-like protocol handlers in ipfs/in-web-browsers#212.

A potential way that the content verification could be bridged is as follows:

The ServiceWorker does protocol (IPFS) specific content validation. This can support validating multiple blocks as needed.
If validation succeeds, it sets a more generic Content-Digest for the returned content (some existing discussion in ipfs/in-web-browsers#185).
The browser adds a visual indication of the Content-Digest validation to inform the user.

While this would be a way to separate the concerns of the protocol and the user agent, it comes with the trade-off of somewhat indirect validation.
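For reference, a small sketch of what the generic digest in the second step could look like, assuming the Content-Digest field from RFC 9530 with the sha-256 algorithm. It is written in Python for brevity rather than as ServiceWorker code, and the function names are invented for this example.

```python
import base64
import hashlib
import hmac


def content_digest(body: bytes) -> str:
    """Format a Content-Digest value: sha-256=:<base64 of the digest>:"""
    digest = hashlib.sha256(body).digest()
    return "sha-256=:" + base64.b64encode(digest).decode("ascii") + ":"


def matches_content_digest(body: bytes, header_value: str) -> bool:
    """What the user agent side could do: recompute and compare.
    Assumption: the header carries exactly one sha-256 entry."""
    return hmac.compare_digest(content_digest(body), header_value.strip())


header = content_digest(b"<html>reassembled document</html>")
print(matches_content_digest(b"<html>reassembled document</html>", header))  # True
print(matches_content_digest(b"<html>something else</html>", header))        # False
```

This makes the indirection visible: the browser would be checking a digest that the worker computed over the reassembled bytes, not the original content address.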

WICG / proposals

IPFS features in web browsers #143
