brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

When Brave is configured to use a public gateway, enforce checking IPFS hashes automatically #13500

Open bbondy opened 3 years ago

bbondy commented 3 years ago

Currently you only have a guarantee that the files you're accessing on IPFS are what they say they are if you're using a local node. This task is to check the contents of files that are loaded against the CID so that even if you're using a gateway, you can be sure the gateway is not doing anything sketchy.

lidel commented 3 years ago

I keep collecting notes about verifiable HTTP responses in https://github.com/ipfs/in-web-browsers/issues/128. It is a surprise to many that it is not a clear-cut thing.

The TL;DR is that files bigger than 256KB are chunked and represented as a DAG in which each level is hashed (like in git), so the root CID is not the hash of the file, but the hash of the DAG representation of the file.

This means that right now it is not possible to verify responses bigger than 256KB without knowing what the DAG looks like, and for that you need to run an IPFS node.
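To illustrate why the root hash stops matching the plain file hash once a file is chunked, here is a minimal toy sketch. It is an assumption-laden stand-in, not the real dag-pb/UnixFS encoding: `toy_dag_root` and `flat_hash` are hypothetical helpers, and real blocks also carry protobuf envelopes that this toy ignores.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # the 256KB chunk size mentioned above


def flat_hash(data):
    """SHA-256 of the file bytes as a gateway would return them."""
    return hashlib.sha256(data).digest()


def toy_dag_root(data, chunk_size=CHUNK_SIZE):
    """Toy one-level DAG: hash each chunk, then hash the list of chunk hashes.

    NOT the real dag-pb/UnixFS layout; it only shows that a root built
    from chunk hashes differs from the hash of the whole file.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    leaves = [hashlib.sha256(c).digest() for c in chunks or [b""]]
    if len(leaves) == 1:
        # A single chunk: in this toy, the root is just the chunk hash.
        return leaves[0]
    return hashlib.sha256(b"".join(leaves)).digest()


small = b"a" * 1000                 # fits in one chunk
big = b"a" * (2 * CHUNK_SIZE + 1)   # needs three chunks

print(toy_dag_root(small) == flat_hash(small))
print(toy_dag_root(big) == flat_hash(big))
```

In this toy the two hashes agree for the single-chunk file and diverge for the chunked one, which is the situation a client is in when it only has the reassembled bytes from a gateway.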

Verifiable gateway via IPFS node in offline mode and CAR import

We are looking into various ways of solving this (details are listed on the linked issue), but in the case of Brave, I see an additional way of having verifiable gateway responses in the form of ipfs:// backed by a public gateway and CAR export/import:

@bbondy does this sound feasible, or should we wait for gateway responses that do not require go-ipfs?

bbondy commented 3 years ago

We can't install go-ipfs without the user opting into it, and I think asking the user to opt into this would be complicated UI-wise.

@lidel what if we add some basic protocol support directly into Brave? This is maybe a start of future things to come. Maybe you can describe how we could do this at the protocol level?

lidel commented 3 years ago

Ok, so let's scope the verification problem to files represented with unixfs (files and directories). Below is a broad-strokes explainer that should make it easier to reason about what needs to be done.

In IPFS unixfs files can be represented as a CID with one of two multicodecs:

If you want to validate a CID without running an IPFS node you need to:

  1. Look at the multicodec in the CID:
  2. if it is raw, then you can just hash the payload and compare it with the hash inside the CID. Done.
  3. if it is dag-pb, you need to read the protobuf envelope somehow to know if the CID represents only a single block, or is a parent and additional blocks need to be fetched.
    • this is because behind the scenes the HTTP gateway re-assembles all blocks from the dag-pb tree and returns only the raw bytes of the entire original file. In other words, envelopes of all individual blocks are "lost in translation" between IPFS and HTTP, which makes CID validation impossible with the raw data alone.

(2) is easy and could be implemented for small files as a PoC. (3) is difficult because metadata information can't be fetched from the same gateway that we are trying to verify :trollface:
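Step (2) could look roughly like the sketch below. It assumes a base32 CIDv1 string (multibase prefix `b`) with a sha2-256 multihash; the function names (`parse_cidv1`, `verify_raw`, `make_raw_cid`) are hypothetical helpers for illustration, not an existing Brave or IPFS API. The multicodec codes used are 0x55 (raw), 0x70 (dag-pb) and 0x12 (sha2-256).

```python
import base64
import hashlib

RAW = 0x55       # multicodec code for raw
DAG_PB = 0x70    # multicodec code for dag-pb
SHA2_256 = 0x12  # multihash code for sha2-256


def read_varint(buf, pos):
    """Decode an unsigned varint (LEB128); return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_cidv1(cid):
    """Split a base32 CIDv1 string into (codec, hash_code, digest)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 (prefix 'b')")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase base32 omits padding
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    digest = raw[pos:pos + length]
    if len(digest) != length or pos + length != len(raw):
        raise ValueError("malformed multihash")
    return codec, hash_code, digest


def verify_raw(cid, payload):
    """Return True if payload hashes to the digest inside a raw CID."""
    codec, hash_code, digest = parse_cidv1(cid)
    if codec != RAW:
        # dag-pb and other codecs need the block envelopes; see step (3).
        raise ValueError("only raw CIDs can be verified from the payload alone")
    if hash_code != SHA2_256:
        raise ValueError("this sketch only supports sha2-256")
    return hashlib.sha256(payload).digest() == digest


def make_raw_cid(payload):
    """Build a raw/sha2-256 CIDv1 for the payload (used for the demo below)."""
    header = bytes([1, RAW, SHA2_256, 32])
    encoded = base64.b32encode(header + hashlib.sha256(payload).digest())
    return "b" + encoded.decode().rstrip("=").lower()


cid = make_raw_cid(b"hello")
print(verify_raw(cid, b"hello"))     # payload matches the CID
print(verify_raw(cid, b"tampered"))  # payload was altered
```

A dag-pb CID is rejected here on purpose, since the payload returned by a gateway is not enough to recompute its hash.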

Due to this, we could:

@aschmahmann mind doing a sanity check on this? I don't see (C), but lmk if I missed something.

bbondy commented 3 years ago

I guess B is best, to avoid collusion between two known preconfigured gateways.

lidel commented 3 years ago

Quick update: go-ipfs 0.9.0 will expose /api/v0/dag/export on every public gateway (https://github.com/ipfs/go-ipfs/pull/8111). It enables thin clients to fetch an archive of the entire DAG in a trustless way.

A client working in offline mode (ipfs daemon --offline) will be able to import the exported archive via ipfs dag import --pin-roots=false

lidel commented 11 months ago

FYSA this is now possible thanks to verifiable Block and CAR responses on HTTP Gateways:

There are also: