Open bbondy opened 3 years ago
I keep collecting notes about verifiable HTTP responses in https://github.com/ipfs/in-web-browsers/issues/128. It is a surprise to many that it is not a clear-cut thing.
TLDR: files bigger than 256KB are chunked and represented as a DAG in which each level is hashed (like in git), so the root CID is not the hash of the file but the hash of the DAG representation of the file.
This means that right now it is not possible to verify responses bigger than 256KB without knowing what the DAG looks like, and for that you need to run an IPFS node.
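
To make the point concrete, here is a deliberately simplified sketch in Python. It does not use the real UnixFS/dag-pb encoding (the parent node here is just the concatenated chunk hashes, which is an assumption made for illustration), but it shows why the root hash of a chunked file stops matching the plain hash of its bytes once there is more than one chunk.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # chunk size mentioned above


def file_hash(data: bytes) -> bytes:
    """Plain SHA-256 over the whole file."""
    return hashlib.sha256(data).digest()


def simplified_root_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Hash every chunk, then hash the list of chunk hashes.

    Simplification: real IPFS wraps the links in a dag-pb/UnixFS node,
    so actual root hashes differ from what this function returns.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    leaf_hashes = [hashlib.sha256(chunk).digest() for chunk in chunks]
    if len(leaf_hashes) == 1:
        # A single chunk needs no parent node: the leaf hash is the root.
        return leaf_hashes[0]
    return hashlib.sha256(b"".join(leaf_hashes)).digest()


small = bytes(100 * 1024)  # fits in one chunk
large = bytes(600 * 1024)  # split into three chunks

small_matches = simplified_root_hash(small) == file_hash(small)
large_matches = simplified_root_hash(large) == file_hash(large)
```

For the small input the two hashes agree; for the large one they do not, which is the situation a client without the DAG layout is stuck in.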
We are looking into various ways of solving this (details are listed in the linked issue), but in the case of Brave I see an additional way of having verifiable gateway responses for `ipfs://`, backed by a public gateway and CAR export/import:
- Run go-ipfs in offline mode (`ipfs daemon --offline`), which means it does not connect to the swarm but still provides a local gateway.
- For `ipfs://` in Brave, download a CAR from the gateway and then `ipfs dag import file.car` it into the local datastore before sending the request to the local gateway.
- With `ipfs://` in the address bar, integrity verification is provided via go-ipfs.
@bbondy does this sound feasible, or should we wait for gateway responses that do not require go-ipfs?
We can't install go-ipfs without the user opting into it, and I think asking the user to opt into this would be complicated UI-wise.
@lidel what about if we add some basic protocol support directly into Brave. This is maybe a start of future things to come. Maybe you can describe how we could do this at the protocol level?
Ok, so let's scope the verification problem to files represented with unixfs (files and directories). Below is a broad-strokes explainer that should make it easier to reason about what needs to be done:
In IPFS, unixfs files can be represented as a CID with one of two multicodecs:
- `dag-pb`: a block of raw bytes wrapped in a unixfsv1 protobuf manifest that lists optional links to child nodes, which are included in the calculation of the final hash (~ for files bigger than 256KB)
- `raw`: a single block of raw bytes without any metadata or children (small files and raw leaves of a bigger dag-pb)

If you want to validate a CID without running an IPFS node, you need to:
1. Read the multicodec from the CID.
2. If it is `raw`, you can just hash the payload and compare it with the hash inside the CID. Done.
3. If it is `dag-pb`, you need to read the protobuf envelope somehow to know whether the CID represents only a single block, or is a parent and additional blocks need to be fetched.
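
The `raw` case can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full CID implementation: it assumes a CIDv1 string in base32 (multibase prefix `b`) with a sha2-256 multihash, and the helper names (`parse_cid_v1`, `verify_raw_block`, `make_cid`) are made up for this example.

```python
import base64
import hashlib

RAW, DAG_PB, SHA2_256 = 0x55, 0x70, 0x12  # multicodec codes


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint; return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_cid_v1(cid: str):
    """Return (codec, hash_code, digest) for a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDs (multibase prefix 'b') are handled")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase strips base32 padding
    data = base64.b32decode(body)
    version, pos = read_varint(data, 0)
    if version != 1:
        raise ValueError("only CIDv1 is handled")
    codec, pos = read_varint(data, pos)
    hash_code, pos = read_varint(data, pos)
    length, pos = read_varint(data, pos)
    digest = data[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return codec, hash_code, digest


def verify_raw_block(cid: str, payload: bytes) -> bool:
    """Hash the payload and compare it with the digest inside the CID."""
    codec, hash_code, digest = parse_cid_v1(cid)
    if codec != RAW:
        raise ValueError("not a raw block; a dag-pb CID needs the DAG walked")
    if hash_code != SHA2_256:
        raise ValueError("only sha2-256 is handled in this sketch")
    return hashlib.sha256(payload).digest() == digest


def make_cid(codec: int, block: bytes) -> str:
    """Demo helper: build a base32 CIDv1 for a block (sha2-256)."""
    data = bytes([1, codec, SHA2_256, 32]) + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


payload = b"hello from a gateway"
cid = make_cid(RAW, payload)
ok = verify_raw_block(cid, payload)
tampered_ok = verify_raw_block(cid, payload + b"!")
```

Here `ok` is true and `tampered_ok` is false. A CID whose codec is `dag-pb` is rejected on purpose, since hashing the file bytes alone cannot validate it.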
(2) is easy and could be implemented for small files as a PoC. (3) is difficult because metadata information can't be fetched from the same gateway that we are trying to verify :trollface:
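
To show what "reading the protobuf envelope" involves for (3), here is a rough Python sketch that extracts child links from a dag-pb block without a protobuf library. It relies on the dag-pb layout (a node has `Data` in field 1 and repeated `Links` in field 2; a link has `Hash`, `Name`, `Tsize` in fields 1 to 3). The function names and the hand-built example block are illustrative only, and the block itself would still need to be hashed and checked against its own CID.

```python
def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint; return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for a protobuf message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 7
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:  # length-delimited bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire}")
        yield field, wire, value


def child_links(block: bytes):
    """Return [(cid_bytes, name, tsize)] for every link in a dag-pb node."""
    links = []
    for field, wire, value in iter_fields(block):
        if field == 2 and wire == 2:  # PBNode.Links
            cid_bytes, name, tsize = b"", "", 0
            for f, w, v in iter_fields(value):
                if f == 1 and w == 2:
                    cid_bytes = v
                elif f == 2 and w == 2:
                    name = v.decode("utf-8")
                elif f == 3 and w == 0:
                    tsize = v
            links.append((cid_bytes, name, tsize))
    return links


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def ld(field: int, payload: bytes) -> bytes:
    """Encode a length-delimited protobuf field (demo helper)."""
    return encode_varint((field << 3) | 2) + encode_varint(len(payload)) + payload


# Hand-built example: one link to a (fake, all-zero digest) raw child.
child_cid = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)
link = ld(1, child_cid) + ld(2, b"chunk0") + encode_varint(3 << 3) + encode_varint(262144)
parent_node = ld(2, link) + ld(1, b"\x08\x02")  # Data: UnixFS type = File
leaf_node = ld(1, b"\x08\x02")

parent_links = child_links(parent_node)
leaf_links = child_links(leaf_node)
```

An empty result means the block stands alone; a non-empty one means every listed child has to be fetched and verified in turn, which is exactly the data a thin client cannot trust from the gateway under test unless each block is checked against its CID.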
Due to this, we could:
@aschmahmann mind doing a sanity check on this? I don't see (C), but lmk if I missed something.
I guess B is best, to avoid collusion between two known, preconfigured gateways.
Quick update: go-ipfs 0.9.0 will expose `/api/v0/dag/export` on every public gateway (https://github.com/ipfs/go-ipfs/pull/8111).
It enables thin clients to fetch an archive of the entire DAG in a trustless way.
A client working in offline mode (`ipfs daemon --offline`) will be able to import the exported archive via `ipfs dag import --pin-roots=false`.
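
A thin client could wire those two pieces together roughly as follows. This is a sketch under assumptions: the gateway hostname is a placeholder, the `?arg=<cid>` query form follows the usual `/api/v0` convention, whether a given gateway accepts a plain GET for this endpoint is not confirmed here, and the function names are hypothetical.

```python
import subprocess
import urllib.parse
import urllib.request


def dag_export_url(gateway: str, cid: str) -> str:
    """Build the CAR export URL for a CID on a given gateway."""
    query = urllib.parse.urlencode({"arg": cid})
    return f"{gateway.rstrip('/')}/api/v0/dag/export?{query}"


def dag_import_command(car_path: str) -> list:
    """Command line for importing a CAR into the local (offline) node."""
    return ["ipfs", "dag", "import", "--pin-roots=false", car_path]


def fetch_and_import(gateway: str, cid: str, car_path: str) -> None:
    """Download the CAR, then let the local node verify it on import.

    Assumes the gateway answers a plain GET and that `ipfs` is on PATH.
    """
    with urllib.request.urlopen(dag_export_url(gateway, cid)) as resp, \
            open(car_path, "wb") as out:
        out.write(resp.read())
    subprocess.run(dag_import_command(car_path), check=True)


# Placeholder values; nothing is fetched until fetch_and_import() is called.
example_url = dag_export_url("https://gateway.example/", "bafyexamplecid")
example_cmd = dag_import_command("file.car")
```

After the import, the request can be served from the local gateway, so the integrity check happens on the user's machine rather than being taken on trust from the remote gateway.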
FYSA this is now possible thanks to verifiable Block and CAR responses on HTTP Gateways:
There are also:
- `ipfs://` and `ipns://` in address bar: https://github.com/little-bear-labs/ipfs-chromium
Currently you only have a guarantee that the files you're accessing on IPFS are what they say they are if you're using a local node. This task is to check the contents of files that are loaded against the CID so that even if you're using a gateway, you can be sure the gateway is not doing anything sketchy.