Have all protocols return "canonical" urls when possible in headers

RangerMauve commented 2 years ago

It'd be useful if protocols such as ipns/ipfs/hyper would return "canonical" urls for data that gets fetched whenever possible.

This behavior would include:

For cases when content at a subpath is individually addressable (such as IPFS/IPNS), the raw URL should be provided
For cases where a hostname is used to reference some data (e.g. hyper://example.com), the raw public key should be exposed.

This would make it easier to pluck out raw data from higher level abstractions (like raw file URLs from IPNS based datasets).

Ideally this would be in the Content-Location response header.

lidel commented 2 years ago

Hm.. Content-Location may not be the header for the job. iiuc Content-Location is tied too closely to HTTP layer specifics: it provides a key for HTTP caching, especially around POST, and GET with an Accept header.

My understanding of its mechanics is along these lines:

request GET /file.txt with Accept: application/vnd.foo could return a response with Content-Location: /file.txt?format=foo. In theory the client/CDN/caching proxies can cache this information, and all future requests for that resource+Accept will go directly to the cached /file.txt?format=foo
request POST /file.txt with some data could produce a response with Content-Location: /uploaded/file.txt indicating where the data ended up, so clients can cache it under that URL

Alt-Svc (https://github.com/ipfs/in-web-browsers/issues/144) is also not an option imo, because it is tied to HTTP family of protocols blessed by IANA.

Taking a step back, "canonical" links are tricky, especially in the IPFS context, where we have both mutable and immutable URIs.

Who would be consuming such a link? When to use the "bookmarkable" ipns://en.wikipedia-on-ipfs.org that will always be up-to-date? When to use the "immutable snapshot" ipfs://{cid} that can be used for archiving? I think we should announce both and let the client decide which one is useful for the task at hand.

I think Link header is a better choice than Content-Location, because you can provide multiple entries and specify relation of each, and there are multiple RFCs that use it. For example:

Canonical links allow putting rel="canonical" in the Link header:
```
Link: <https://example.com/page.php>; rel="canonical"
```
RFC6249: Metalink/HTTP: Mirrors and Hashes (https://github.com/ipfs/in-web-browsers/issues/179) expands use of Link and allows things like:
```
Link: <ipns://en.wikipedia-on-ipfs.org/wiki/>; rel=duplicate
Link: <ipfs://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq/wiki/>; rel=duplicate
```
With this metadata, the client can decide which transport to use for fetching, and which address to store in "bookmarking" scenarios.
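
To make the client side of this concrete, here is a rough Python sketch of reading such Link headers and choosing between the mutable and immutable address. It is only an illustration, not how any existing protocol handler does it: the function names parse_link_header and pick_address are hypothetical, the regex-based parsing assumes targets and quoted values contain no "<" characters, and treating both rel=duplicate and rel=canonical as acceptable candidates is an assumption.

```python
import re
from urllib.parse import urlsplit

# One link-value: <target> followed by its parameters, up to the next "<".
# Simplification (assumption): no "<" appears inside targets or quoted values.
_LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: ;name=value or ;name="value"
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(values):
    """Parse a list of Link header values into (target, rel) pairs.

    A single header value may hold several comma-separated links, and a
    single rel parameter may hold several space-separated relation types.
    """
    links = []
    for value in values:
        for target, params in _LINK_RE.findall(value):
            rels = []
            for name, quoted, bare in _PARAM_RE.findall(params):
                if name.lower() == "rel":
                    rels = (quoted or bare).lower().split()
                    break  # only the first rel parameter is considered
            for rel in rels:
                links.append((target, rel))
    return links


def pick_address(links, want_mutable):
    """Pick an ipns:// (mutable) or ipfs:// (immutable) target, if announced.

    Hypothetical policy: accept targets announced with rel=duplicate or
    rel=canonical and choose purely by URL scheme.
    """
    scheme = "ipns" if want_mutable else "ipfs"
    for target, rel in links:
        if rel in ("duplicate", "canonical") and urlsplit(target).scheme == scheme:
            return target
    return None


if __name__ == "__main__":
    headers = [
        "<ipns://en.wikipedia-on-ipfs.org/wiki/>; rel=duplicate",
        "<ipfs://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq/wiki/>; rel=duplicate",
    ]
    parsed = parse_link_header(headers)
    print("bookmark:", pick_address(parsed, want_mutable=True))
    print("archive: ", pick_address(parsed, want_mutable=False))
```

A "bookmarking" flow would call pick_address with want_mutable=True and an archiving flow with want_mutable=False; either returns None when the server did not announce an address of that kind.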

RangerMauve commented 1 year ago

Link headers are a good idea, they're just kinda a pain in the ass to parse back out. I think I started with the Link header in the hypercore-protocol handler actually.

The metalink/http thing is interesting.

AgregoreWeb / agregore-browser

Have all protocols return "canonical" urls when possible in headers #192