btrask / stronglink

A searchable, syncable, content-addressable notetaking system

URI normalization #15

Status: Open. btrask opened this issue 9 years ago.

btrask commented 9 years ago

Right now we treat hash URIs as strings. Each hash algorithm runs its own logic to generate a list of arbitrary paths, and we use them as-is.

I think we should redefine the hash algorithms to output a single byte string, which is then encoded in a variety of standard ways. As far as I can tell, this is a reasonable requirement for every potential hash algorithm: there isn't much reason for an algorithm to want path separators or meaningful strings.

Advantages:

This probably requires support for buffers/blobs in the database schema. Ideally, blobs would be escaped rather than length-prefixed so that we could easily do prefix matching (in case we want to support arbitrary lengths).
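To illustrate why escaping beats length-prefixing for prefix matching, here is a sketch using one common escaping convention: each 0x00 byte becomes 0x00 0xFF, and a lone 0x00 terminates the key (similar to the byte-string encoding in FoundationDB's tuple layer). The function names are hypothetical and this is not StrongLink's schema code.

```python
import struct

def length_prefixed(raw: bytes) -> bytes:
    """2-byte big-endian length header followed by the bytes."""
    return struct.pack(">H", len(raw)) + raw

def escaped(raw: bytes, terminate: bool = True) -> bytes:
    """Escape 0x00 as 0x00 0xFF; optionally append a lone 0x00 terminator."""
    body = raw.replace(b"\x00", b"\x00\xff")
    return body + b"\x00" if terminate else body

def matches_prefix(stored_key: bytes, short: bytes) -> bool:
    """True if the stored (escaped, terminated) key encodes a hash starting with `short`."""
    # Escaping works byte by byte, so escaped(full) begins with escaped(short)
    # whenever full begins with short. The query leaves the terminator off.
    return stored_key.startswith(escaped(short, terminate=False))
```

With a length prefix, the header of a truncated hash differs from the header of the full one, so a plain byte-prefix comparison never matches. In a sorted key-value store, keys that share a byte prefix are contiguous, so with the escaped form a truncated-hash lookup becomes a single range scan starting at `escaped(short, terminate=False)`.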

btrask commented 9 years ago

We could also add support for urn: and magnet:...
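Under the single-byte-string proposal above, supporting more URI schemes just means adding more encoders over the same raw digest. Here is a minimal sketch that assumes Python's `hashlib` for the digest. The `hash://<algo>/<hex>`, `hash://<algo>/<base64url>` and RFC 6920 `ni:///` layouts are examples only, not StrongLink's current output. `urn:`/`magnet:` spellings would be added the same way but are not shown.

```python
import base64
import hashlib

def digest(algo: str, data: bytes) -> bytes:
    """The algorithm's only job: return the raw digest bytes."""
    return hashlib.new(algo, data).digest()

def b64url(raw: bytes) -> str:
    """URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def encode_all(algo: str, raw: bytes) -> dict:
    """Render one raw digest in several text forms (formats are illustrative)."""
    # Hypothetical mapping to RFC 6920 algorithm names; only sha256 is listed.
    ni_name = {"sha256": "sha-256"}.get(algo, algo)
    return {
        "hex": f"hash://{algo}/{raw.hex()}",
        "base64url": f"hash://{algo}/{b64url(raw)}",
        "ni": f"ni:///{ni_name};{b64url(raw)}",
    }
```

Every form decodes back to the same 32 bytes, so the database would only need to store and index the raw digest.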

zimbatm commented 7 years ago

Did you consider removing the short version? I am not convinced that truncating a string makes the hash much friendlier for a human to use. In both cases I would copy and paste the string.

If the hash length is known, then it's possible to deduce the encoding: the base64 encoding of a SHA-256 digest should always be the same length.
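The length arithmetic for a 32-byte SHA-256 digest is easy to check. This sketch (helper names are hypothetical) lists a few standard encodings and guesses which one a string uses from its length alone.

```python
import math

def encoded_lengths(n_bytes: int) -> dict:
    """Character counts for common text encodings of an n-byte digest."""
    bits = 8 * n_bytes
    return {
        "hex": 2 * n_bytes,
        "base32 (no padding)": math.ceil(bits / 5),
        "base64 (padded)": 4 * math.ceil(n_bytes / 3),
        "base64 (no padding)": math.ceil(bits / 6),
    }

def guess_encoding(text: str, n_bytes: int = 32) -> list:
    """Encodings whose full-length output matches len(text); [] if none do."""
    return [name for name, length in encoded_lengths(n_bytes).items()
            if length == len(text)]
```

For SHA-256 the four lengths (64, 52, 44, 43) are all distinct. The 52-character Nix-style hash used later in this thread is classified as base32. Nix uses its own base32 alphabet, but the length works out the same. A truncated prefix matches no entry at all, which is why truncation and length-based detection pull in opposite directions.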

btrask commented 7 years ago

@zimbatm Thanks for the comment!

I agree that even short hashes are a pain to type. However, full hashes can be pretty unwieldy to share in chat/email (wrapping over multiple lines, etc.). The main reason for supporting base-64 in the first place is to get shorter URIs, so sacrificing truncation for it seems undesirable. (The holy grail, in terms of shortness, is base-64 with truncation.)

That said, my mind is still open, and I'm happy to hear more feedback/ideas.
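For a rough sense of the trade-off, here is the arithmetic for how many bits a truncated prefix keeps in each alphabet. This is plain information counting, not a statement about any particular StrongLink setting. As a general rule of thumb, collisions among $b$-bit truncated hashes become likely after roughly $2^{b/2}$ distinct inputs, and that is the security being traded for shortness.

```math
\begin{aligned}
b(k) &= k \log_2 A \quad \text{bits kept by a } k\text{-character prefix over an alphabet of size } A\\
\text{hex } (A = 16)&: \; b = 4k, \qquad \text{base64 } (A = 64): \; b = 6k\\
\text{full SHA-256 } (256 \text{ bits})&: \; k_{\text{hex}} = 64, \qquad k_{\text{base64}} = \lceil 256/6 \rceil = 43\\
\text{128-bit prefix}&: \; k_{\text{hex}} = 32, \qquad k_{\text{base64}} = \lceil 128/6 \rceil = 22
\end{aligned}
```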

zimbatm commented 7 years ago

OK, there are two things here: standardization of the hash, and UX for the user.

Standardization is a worthy goal on its own. It would be nice if tools like yours, IPFS, Nix, Camlistore, Git, ... all used the same hashing algorithms for blobs and directories. That way, a rendered HTML page that links to another hash could still be valid in all these systems.

What you are proposing with the truncated hash is to exchange some of the security of the hashes for convenience. I believe that by default the system should generate full hashes (for example when generating a blob with a link to another object) so that it's secure by default. Then when transmitting links to users it's possible to make different trade-offs.

One possibility would be to introduce another URI scheme for shortened hashes that standardizes the text encoding to URL-safe base64 and thus allows truncation.
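Here is a rough sketch of how such a scheme could behave. It assumes a made-up `shorthash://` scheme name and an in-memory list of known digests, neither of which exists in StrongLink. The shortened form is just a prefix of the unpadded URL-safe base64 text, and resolution fails loudly when the prefix is ambiguous.

```python
import base64

def b64url(raw: bytes) -> str:
    """URL-safe base64 without '=' padding, so any prefix is still URL-safe."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def shorten(raw: bytes, chars: int = 16, algo: str = "sha256") -> str:
    """Hypothetical short-link form: truncate the base64url text to `chars`."""
    return f"shorthash://{algo}/{b64url(raw)[:chars]}"

def resolve(short_uri: str, known: list) -> bytes:
    """Return the unique full digest whose base64url text starts with the prefix."""
    prefix = short_uri.rsplit("/", 1)[1]
    matches = [raw for raw in known if b64url(raw).startswith(prefix)]
    if len(matches) != 1:
        # Zero matches: unknown hash. Several: the prefix is too short to be safe.
        raise LookupError(f"{len(matches)} matches for prefix {prefix!r}")
    return matches[0]
```

Objects themselves would keep linking with full hashes. Only the human-facing link is shortened, and it is expanded back to a full digest before use.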

Another approach is to add another layer of indirection that maps a name to a hash, typically a link-shortener service: ha.sh/myname -> hash://sha256-0qb18k2rp6bbg8g50754srl95dq0lr96i297856yhrx1hh1ja37z. The link could also be signed by the issuer so as not to delegate too much trust to the naming service. This is also useful for keeping a mutable reference to a document.

btrask commented 7 years ago

> Standardization is a worthy goal on its own. It would be nice if tools like yours, IPFS, Nix, Camlistore, Git, ... all used the same hashing algorithms for blobs and directories. That way, a rendered HTML page that links to another hash could still be valid in all these systems.

Agreed. However, the biggest challenge here is the way hashes are computed, not their representation. IPFS, Camlistore, Git, and BitTorrent (not sure about Nix) all "salt" their hashes, which makes them all incompatible. (I've written a few articles about this, if you haven't seen them.)
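Git makes this concrete. A blob's ID is the SHA-1 of a `blob <size>\0` header followed by the content, not of the content alone. Here is a small Python illustration. The helper names exist only for this example, and the other systems mentioned use their own, different wrappers, which aren't shown.

```python
import hashlib

def plain_sha1(content: bytes) -> str:
    """SHA-1 of the raw bytes, as sha1sum would print it."""
    return hashlib.sha1(content).hexdigest()

def git_blob_id(content: bytes) -> str:
    """Git's object ID for a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()
```

Even for an empty file the two IDs differ (`da39a3ee...` vs `e69de29b...`), so a link written in one form cannot be checked against the other without knowing the wrapper.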

I've already got code (in both C and JavaScript) that parses and generates hashes used by several different applications (as part of the Hash Archive project; I haven't gotten around to extracting it into reusable libraries yet). That part is much easier. See #111.
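For the simpler half of the problem, here is a hypothetical normalizer that maps two URI spellings of the same digest onto one canonical `(algorithm, raw bytes)` pair. It is not the Hash Archive code, which isn't reproduced here. For brevity it assumes only lowercase-hex `hash://` and RFC 6920 `ni:///sha-256` forms.

```python
import base64
import re

# Hypothetical mapping from RFC 6920 algorithm names to local names.
NI_ALGOS = {"sha-256": "sha256"}

def _b64url_decode(text: str) -> bytes:
    """Decode unpadded URL-safe base64 by restoring the '=' padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def parse_hash_uri(uri: str) -> tuple:
    """Normalize a hash URI to (algorithm, raw digest bytes)."""
    m = re.fullmatch(r"hash://([a-z0-9]+)/([0-9a-f]+)", uri)
    if m:
        return m.group(1), bytes.fromhex(m.group(2))
    m = re.fullmatch(r"ni:///([a-z0-9-]+);([A-Za-z0-9_-]+)", uri)
    if m and m.group(1) in NI_ALGOS:
        return NI_ALGOS[m.group(1)], _b64url_decode(m.group(2))
    raise ValueError(f"unrecognized hash URI: {uri}")
```

Once everything is reduced to raw bytes, comparing or indexing two links no longer depends on which text encoding each one happened to use.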

> What you are proposing with the truncated hash is to exchange some of the security of the hashes for convenience. I believe that by default the system should generate full hashes (for example when generating a blob with a link to another object) so that it's secure by default. Then when transmitting links to users it's possible to make different trade-offs.

That is what the system currently does.

> One possibility would be to introduce another URI scheme for shortened hashes that standardizes the text encoding to URL-safe base64 and thus allows truncation.

This is probably what will happen, eventually.

I'm really not a fan of link shorteners, even with signing, but in some specific cases they might be the best option.