btrask / stronglink

A searchable, syncable, content-addressable notetaking system

Hash URI encoding ambiguity #110

Open btrask opened 8 years ago

btrask commented 8 years ago

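As described in this post, our hash URI scheme becomes ambiguous once it supports more than one hash encoding (for example, hexadecimal and base-64). Every hex digit is also a valid base-64-url character, so a bare digest string does not always reveal which encoding it uses.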
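To make the ambiguity concrete, here is a minimal sketch that checks which encodings a bare digest string is well-formed under. The helper names (`plausible_encodings`, `decode_as`) are hypothetical, and "base64url" is assumed to mean the unpadded URL-safe alphabet used in the examples below. The 64-character SHA-256 hex digest from the post is also a valid base-64-url string, and it decodes to different bytes (and even a different length) depending on which reading is chosen.

```python
import base64
import binascii
import re

HEX_RE = re.compile(r"[0-9a-fA-F]+")
B64URL_RE = re.compile(r"[A-Za-z0-9_-]+")  # base-64-url alphabet, no padding


def plausible_encodings(digest: str) -> list[str]:
    """Return every encoding under which a bare digest string is well-formed."""
    found = []
    # Hex needs an even number of digits to form whole bytes.
    if HEX_RE.fullmatch(digest) and len(digest) % 2 == 0:
        found.append("hex")
    # Unpadded base-64 never has a length of 1 mod 4.
    if B64URL_RE.fullmatch(digest) and len(digest) % 4 != 1:
        found.append("base64url")
    return found


def decode_as(digest: str, encoding: str) -> bytes:
    """Decode a digest string under one specific encoding."""
    if encoding == "hex":
        return binascii.unhexlify(digest)
    # Restore the '=' padding that URL-safe digests usually omit.
    return base64.urlsafe_b64decode(digest + "=" * (-len(digest) % 4))


hex_digest = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
for enc in plausible_encodings(hex_digest):
    print(enc, len(decode_as(hex_digest, enc)), "bytes")
# hex 32 bytes
# base64url 48 bytes
```

For SHA-256 alone, the two lengths (64 hex characters vs. 43 base-64 characters) happen to differ, but disambiguating by length would require every parser to know the digest size of every supported algorithm.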

Examples from the post:

- hash://sha256/9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 (current hex format)
- hash://sha256/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (base-64-url, ambiguous)
- hash://sha256/b64-n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (base-64 with encoding name prefix)
- hash://sha256/_n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (leading underscore for simple disambiguation from hex)
- hash://b64.sha256/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (encoding as algorithm’s “subdomain”)
- hash://sha256.b64/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (encoding as algorithm’s “top level domain”)
- hash://sha256/b64/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg (encoding as path component)
- hash://sha256/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg?enc=base64 (encoding as query parameter; probably a terrible idea, only included for completeness)

Since then, I've come to prefer the path component version (/b64/). For hex, we'd probably use /b16/ (due to unfortunate connotations of the word "hex"...). Digests without an encoding label would be treated as hex for backward compatibility, and that unlabeled form would remain officially supported.
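A minimal sketch of how a parser for the path-component form might look, assuming only two labels exist (`b16` for hex and `b64` for unpadded base-64-url) and that an unlabeled digest falls back to hex. `parse_hash_uri` and `DECODERS` are hypothetical names for illustration, not StrongLink's actual code, and the sketch ignores query strings and fragments.

```python
import base64
import binascii

# Hypothetical label -> decoder table for the /b16/ and /b64/ path components.
DECODERS = {
    "b16": binascii.unhexlify,
    # urlsafe_b64decode is lenient and needs padding, so restore it first.
    "b64": lambda s: base64.urlsafe_b64decode(s + "=" * (-len(s) % 4)),
}


def parse_hash_uri(uri: str) -> tuple[str, bytes]:
    """Parse hash://ALGO/[ENC/]DIGEST into (algorithm, raw digest bytes).

    A digest with no encoding label is treated as hex (b16) for backward
    compatibility.
    """
    prefix = "hash://"
    if not uri.startswith(prefix):
        raise ValueError("not a hash URI")
    parts = uri[len(prefix):].split("/")
    if len(parts) == 2:
        algo, digest = parts
        enc = "b16"  # unlabeled: legacy hex form
    elif len(parts) == 3:
        algo, enc, digest = parts
    else:
        raise ValueError("expected hash://ALGO/DIGEST or hash://ALGO/ENC/DIGEST")
    if enc not in DECODERS:
        raise ValueError(f"unknown encoding label: {enc!r}")
    return algo, DECODERS[enc](digest)


algo, raw = parse_hash_uri("hash://sha256/b64/n4bQgYhMfWWaL-qgxVrQFaO_TxsrC4Is0V1sFbDwCgg")
print(algo, raw.hex())
# sha256 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
```

Comparing the decoded bytes rather than the URI strings lets the legacy hex form, the /b16/ form, and the /b64/ form all resolve to the same content.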

Unfortunately this labeling makes base-64 encoded URIs slightly longer (by 4 characters). Hash URIs are already the longest of the various content addressing schemes, and this negates some of the advantage of using base-64 in the first place. On the other hand, I think there's value in being very flexible with user input.
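To put numbers on the trade-off, this sketch builds the SHA-256 URIs for the string "test" (the digest used in the examples above) and prints their lengths. It assumes unpadded base-64-url and the /b64/ path label; the dictionary keys are just descriptive labels.

```python
import base64
import hashlib

digest = hashlib.sha256(b"test").digest()
hex_digest = digest.hex()  # 64 characters for a 32-byte digest
b64_digest = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")  # 43 characters

uris = {
    "hex, unmarked": f"hash://sha256/{hex_digest}",
    "base-64, unmarked": f"hash://sha256/{b64_digest}",
    "base-64, /b64/ label": f"hash://sha256/b64/{b64_digest}",
}
for label, uri in uris.items():
    print(f"{len(uri):3d}  {label}")
#  78  hex, unmarked
#  57  base-64, unmarked
#  61  base-64, /b64/ label
```

Base-64 saves 21 characters over hex for a SHA-256 digest, and the /b64/ label gives back 4 of them, so a labeled base-64 URI is still 17 characters shorter than the current hex form.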

Related to #15.