core-wg / href

[late suggestion] Considering binary paths #60

Closed chrysn closed 1 year ago

chrysn commented 1 year ago

I know it's late in the design process, but here's an idea I'd like to run by you all. I'm fine if it's turned down, but I do see some merits.

So far, we're taking CoAP's stance that Uri-Path (and -Query and -Fragment) are UTF-8 encoded strings. This is consistent with the CoAP model and with readable URIs.

But thinking about CoAP-based protocols, there are time and again places where it would have been great to have binary paths. A-ReaLiSt (don't quote me on the capitalization) avoided putting things in path components purely because they couldn't count there. With some squinting, the OSCORE option could just as well have been a path segment if path segments could carry enumerated / counted data. So maybe on the path to a CoAP Internet Standard, we might loosen up a bit on the encoding requirements there. UTF-8 would still be the recommendation, but if someone wants to make their path %00%00%00%01, let them. (This would be consistent with the general trend to not work in terms of IRIs again, as the display convention of "if it's UTF-8, show it as that" works reasonably well in browser practice.)
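To make the %00%00%00%01 example concrete, here is a minimal Python sketch of what such a segment would carry once percent-decoded. It uses only the standard library's `urllib.parse.unquote_to_bytes`; the helper names `decode_segment` and `is_utf8` are hypothetical and not part of any CoAP or CRI implementation.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(segment: str) -> bytes:
    """Percent-decode one URI path segment into its raw bytes."""
    return unquote_to_bytes(segment)


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A segment used as a 32-bit big-endian counter rather than as text.
counter = decode_segment("%00%00%00%01")
print(counter)                           # b'\x00\x00\x00\x01'
print(int.from_bytes(counter, "big"))    # 1

# Arbitrary binary data is generally not well-formed UTF-8.
print(is_utf8(decode_segment("%FF%FE")))  # False
```

Note that the counter bytes happen to be well-formed UTF-8 (they decode to control characters), so a plain well-formedness check alone does not separate "text" from "binary" segments.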

So let's consider not putting another nail in that door, and making our URI components binary. What would change?

Legal (by today's proposed standards) CoAP URIs would stay single component. Some single-component CRIs would not be expressible in CoAP, but that's already the case now (looking at the net-unicode discussion). And after all, it's the server that is the authority on what is a valid URI, so on a CoAP scheme these wouldn't be minted so far anyway.

Text-or-PET would become Binary-or-percent-encoded-subdelim, simplifying the rules dramatically in that what used-to-be PET is now percent-encoded-subdelim, and the valid range of that is easily characterized.

The two large downsides I see are:

But I like the upsides:

cabo commented 1 year ago

I had similar thoughts about a dozen times before and after CoAP was approved in 2013. My way to quell them was FETCH...

chrysn commented 1 year ago

Resolution: We leave things as-is.

If CoAP ever changes, we can introduce a CRI extension that adds a "/ bstr" alternative in the same place as text-or-pet. Servers would use that when they announce their binary path segments (but would probably stick with tstr for others, especially given that they probably offer alternative, longer URIs for clients that can't handle binary segments). Applications that use binary data would then also use it that way (provided they're happy with what it does to delimiters).
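As a rough illustration of that hypothetical extension (it is not part of CRI as discussed here), the Python sketch below models a path as a list whose items are either `str` (standing in for a CBOR tstr) or `bytes` (standing in for a bstr). The name `to_segment` and the "fall back to bstr when the bytes are not UTF-8" rule are assumptions made for this example; in the resolution above it is the server that decides which segments to announce as binary.

```python
from typing import List, Union

# str stands in for a CBOR text string (tstr), bytes for a byte string (bstr).
Segment = Union[str, bytes]


def to_segment(raw: bytes) -> Segment:
    """Keep a segment as text when it is well-formed UTF-8, else keep raw bytes.

    Assumption for illustration only: a real server would choose explicitly
    which of its segments are binary instead of relying on this fallback.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


# One textual segment and one binary segment in the same path.
path: List[Segment] = [to_segment(b"sensors"), to_segment(b"\xff\x00\x00\x01")]
print(path)
```

A client that does not understand the bstr alternative would not be able to process the second item, which is why the server would likely also offer a longer, text-only URI.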