Open alandefreitas opened 2 years ago
> all I can do is study IRIs more

More and more I've been noticing how true this is. I can't even evaluate whether some of the comments on IRIs make sense. I can only answer them by studying IRIs.
I read a little more about IRIs and I don't think we should touch this until all else is stable.
Some changes we would need:
Design:
`url_view`s would be allowed to contain additional Unicode chars, or we would need another `iri_view` class;

I have a question related to the use cases of IRIs. Are there any requirements in the HTTP-related RFCs (9112 and complementary) that a server MUST support non-ASCII request targets without percent-encoding? For example, when using curl to send a request for a non-ASCII resource, the target is not percent-encoded but sent as a UTF-8 string:
```
curl -x 10.22.65.19:7074 -vvv "http://10.17.13.30:8084/files/교회-요양원-모임고리로감염확산"

GET http://10.17.13.30:8084/files/교회-요양원-모임고리로감염확산 HTTP/1.1
Host: 10.17.13.30:8084
User-Agent: curl/7.81.0
Accept: */*
Proxy-Connection: Keep-Alive
```
I don't think so. Requiring a server to accept some kind of URL is equivalent to saying the server must contain a given resource. If you have no resource to associate with `교회-요양원-모임고리로감염확산`, there's nothing to support here. Curl tends to accept loose inputs because it's a producer that can talk to any server. But I don't know if it's converting these inputs to something the server should understand or just passing the URL through. Everything usually tends to be fine when the input is converted to a proper percent-escaped URL.
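To make "converted to a proper percent-escaped URL" concrete, here is a minimal sketch of percent-encoding a single path segment byte by byte. It uses only the standard library. The names `is_unreserved` and `percent_encode_segment` are hypothetical helpers invented for this illustration and are not part of Boost.URL. The sketch assumes the input is already UTF-8 and conservatively escapes everything outside the RFC 3986 "unreserved" set, which is stricter than the path grammar requires.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.URL API): true for RFC 3986
// "unreserved" characters, which never need escaping.
inline bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper: percent-encode one path segment.
// Assumption: `segment` is already UTF-8, so each non-ASCII code point
// becomes one %XX triplet per byte.
inline std::string percent_encode_segment(std::string_view segment)
{
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(segment.size() * 3);
    for (unsigned char c : segment)
    {
        if (is_unreserved(c))
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            const unsigned hi = c >> 4;
            const unsigned lo = c & 0x0F;
            assert(hi < 16 && lo < 16);
            out.push_back('%');
            out.push_back(hex[hi]);
            out.push_back(hex[lo]);
        }
    }
    return out;
}
```

Under these assumptions, the first character of the resource name above, `교` (U+AD50, UTF-8 bytes `EA B5 90`), would be sent as `%EA%B5%90`, which any server that parses strict URLs can accept.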
I think there are two opinions on this.
One supposed advantage of the first point of view is that we would have stricter parsers over time, because servers wouldn't need to handle the loose input that bad producers generate. So servers could eliminate workarounds over time.
The second point of view has the advantage that bad producers and consumers are iteratively kicked out of the ecosystem until everyone complies correctly with proper input and no ambiguous variations.
For Boost.URL, in my opinion, the second point of view seems more reasonable considering the use cases it serves, which are usually machine-to-machine communication. For instance,
If we consider a browser address bar as the most common use case, where it's user-to-machine communication, then Boost.URL would need to accept and sanitize a lot of invalid input. Now there's no exact protocol to follow because humans can fail in lots of different ways. But that's bad for machine-to-machine communication even when it works. If Boost.URL handled that use case by default, then servers could now be routing invalid URLs to resources without meaning to and clients could also be making requests to invalid URLs.
That doesn't mean Boost.URL can't help users produce valid input. For instance, that's what `urls::format` is meant to do.
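As an illustration of the strict, machine-to-machine stance discussed above, the following sketch rejects a request target outright instead of sanitizing it. It uses only the standard library. `is_uri_char`, `is_hex_digit`, and `is_strict_target` are hypothetical names made up for this example and are not how Boost.URL is implemented. The check is deliberately simplified: it only verifies that every byte belongs to the RFC 3986 character repertoire and that every `%` starts a well-formed escape. It does not validate the full URI grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: characters that may appear literally somewhere in an
// RFC 3986 URI (unreserved, gen-delims, sub-delims). '%' is handled separately.
inline bool is_uri_char(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return true;
    constexpr std::string_view others = "-._~:/?#[]@!$&'()*+,;=";
    return others.find(static_cast<char>(c)) != std::string_view::npos;
}

inline bool is_hex_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') ||
           (c >= 'a' && c <= 'f');
}

// Hypothetical strict check: no repair is attempted. Raw non-ASCII bytes,
// spaces, control characters, and malformed escapes all cause rejection.
inline bool is_strict_target(std::string_view target)
{
    for (std::size_t i = 0; i < target.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(target[i]);
        if (c == '%')
        {
            // A '%' must be followed by exactly two hex digits.
            if (i + 2 >= target.size())
                return false;
            assert(i + 2 < target.size());
            if (!is_hex_digit(static_cast<unsigned char>(target[i + 1])) ||
                !is_hex_digit(static_cast<unsigned char>(target[i + 2])))
                return false;
            i += 2;
        }
        else if (!is_uri_char(c))
        {
            return false;
        }
    }
    return true;
}
```

With this kind of check, the raw UTF-8 target from the curl example is refused, while its percent-encoded form passes. A browser-style address bar would instead need a separate, lenient sanitizing layer in front of it.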
What are the use-cases for IRI?
From my point of view, all I can do is study IRIs more.