Open chadwhitacre opened 11 years ago
Some websites do that. I wouldn't say that this is a huge issue.
So, two conflicting takes on this:
1) RFC 2396 says that the URI path separator is a single slash.
2) The POSIX path definition says that "Multiple successive slashes are considered to be the same as one slash".
Since we stand at an intersection of the two, I vote that we go with whatever's implemented, which at the moment seems to be the second option.
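To make the two readings concrete, here is a minimal sketch using only the Python standard library. The example path is made up, and this is not how Aspen dispatches; it only contrasts the filesystem-style reading with a literal split on "/".

```python
import posixpath

path = "/foo//bar"  # made-up example path

# Filesystem-style reading: repeated slashes collapse to one.
collapsed = posixpath.normpath(path)

# Literal reading: every slash is a separator, so "//" encloses an empty segment.
segments = path.split("/")

print(collapsed)  # /foo/bar
print(segments)   # ['', 'foo', '', 'bar']
```

Under the first reading /foo//bar and /foo/bar are the same resource; under the second they are not.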
Interesting. A twist: someone registered the empty string as their username on Gittip. I would expect this to show me their profile:
Treating double slashes as a single slash in URLs is very common practice. I've seen it in a bunch of places (mostly from bad code requesting URLs with repeated slashes and never getting fixed because it didn't break anything). It appears to be the default behavior of both Apache and Nginx and I'm pretty sure I've seen this behavior from IIS (I can't think of a quick way I can check that right now).
Playing with URLs in my existing browser tabs: Google, DuckDuckGo and others ignore the repeated slashes; GitHub gives a 404.
Okay, so the issue I see with this is: what if there are files: /.spt and /index.html.spt , and someone hits / ? which do they get? is there an implied empty string after every / ? and it overrides the fallback paths? Is .spt a valid filename? (note that it will be a 'hidden' file under unix) This seems orthogonal to the issue of // vs /, but I think it's related enough that if we answer it, it might give us a clue how to answer the // vs / problem.
It appears that Flask also treats multiple successive slashes as a single slash, for what it's worth.
I think this is fine as-is. If you really want to differentiate you can make a wildcard sptfile.
Doesn't seem right to me. http://www.example.com// should be 404.
I want https://www.gittip.com// to match %username with path['username'] set to ''.
Because some schmoe changed their username on Gittip to the empty string, and I want to be like, "Sure! Go ahead!" :-)
what if there are files: /.spt and /index.html.spt , and someone hits / ? which do they get? is there an implied empty string after every / ? and it overrides the fallback paths? Is .spt a valid filename? (note that it will be a 'hidden' file under unix)
If the only way to 'catch' that kind of filename is with a wildcard, I think we shouldn't do it.
is there an implied empty string after every / ?
No, there's an actual empty string between every //. :-)
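A small sketch of that literal reading, assuming a hypothetical helper named `segments` (not part of Aspen) that splits on "/" without collapsing anything. It shows where the empty string sits for "//" compared with "/" and an ordinary username path.

```python
def segments(path):
    """Split a URL path on "/" without collapsing anything (hypothetical helper)."""
    # Drop the piece before the leading slash; keep every other piece, empty or not.
    return path.split("/")[1:]

print(segments("/alice/"))  # ['alice', '']
print(segments("//"))       # ['', '']
print(segments("/"))        # ['']
```

Read this way, "//" has the same shape as "/alice/" with an empty first segment, which is the position a %username wildcard would capture.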
Treating // as something different than / goes against de facto standards on the web. I'd love to find an RFC that speaks to this. (I have not yet found one.)
Found this slightly-related reference while working on #195:
The "/" character may be used within HTTP to designate a hierarchical structure.
Also see "HIERARCHICAL FORMS" in http://www.ietf.org/rfc/rfc1630.txt:
The slash ("/", ASCII 2F hex) character is reserved for the
delimiting of substrings whose relationship is hierarchical. This
enables partial forms of the URI. Substrings consisting of single
or double dots ("." or "..") are similarly reserved.
The significance of the slash between two segments is that the
segment of the path to the left is more significant than the
segment of the path to the right. ("Significance" in this case
refers solely to closeness to the root of the hierarchical
structure and makes no value judgement!)
Note
The similarity to unix and other disk operating system filename
conventions should be taken as purely coincidental, and should
not be taken to indicate that URIs should be interpreted as
file names.
sigh write me some failing tests into an issue170 branch and I'll see about making the dispatcher work correctly.
...so if autoindex is on, should a request for // give 404? or give the autoindex('//') -> autoindex('/') ? There's no possible way to make it give anything else without a wildcard simplate (%foo.spt), as you can't make a directory with an empty-string name.
And what if there are files: /.spt and /index.html.spt, and someone hits /? which do they get? Logically I think /.spt would override the index.html.spt since the latter is a 'fallback' and the former is 'more particular'.
Is .spt a valid filename? (note that it will be a 'hidden' file under unix)
Also, since aspen mimics the filesystem mostly, I suspect people will be surprised when http://example.com/foo/bar != http://example.com/foo//bar
@whit537 ping. Design opinions needed.
Another example: https://gratipay.com/about//stats is currently 404.
Discussing on https://github.com/AspenWeb/salon/issues/8 (at about 40 minutes?) ... let's redirect // to / in an algorithm function.
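A rough sketch of what such a function could compute, assuming it receives only the path portion of the URL (no query string). The name `redirect_target` is hypothetical and this is not an existing Aspen function; wiring it into the algorithm and issuing the actual redirect response is left out.

```python
import re

def redirect_target(path):
    """Return the collapsed path if `path` has repeated slashes, else None.

    Hypothetical helper, not an existing Aspen function.
    """
    # Replace every run of two or more slashes with a single slash.
    collapsed = re.sub(r"/{2,}", "/", path)
    return collapsed if collapsed != path else None

print(redirect_target("/about//stats"))  # /about/stats
print(redirect_target("//"))             # /
print(redirect_target("/about/stats"))   # None
```

Returning None for paths that are already clean lets the caller skip the redirect and continue with normal dispatch.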
After running make doc, I would expect http://localhost:5370// to give me a 404, but instead it gives me the homepage.