twittner closed this 11 years ago
We certainly should trim leading or trailing whitespace. However, @gregorycollins, I could have sworn that the code I imported into http-streams's src/Network/Http/Utilities.hs from snap-core's src/Snap/Internal/Parsing.hs and snap-server's src/Snap/Internal/Http/Parser.hs attended to dropping such extraneous whitespace. So before we fix it here in the call to insertWith, perhaps we should double check the parser code?
@twittner I'll have to turn this into a test case, obviously. Thanks for the heads up.
AfC
I've got a test case on branch 'trailing-whitespace' that fails.
You probably don't want to cut trailing whitespace out of every header, because it generally isn't necessary and you pay extra for doing so. Just make sure that the function that parses the integer out of the content-length header tolerates the trailing whitespace.
You could use "decimal" from attoparsec instead of whatever you're using now.
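A minimal sketch of the tolerant approach suggested above, assuming only the bytestring package. The name parseContentLength is hypothetical and is not the readDecimal in http-streams. It skips leading spaces and tabs, reads the digits, and accepts the value only if nothing but spaces and tabs follows. Overflow on very long digit strings is not handled here.

```haskell
import qualified Data.ByteString.Char8 as S
import Data.Char (isDigit)

-- Hypothetical helper (an assumption for illustration, not library code):
-- parse a Content-Length value while tolerating leading/trailing LWS.
parseContentLength :: S.ByteString -> Maybe Int
parseContentLength str =
    let (digits, rest) = S.span isDigit (S.dropWhile isLWS str)
    in if S.null digits || not (S.all isLWS rest)
         then Nothing                          -- no digits, or junk after them
         else Just (S.foldl' step 0 digits)    -- build the number digit by digit
  where
    -- Only SP and HTAB are treated as linear whitespace in this sketch.
    isLWS c = c == ' ' || c == '\t'
    step acc c = acc * 10 + (fromEnum c - fromEnum '0')
```

Only the one header whose value must be numeric pays for the whitespace handling; other header values are left untouched.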
@gregorycollins Hm. RFC 2616 §4.2 says (as we've all read and re-read a thousand times):
The field-content does not include any leading or trailing LWS: linear white space occurring before the first non-whitespace character of the field-value or after the last non-whitespace character of the field-value. Such leading or trailing LWS MAY be removed without changing the semantics of the field value.
Wouldn't it be appropriate for us to trim such linear whitespace regardless, especially seeing as how in a continuing header it has to be compressed to a single SP? If so, trimming it in the parser would be far better than another pass after the fact, yeah?
btw, the current readDecimal is what you put in, which was because you didn't like me using decimal from attoparsec or the one from bytestring-lexing :) I'm adding
x' = head $ S.words str'
before doing the fold to build the number. Presumably that'll be sufficient to guard this condition, but if we can change the parser to trim all fields I'll revert this.
AfC
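One caveat with the guard above: S.words returns an empty list for an empty or all-whitespace value, and head then throws. A hedged sketch of a total variant, where firstWord is a hypothetical name introduced only for illustration:

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical total replacement for `head $ S.words str'`:
-- returns Nothing instead of throwing when there is no token at all.
firstWord :: S.ByteString -> Maybe S.ByteString
firstWord str = case S.words str of
    (x:_) -> Just x     -- first whitespace-delimited token
    []    -> Nothing    -- empty or whitespace-only input
```

The caller can then treat Nothing as a malformed Content-Length rather than crashing inside the fold.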
Wouldn't it be appropriate for us to trim such linear whitespace regardless, especially seeing as how in a continuing header it has to be compressed to a single SP? If so, trimming it in the parser would be far better than another pass after the fact, yeah?
Maybe -- the only reason we don't inside snap-server is that searching for CR is a memchr() call, which is a lot faster than doing takeWhile or dropWhile, and if the header is never inspected you're just doing extra work for no reason. The difference in performance may not be enough to justify that, however.
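To illustrate the trade-off, here is a rough sketch of splitting at CR with Data.ByteString.Char8.elemIndex, which on strict ByteStrings is, as far as I know, backed by memchr. The function name splitAtCR is hypothetical and this is not the snap-server parser; it only shows the line being cut without any per-character predicate, leaving trailing whitespace in place for whoever inspects the value later.

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical sketch: cut the input at the first CR.
-- The first component is the raw line (whitespace untouched);
-- the second component starts at the CR itself.
splitAtCR :: S.ByteString -> (S.ByteString, S.ByteString)
splitAtCR input = case S.elemIndex '\r' input of
    Just i  -> S.splitAt i input
    Nothing -> (input, S.empty)   -- no CR seen: everything is the line
```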
btw, the current readDecimal is what you put in, which was because you didn't like me using decimal from attoparsec or the one from bytestring-lexing :) I'm adding
x' = head $ S.words str'
before doing the fold to build the number. Presumably that'll be sufficient to guard this condition, but if we can change the parser to trim all fields I'll revert this.
Data.ByteString.Char8.spanEnd might be faster and should allocate less.
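A small sketch of what that could look like, assuming only spaces and tabs count as trailing whitespace. The name trimTrailingLWS is hypothetical. On a strict ByteString both halves returned by spanEnd should be slices of the original buffer, so no bytes are copied.

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical helper: drop trailing SP/HTAB using spanEnd.
-- spanEnd returns (prefix, longest suffix satisfying the predicate),
-- so the first component is the value without its trailing whitespace.
trimTrailingLWS :: S.ByteString -> S.ByteString
trimTrailingLWS = fst . S.spanEnd isLWS
  where
    isLWS c = c == ' ' || c == '\t'
```

Unlike the S.words approach, this leaves interior whitespace alone and never builds an intermediate list.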
@gregorycollins I gave it a try; as far as I could tell from +RTS -p and -s it didn't make any difference. I suppose I should do it anyway.
This test currently causes the error "' ' is not an ascii digit":
Here is the output from Wireshark:
The stack trace looks as follows:
The reason is that the response header "Content-Length" contains trailing whitespace. According to RFC 2616, section 4.2, "[...] leading or trailing LWS MAY be removed without changing the semantics of the field value".
So, when adding header values, should we maybe trim these?