' ' is not an ascii digit

twittner commented 11 years ago

This test currently causes the error "' ' is not an ascii digit":

{-# LANGUAGE OverloadedStrings #-}
import System.IO.Streams
import Network.Http.Client

main :: IO ()
main = get "http://www.google.com" $ \_ i -> connect i stdout

Here is the output from whireshark:

GET / HTTP/1.1
Host: www.google.com
User-Agent: http-streams/0.6.0.1
Accept-Encoding: gzip
Accept: */*

HTTP/1.1 302 Moved Temporarily
Content-Length: 189      
Content-Encoding: gzip
Location: http://www.google.de/
Cache-Control: private
...

The stack trace looks as follows:

...
</html>*** Exception (reporting due to +RTS -xc): (THUNK_2_0), stack trace: 
  Network.Http.ResponseParser.readResponseHeader,
  called from Network.Http.Connection.receiveResponse,
  called from Network.Http.Inconvenience.getN,
  called from Network.Http.Inconvenience.get,
  called from Main.main,
  called from Main.CAF
  ...
test: ' ' is not an ascii digit

Reason is that the response header "Content-Length" contains trailing whitespace. According to RFC2616, section 4.2, "[...] leading or trailing LWS MAY be removed without changing the semantics of the field value"

So, when adding header values, should we maybe trim these?

diff --git a/src/Network/Http/Types.hs b/src/Network/Http/Types.hs
index 509bcfd..de0c557 100644
--- a/src/Network/Http/Types.hs
+++ b/src/Network/Http/Types.hs
@@ -370,9 +370,11 @@ addHeader
     -> (ByteString,ByteString)
     -> HashMap (CI ByteString) ByteString
 addHeader m (k,v) =
-    insertWith f (mk k) v m
+    insertWith f (mk k) (trim v) m
   where
     f new old = S.concat [old, ",", new]
+    trim      = S.dropWhile lws . fst . S.spanEnd lws
+    lws       = flip elem [' ', '\t', '\n', '\r']

 lookupHeader :: Headers -> ByteString -> Maybe ByteString
 lookupHeader x k =

istathar commented 11 years ago

We certainly should trim leading or trailing whitespace. However,

@gregorycollins I could have sworn that the code I imported into http-streams's src/Network/Http/Utilities.hs from snap-core's src/Snap/Internal/Parsing.hs and snap-server's src/Snap/Internal/Http/Parser.hs attended to dropping such extraneous whitespace. So before we fix it here in the call to insertWith, perhaps we should double check the parser code?

@twittner I'll have to turn this into a test case, obviously. Thanks for the heads up.

AfC

istathar commented 11 years ago

I've got a test case on branch 'trailing-whitespace' that fails.

gregorycollins commented 11 years ago

You probably don't want to cut trailing whitespace out of every header, because it generally isn't necessary and you pay extra for doing so. Just make sure that the function that parses the integer out of the content-length header tolerates the trailing whitespace.

You could use "decimal" from attoparsec instead of whatever you're using now.

istathar commented 11 years ago

@gregorycollins Hm. RFC 2616 §4.2 says (as we've all read and re-read a thousand times):

The field-content does not include any leading or trailing LWS: linear white space occurring before the first non-whitespace character of the field-value or after the last non-whitespace character of the field-value. Such leading or trailing LWS MAY be removed without changing the semantics of the field value.

Wouldn't it be appropriate for us to trim such linear whitespace regardless, especially seeing as how in a continuing header it has to be compressed to a single SP? If so, trimming it in the parser would be far better than another pass after the fact, yeah?

btw, the current readDecimal is what you put in, which was because you didn't like me using decimal from attoparsec or the one from bytestring-lexing :) I'm adding

x' = head $ S.words str'

before doing the fold to build the number. Presumably that'll be sufficient to guard this condition, but if we can change the parser to trim all fields I'll revert this.

AfC

gregorycollins commented 11 years ago

Wouldn't it be appropriate for us to trim such linear whitespace regardless, especially seeing as how in a continuing header it has to be compressed to a single SP? If so, trimming it in the parser would be far better than another pass after the fact, yeah?

Maybe -- the only reason we don't inside snap-server is that searching for CR is a memchr() call, which is a lot faster than doing takeWhile or dropWhile, and if the header is never inspected you're just doing extra work for no reason. The difference in performance may not be enough to justify that, however.

btw, the current readDecimal is what you put in, which was because you didn't like me using decimal from attoparsec or the one from bytestring-lexing :) I'm adding
x' = head $ S.words str'
before doing the fold to build the number. Presumably that'll be sufficient to guard this condition, but if we can change the parser to trim all fields I'll revert this.

Data.ByteString.Char8.spanEnd might be faster and should allocate less.

istathar commented 11 years ago

@gregorycollins I gave it a try; as far as I could tell from +RTS -p and -s it didn't make any difference. I suppose I should do it anyway.

aesiniath / http-streams

' ' is not an ascii digit #28