cmiles74 / bishop

A Webmachine-like library for Clojure
Other
95 stars 10 forks source link

How can I prevent the parameter charset from being added to the response header Content-Type? #9

Closed aviflax closed 12 years ago

aviflax commented 12 years ago

The library, or possibly some other part of the stack such as Ring or Jetty, is adding this parameter.

Example:

$ curl -i http://localhost:3000/
HTTP/1.1 200 OK
Date: Wed, 11 Jul 2012 20:04:33 GMT
Vary: accept
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 12
Server: Jetty(7.6.1.v20120215)

hello, world

I’d prefer that the parameter charset not be present in the response header Content-Type. How can I remove or suppress it?

cmiles74 commented 12 years ago

That sounds like a bug, there's no place where character set is explicitly set to ISO-8859-1. Handler functions list UTF8 as the only encoding they provide by default; if the client browser isn't asking for anything specific, that should be what it gets. As a matter of fact, looking at this code, even if the browser asks for another encoding it should get a 406 ("Not Acceptable").

When I hit my service, this is what I'm seeing.

$ http -uv http://localhost:3000/collections
GET /collections HTTP/1.1
Host: localhost:3000
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: identity, deflate, compress, gzip
Accept: */*
User-Agent: HTTPie/0.2.2

HTTP/1.1 200 OK
Date: Fri, 13 Jul 2012 14:46:49 GMT
Vary: accept-encoding, accept
Content-Type: application/json; charset=ISO-8859-1
Content-Length: 330
Server: Jetty(6.1.x)

{"_links":{"self":{"href":"/collections/?page=0&size=10"}},
"_meta":{"total":1,"size":10,"page":0},"collections":[{"_links":{"self":{"href":"/collections/webdocs"}},"id":"b032108c7680863668cf68b5304700ac",
"active":true,"modified_date":1342104352032,"created_date":1342104352032,
"name":"webdocs","description":"A test collection"}]}

It's getting set somewhere and that's just wrong. It doesn't look like Ring is doing this, it is checking the Content-Type header and it only sets the encoding if it's present. I believe Jetty is the one setting this encoding, ISO-8859-1 is it's default.

There is a default character set ISO-8859-1, which supports most western European languages, and is currently the official 'default' content encoding for content carried by HTTP (but not for data included in URLs - see below). This is also the default for all Jetty versions except 4.0.x. Alternative encodings may be specified by defining the HTTP Content-Type header of the page to be something like "text/html; charset=ISO-8859-4". From a Servlet you can use request.setContentType("text/html; charset=ISO-8859-4"); .

The logic right now is that if there's no Accept-Charset header provided by the client then nothing happens, the result should be Jetty's default of ISO-8859-1. If the client does specify and encoding, it's stored under the :acceptable-charset key of the request map; later on, when the body is added to the response it is ignored. Doh!

I will fix this ASAP. The other side of this question is whether or not the character set should be explicitly set when the client does not provide an Accept-Charset header. I think we'll be able to punt on this one; when the code that sets the Content-Type is fixed, the bit that figures out what the content type should be will settle on UTF-8 unless the :charsets-provided callback explicitly lists something else as supported and the client explicitly asks for it.

PS: The resource should set the default encoding to "UTF-8", not "UTF8"; I'll fix that as well.

cmiles74 commented 12 years ago

Okay, all set. This is fixed in the 1.0.9 release.

aviflax commented 12 years ago

Looks great, thanks!

aviflax commented 12 years ago

I just noticed a case wherein the default charset is still being added to my response: when one of my resource callbacks returns a response map.

My example:

     :malformed-request?
     (fn [req]
       (let [term (keywordize-keys (:params req))]
         (when (not (term-valid? term))
           {:headers {"Content-Type" "text/html"}
            :body (html-page req "The request representation is invalid.")})))
$ curl -i -d name=PropertyCity -d description=Foo http://localhost:3000/terms;echo
HTTP/1.1 400 Bad Request
Date: Tue, 17 Jul 2012 22:25:19 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 1234
Server: Jetty(7.6.1.v20120215)
cmiles74 commented 12 years ago

I jut added code to fix this one this afternoon. The issue is that the halting response bails out of the decision tree before the content type is negotiated. I'll get that out tomorrow morning.

Thank you for all of the feedback, Avi! It is appreciated.

cmiles74 commented 12 years ago

Okay, release 1.1.1 fixes this issue. When we exit the decision tree, a content-type and character set are chosen according to either the response body (if one was provided) or we default to text/plain and the first encoding provided by the resource.

$ echo 'C is for Cookie!' | http -v POST http://localhost:3000/collections
POST /collections HTTP/1.1
Host: localhost:3000
Content-Type: application/json; charset=utf-8
Accept-Encoding: identity, deflate, compress, gzip
Accept: application/json
User-Agent: HTTPie/0.2.2

C is for Cookie!

HTTP/1.1 422 Unprocessable Entity
Date: Wed, 18 Jul 2012 03:18:55 GMT
Vary: accept-encoding, accept-charset, accept
Content-Type: text/plain; charset=utf-8
Content-Length: 183
Server: Jetty(6.1.x)

Unexpected character ('C' (code 67)): expected a valid value (number, String, array, 
object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@7c7ba849; line: 1, column: 2]

I also found and fixed a bug where the headers from a POST response that's creating a resource would be lost and it would fall-back to the first content-type provided by a resource. This would effect the returning of error messages when a POST was creating.

This should take care of your issue.

aviflax commented 12 years ago

My pleasure, thanks for being so responsive! I really appreciate it.

aviflax commented 12 years ago

Christopher, this might not yet be fully fixed. I’m now running 1.1.2, and I’m still seeing some cases where error responses are getting the charset ISO-8859-1.

Here’s what I’m seeing:

$ curl -i -X GET http://localhost:3000/terms/
HTTP/1.1 200 OK
Date: Fri, 20 Jul 2012 02:47:05 GMT
Vary: accept-charset, accept
Content-Type: text/html;charset=UTF-8
Content-Length: 1200
Server: Jetty(7.6.1.v20120215)
$ curl -i -X POST -d bad=request http://localhost:3000/terms/
HTTP/1.1 400 Bad Request
Date: Fri, 20 Jul 2012 02:47:49 GMT
Vary: accept-charset, accept
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 1235
Server: Jetty(7.6.1.v20120215)

My resource is not complicated:


(def global-term-collection
  (bishop/resource
    {"text/html"
     (fn [req]
       {:body (html-page req)})

    "application/json"
     (fn [req]
       {:body (json/generate-string (get-global-term-list))})}

    {:allowed-methods (fn [req] [:get :head :post])

     :process-post
     (fn [req]
       (let [term (term-from-req req)]
         (cond
           ;; should be handled by :is-conflict? but Bishop doesn’t apply that handler to POST requests 
           (global-term-exists? (:name term))
           {:status 409
            :headers {"Content-Type" "text/html"}
            :body (html-page req "A Term with the specified name already exists.")}

           (not (term-valid? term))
           {:status 400
            :headers {"Content-Type" "text/html"}
            :body (html-page req "The request representation is invalid.")}

           :else
           (do
             (store-global-term term)
             {:status 201
              :headers {"Location" (str "/terms/" (:name term))}}))))}))
cmiles74 commented 12 years ago

Yikes, this one has been harder to get right than I expected. I will take a look at this later this morning.

cmiles74 commented 12 years ago

Alright, I found a typo that was causing problems when the :process-post? function returned a map with a status code. Once I fixed that, it was clear that if the response's :headers map contained a "Content-Type" as well as a "content-type", the results were weird. I altered the ensure-content-type function to remove the "Content-Type" key from the response's header map before setting "content-type". This should resolve the issue, however, it may cause other problems to come to light. Please let me know how things work with the 1.1.3 release.

aviflax commented 12 years ago

Fixed! Thanks!

aviflax commented 12 years ago

Wait, does that mean that when I set a response header, I should set the name as all lower case? The ring spec doesn’t say.

cmiles74 commented 12 years ago

Nope, I made this change so that you wouldn't have to do that. :) Mixed case or not, it now comes out right.