Open hanshuebner opened 12 years ago
I wrote this hack as a work-around. It seems to solve the problem. It should not break most applications, since keywords are usually already interned from the source, but I included an earmuff-override for when it does.
(in-package :chunga)
(defvar *intern-unsafely*)
(defun as-keyword (string &key (destructivep t))
(or (gethash string +string-to-keyword-hash+)
(find-symbol (string-upcase string) (find-package "KEYWORD"))
(if (and (boundp '*intern-unsafely*) *intern-unsafely*)
(make-keyword string destructivep)
string)))
I have pondered a similar approach - The problem that I see is that it kind of breaks the representation of headers in Hunchentoot. Unknown headers no longer have a keyword as they key in the headers alist, but a string. While this might not be a big problem, it also does look hackish to me.
I would like to see this problem solved by completely replacing the intern invocations by find-symbol, and change the representation of headers so that they always have a string as their key.
Yeah, changing the headers to always be strings would probably be best, but I imagine that will be rather a bigger project. Plus it will break existing software.
re headers, I read how @susam configures her Nginx with a limited set of headers:
I have now updated the Nginx configuration to block arbitrary headers coming
from the remote client and explicitly pass only a limited set of headers to Hunchentoot.
So now I have something like this in the Nginx reverse proxy configuration:
proxy_pass_request_headers off;
proxy_set_header Accept $http_accept;
proxy_set_header Content-Length $http_content_length;
proxy_set_header Content-Type $http_content_type;
proxy_set_header If-Modified-Since $http_if_modified_since;
proxy_set_header Referer $http_referer;
proxy_set_header User-Agent $http_user_agent;
proxy_set_header X-Forwarded-For $remote_addr;
I will push this configuration to the GitHub repository too sometime this weekend.
repository mentioned: Mathb. edit: https://github.com/susam/mathb/blob/main/etc/nginx/https.mathb.in
Yeah, changing the headers to always be strings would probably be best, but I imagine that will be rather a bigger project. Plus it will break existing software.
This could simply be an option in hunchentoot *headers-as-strings*
that's set to NIL
by default. If set to T
then the headers will be represented with strings. That way we avoid breaking any existing applications, and it should be relatively easy to implement.
What do you guys think? Are any of you maintainers? Would a PR for this be accepted?
Maybe we should break things. But only if there's a way to fail at compile-time.
Maybe add a white list of known headers and replace the interning with a lookup? Having access to normalized header names by default is very useful, and using symbols to represent headers is idiomatic.
Maybe add a white list of known headers and replace the interning with a lookup? Having access to normalized header names by default is very useful, and using symbols to represent headers is idiomatic.
And then treat non standard headers as strings? I for one use custom headers for things like jwt authentication, captcha tokens, and others...
Here's a list of known headers: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
Accept
Accept-CH
Accept-Charset
Accept-Encoding
Accept-Language
Accept-Patch
Accept-Post
Accept-Ranges
Access-Control-Allow-Credentials
Access-Control-Allow-Headers
Access-Control-Allow-Methods
Access-Control-Allow-Origin
Access-Control-Expose-Headers
Access-Control-Max-Age
Access-Control-Request-Headers
Access-Control-Request-Method
Age
Allow
Alt-Svc
Alt-Used
Attribution-Reporting-Eligible
Attribution-Reporting-Register-Source
Attribution-Reporting-Register-Trigger
Authorization
Cache-Control
Clear-Site-Data
Connection
Content-Digest
Content-Disposition
Content-DPR
Content-Encoding
Content-Language
Content-Length
Content-Location
Content-Range
Content-Security-Policy
Content-Security-Policy-Report-Only
Content-Type
Cookie
Critical-CH
Cross-Origin-Embedder-Policy
Cross-Origin-Opener-Policy
Cross-Origin-Resource-Policy
Date
Device-Memory
Digest
DNT
Downlink
DPR
Early-Data
ECT
ETag
Expect
Expect-CT
Expires
Forwarded
From
Host
If-Match
If-Modified-Since
If-None-Match
If-Range
If-Unmodified-Since
Keep-Alive
Last-Modified
Link
Location
Max-Forwards
NEL
No-Vary-Search
Observe-Browsing-Topics
Origin
Origin-Agent-Cluster
Permissions-Policy
Pragma
Priority
Proxy-Authenticate
Proxy-Authorization
Range
Referer
Referrer-Policy
Refresh
Report-To
Reporting-Endpoints
Repr-Digest
Retry-After
RTT
Save-Data
Sec-Browsing-Topics
Sec-CH-Prefers-Color-Scheme
Sec-CH-Prefers-Reduced-Motion
Sec-CH-Prefers-Reduced-Transparency
Sec-CH-UA
Sec-CH-UA-Arch
Sec-CH-UA-Bitness
Sec-CH-UA-Full-Version
Sec-CH-UA-Full-Version-List
Sec-CH-UA-Mobile
Sec-CH-UA-Model
Sec-CH-UA-Platform
Sec-CH-UA-Platform-Version
Sec-Fetch-Dest
Sec-Fetch-Mode
Sec-Fetch-Site
Sec-Fetch-User
Sec-GPC
Sec-Purpose
Sec-WebSocket-Accept
Sec-WebSocket-Extensions
Sec-WebSocket-Key
Sec-WebSocket-Protocol
Sec-WebSocket-Version
Server
Server-Timing
Service-Worker-Navigation-Preload
Set-Cookie
Set-Login
SourceMap
Speculation-Rules
Strict-Transport-Security
Supports-Loading-Mode
TE
Timing-Allow-Origin
Tk
Trailer
Transfer-Encoding
Upgrade
Upgrade-Insecure-Requests
User-Agent
Vary
Via
Viewport-Width
Want-Content-Digest
Want-Digest
Want-Repr-Digest
Warning
Width
WWW-Authenticate
X-Content-Type-Options
X-DNS-Prefetch-Control
X-Forwarded-For
X-Forwarded-Host
X-Forwarded-Proto
X-Frame-Options
X-XSS-Protection
And then treat non standard headers as strings?
Not quite. The documentation suggests that keywords are used for header names when calling HEADER-IN
, so the lookup could actually just be FIND-SYMBOL
and then find custom headers because they're interned into the keyword package by the source code. Only headers that have no corresponding keyword would end up being treated as strings.
I just made this in case it's useful: https://github.com/daninus14/cl-http-headers
I was thinking more in the lines of changing STRING-AS-KEYWORD
here:
As it stands now, this interns any header name into the keyword package, whereas it should really perform a lookup.
To illustrate, consider this:
* (find-symbol "FOO" :keyword)
NIL
NIL
* :foo
:FOO
* (find-symbol "FOO" :keyword)
:FOO
:EXTERNAL
Once a keyword has been used (i.e. by writing it into the source code of your program), it will be interned into :KEYWORD
package and thus be found by FIND-SYMBOL
.
One will have to check whether STRING-AS-KEYWORD
is used in other places, and what a change in the behavior would mean to those other places. Given that interning strings received from an external source is always unsafe, those call sites should also be changed, if required. I'd also rename the function to LOOKUP-HEADER-SYMBOL
. One bit complication is the reader's casing rules which can be influenced by special variables (which is what the "reader defaults" comment refers to). A proper solution will have to adhere to those rules to appease folks who insist on using non-default reader configurations.
Ok, I'm not acquainted with the code base to follow, but it seems string-as-keyword is only used once. I couldn't find where the headers are being parsed into being keywords
STRING-AS-KEYWORD
is what turns a header into a keyword. It does so by using READ-FROM-STRING
instead of INTERN
because that way, if a programmer chooses to reconfigure their reader settings in a non-standard way, header case would be interned correctly.
Here's a brief rundown what this is about: By default, most (but not all) lisp implementation convert symbols that are read in the REPL or from source code to upper case. This is why when you type :foo
in the REPL, it echoes :FOO
. Now, for historic reasons, this behavior is configurable through the Readtable. For some reason, we have at some point in time decided that we want to preserve the case of incoming headers so that if the Readtable is configured to preserve case of symbols, headers would also be interned in mixed case. This is not a good assumption to make anyway, because headers are case insensitive in the http protocol and a server could legitimately send COnTENT-Length: 2345
and still mean the Content-Length
header.
For that reason, I think it would be best to replace the implementation of STRING-AS-KEYWORD
by a straight call to (find-symbol (string-upcase string) :keyword)
. The risk of breaking existing code would exist, but I think it would be a slim chance and result in a safer and more predictable behavior.
It looks like here is where the headers are parsed: (defun get-request-data (stream)
which calls this read-http-headers
here https://github.com/edicl/hunchentoot/blob/master/headers.lisp#L285C27-L285C42
That is from chunga, and here in line 182 it calls as-keyword
From here it looks like that's the only place where as-keyword
is used: https://github.com/search?q=repo%3Aedicl%2Fchunga%20as-keyword&type=code
So @hanshuebner what you suggested is basically what @sletner had originally done in terms of using keywords if they exist, just without interning them. So the actual solution is similar to what it was?
I just realized the code was exactly what @sletner wrote above. I just added it as a PR. The default behavior will not allow interning, but for whoever has an app that will break because of this, it allows it, per @sletner.
It probably makes sense to add this to the docs. Should I add a comment in the docs if you'll approve the PR?
This issue can now be closed since it was solved by https://github.com/edicl/chunga/pull/11
Hunchentoot uses CHUNGA's as-keyword method to convert strings to keywords. In several cases, the strings are coming from external sources (e.g. method in incoming requests), and by interning arbitrary strings, memory can be leaking.