Closed DavidOliver closed 4 years ago
Thank you, this is an AMAZING bug report -- thank you for distilling it down to narrow test cases that have a very specific difference. Will get on this soon, or PRs welcome.
@DavidOliver:
Sorry to slightly derail, but I have to add this. You can delete this if you want to.
At any rate, the Hiawatha webserver is still being actively maintained by Mr. Leisink, He ditched GitHub due to reasons of integrity (IIRC). As you all might know, M$ bought GitHub and this naturally rubbed some people the wrong way. So he is now gone from here.
Faithfully, -k0nsl
Hi @k0nsl,
Thanks for noting that. My comment on Hiawatha's development was technically incorrect, but was based on this rather than anything to do with Github.
[...] I will no long be available for support questions about the Hiawatha webserver. Security related issues can still be reported, of course.
The most important reason for this is that my spare time is only limited and I'd rather spend it doing other things than developing a webserver.
[...]
So, can you continue using the Hiawatha webserver? Well, that depends on what you want from a webserver. Clearly, Hiawatha will never support HTTP/2 or HTTP/3. If you're fine with that and Hiawatha serves your needs, you can continue using it. To be clear: I won't stop developing Hiawatha. But new features will be based on what I need, not on what is needed for a webserver in general.
There has indeed been a release since then, and a couple of my servers are still running Hiawatha just fine, but with an eye on the future it seems reasonable for me to switch to, e.g., Caddy for new projects/deployments.
By the way, I've been in contact on and off with Hugo over the years and he's always been extremely supportive and helpful, and I enjoyed being able to set up and configure a webserver so easily (barring some rewrite rule conversion head-scratching here and there!). I can quite understand his decision to shift his attention to other things. :)
@DavidOliver When you have:
try_files /test.php?p={path}&{query}
Do you intend to replace any existing query string, or augment it?
(Replacing is easy, augmenting is a little trickier...)
Hi @k0nsl,
Thanks for noting that. My comment on Hiawatha's development was technically incorrect, but was based on this rather than anything to do with Github.
[...] I will no long be available for support questions about the Hiawatha webserver. Security related issues can still be reported, of course. The most important reason for this is that my spare time is only limited and I'd rather spend it doing other things than developing a webserver. [...] So, can you continue using the Hiawatha webserver? Well, that depends on what you want from a webserver. Clearly, Hiawatha will never support HTTP/2 or HTTP/3. If you're fine with that and Hiawatha serves your needs, you can continue using it. To be clear: I won't stop developing Hiawatha. But new features will be based on what I need, not on what is needed for a webserver in general.
There has indeed been a release since then, and a couple of my servers are still running Hiawatha just fine, but with an eye on the future it seems reasonable for me to switch to, e.g., Caddy for new projects/deployments.
By the way, I've been in contact on and off with Hugo over the years and he's always been extremely supportive and helpful, and I enjoyed being able to set up and configure a webserver so easily (barring some rewrite rule conversion head-scratching here and there!). I can quite understand his decision to shift his attention to other things. :)
Yes, Mr. Leisink is a great person. I hope everything goes well for him.
Greetings from Scandinavia.
Faithfully, -k0nsl
@mholt Augment, if I understand the question correctly.
It's the typical PHP URL rewriting situation, where all non-static-asset requests are rewritten to /index.php
with both the path and URL query parameters rewritten to the URL query parameters, so that PHP can be used for URLs without /index.php
in them.
Original | Rewritten |
---|---|
/blog |
/index.php?p=blog |
/blog?page=2 |
/index.php?p=blog&page=2 |
In Caddy 1, I have:
rewrite {
to {path} {path}/ /index.php?p={path}&{query} /index.php?{query}
}
(I think the /index.php?{query}
on the end is redundant.)
This is the Apache rewrite (Craft CMS guide page):
<IfModule mod_rewrite.c>
RewriteEngine On
# Send would-be 404 requests to Craft
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|apple-touch-icon.*\.png)$ [NC]
RewriteRule (.+) index.php?p=$1 [QSA,L]
</IfModule>
@DavidOliver Thanks, that's helpful. I just pushed an improvement to the rewrite
handler (see linked commit just above) so that it can now either augment or replace query strings. Next I'll fix the try_files directive to take advantage of this.
Should be fixed now, in 5e9d81b. Please try it out when you have a chance!
Thanks. Unfortunately, I don't think it's there yet.
My suggested test:
Install PHP 7. In some-dir/test.php
:
<?php phpinfo(32); ?>
Config:
http://example.localhost
root /some-dir
fastcgi / /run/php/php7.3-fpm.sock php
rewrite {
to {path} {path}/ /test.php?p={path}&{query}
}
http://example.localhost/blog?page=2
GET request results:
Variable | Value |
---|---|
$_GET['p'] | /blog |
$_GET['page'] | 2 |
$_SERVER['SCRIPT_NAME'] | /test.php |
$_SERVER['QUERY_STRING'] | p=/blog&page=2 |
$_SERVER['REQUEST_URI'] | /blog?page=2 |
$_SERVER['DOCUMENT_URI'] | /test.php |
http://example.localhost/test.php
GET request results (here we request a PHP file directly):
Variable | Value |
---|---|
$_GET['page'] | 2 |
$_SERVER['REQUEST_URI'] | /test.php?page=2 |
$_SERVER['QUERY_STRING'] | page=2 |
Commit b1a456cfe38e83eedb778032ed6739e7e3cbe8a4
Config, mostly following the default PHP FastCGI setup from the v2 documentation, but simplified and modified to rewrite to p
URL query parameter:
http://example.localhost
root * /some-dir
try_files {path} /test.php?p={path}&{query}
matcher phpFiles {
path *.php
}
reverse_proxy match:phpFiles unix//run/php/php7.3-fpm.sock {
transport fastcgi {
split .php
}
}
http://example.localhost/blog?page=2
GET request results:
A blank 200
response. I suspect the request is not being sent to PHP.
http://example.localhost/test.php?page=2
GET request results (here we request a PHP file directly):
Variable | Value |
---|---|
$_GET['p'] | /test.php |
$_GET['page'] | 2 |
$_SERVER['REQUEST_URI'] | /test.php?page=2 |
$_SERVER['QUERY_STRING'] | page=2&p=/test.php&?page=2 |
Observations of requesting test.php
directly with v2:
$_GET['p']
variable (unexpected due to try_files {path}
)$_SERVER['QUERY_STRING']
is not rightWhen I change test.php
in the config to index.php
(meaning Craft CMS rather than my test file is processing) and request /index.php?x=1
, Craft's 404 Not Found error page includes:
$_GET = [
'x' => '1',
'p' => '/index.php',
'?x' => '1',
];
Many thanks.
Edit: corrected one of the variable value results for Caddy 2.
In case it's helpful, to recap requirements, which will apply in a similar way to many PHP sites:
*.php
, serve it as a static file.*.php
, serve PHP-processed with URL query params intact.This was very simple in v1. I hope I'm not missing something with regards to v2! :)
@DavidOliver Thanks for the details.
Can you make a change in your local repo, here: https://github.com/caddyserver/caddy/blob/b1a456cfe38e83eedb778032ed6739e7e3cbe8a4/modules/caddyhttp/fileserver/caddyfile.go#L130
Right in there, can you add a line:
Rehandle: true,
And let me know if the first case (/blog
) improves?
I think I know why the second one is going wonky and I'm still thinking on a good solution for that.
Thanks. With that and a request for /blog
I get:
ERROR http.log.error too many rehandles
I have:
makeRoute := func(try []string, writeURIAppend string) []httpcaddyfile.ConfigValue {
handler := rewrite.Rewrite{
URI: "{http.matchers.file.relative}{http.request.uri.query_string}" + writeURIAppend,
Rehandle: true,
}
I'm off to bed now but should be around tomorrow in case you want me to try anything else then.
Alright, that confirms my theory. I'll look into it more closely tomorrow. Thank you!
@DavidOliver I am now thinking, shouldn't the p
query string parameter be URL-encoded? i.e. %2Ftest.php
, not /test.php
. That seems more correct, but would that break your PHP app?
I'm not sure whether or not webservers do or are supposed to URL-encode in this situation. Whenever dealing with rewrite rules in the past (Apache, Hiawatha, Caddy 1) it hasn't been something that I've considered/had to deal with. Maybe worth looking at what others do with FastCGI/PHP?
@DavidOliver Well, it's complicated. Some do it by default, others don't. Some make it configurable, others use different variables for URL-encoding.
I've pushed https://github.com/caddyserver/caddy/commit/724c72867835f9287b3df3ee5a0d75327c0780cf in an attempt to make some progress here, can you give it a try?
In this change, I've decided to properly URL-encode query string values, unlike before. (Edit: And you're right, rewriting to /index.php?{query}
is redundant, since Caddy will preserve any existing query string by default; it will only modify a query string if you use ?
explicitly.)
One caveat about this change is that it overwrites existing query string values, and it sorts the resulting query string by key. There are like 10+ ways to manipulate the query string and trying to find ways to expose a way to express as many of them as possible is tricky.
and it sorts the resulting query string by key
Please be sure to document that, could surprise some people. In particular, I've encountered a lot of legacy APIs lately that sign requests by concatenating together all the values in the query, or by signing the entire query string as-is. I doubt it'll actually affect me personally, but just wanted to point that out.
FWIW, I don't love the current solution -- but if this general idea (minus the sorting of keys) ends up being the right answer, then I'll probably just copy+paste the function from the Go standard library and bring it directly into Caddy, just to remove that Sort()
line, I agree it's annoying that it does that when it really doesn't have to.
Still, would like some people to test it out so far and see if we're making progress. :)
Hi @mholt,
Testing with 724c728.
try_files {path} /test.php?p={path}&{query}
Note: you may wish to skip to the next try_files
heading.
http://example.localhost
root * /path/to/web
try_files {path} /test.php?p={path}&{query}
matcher phpFiles {
path *.php
}
reverse_proxy match:phpFiles unix//run/php/php7.3-fpm.sock {
transport fastcgi {
split .php
}
}
file_server
GET http://example.localhost
Variable | Value |
---|---|
$_GET['?'] | no value |
$_GET['??'] | no value |
$_GET['p'] | / |
$_SERVER['QUERY_STRING'] | %3F=&%3F%3F=&p=%2F |
$_SERVER['QUERY_STRING'] (decoded) | ?=&??=&p=/ |
GET http://example.localhost/blog
Variable | Value |
---|---|
$_GET['?'] | no value |
$_GET['??'] | no value |
$_GET['p'] | /blog |
$_SERVER['QUERY_STRING'] | %3F=&%3F%3F=&p=%2Fblog |
$_SERVER['QUERY_STRING'] (decoded) | ?=&??=&p=/blog |
GET http://example.localhost/blog?page=1
ERROR http.log.error too many rehandles
So Caddy is preserving/automatically appending the URL query params, so trying without the &{query}
...
try_files {path} /test.php?p={path}
http://example.localhost
root * /path/to/web
try_files {path} /test.php?p={path}
matcher phpFiles {
path *.php
}
reverse_proxy match:phpFiles unix//run/php/php7.3-fpm.sock {
transport fastcgi {
split .php
}
}
file_server
GET http://example.localhost
Variable | Value |
---|---|
$_GET['p'] | / |
$_SERVER['QUERY_STRING'] | p=%2F |
$_SERVER['QUERY_STRING'] (decoded) | p=/ |
GET http://example.localhost/blog
Variable | Value |
---|---|
$_GET['p'] | /blog |
$_SERVER['QUERY_STRING'] | p=%2Fblog |
$_SERVER['QUERY_STRING'] (decoded) | p=/blog |
GET http://example.localhost/blog?page=1
Variable | Value |
---|---|
$_GET['p'] | /blog |
$_GET['page'] | 1 |
$_SERVER['QUERY_STRING'] | p=%2Fblog&page=1 |
$_SERVER['QUERY_STRING'] (decoded) | p=/blog&page=1 |
So it's looking good without the &{query}
in the try_files
. Without it, when I use 'index.php' instead of 'test.php' (to load Craft CMS instead of the PHP variables test page) and request pages like '/blog', the CMS does indeed serve the correct HTML.
However, the static asset requests are resulting in PHP app-served 404s. On this 404 debug page:
$_GET = [
'p' => '/index.php',
];
How can I have requests like /img/twitter.svg
served directly (if such a file exists), and not rewritten to /index.php
?
Thanks!
Related, I think it would be good to get a few PHP users to test their URL rewrites once you think the solution is in place as different PHP apps/frameworks have different rewriting schemes.
Maybe this is related to my comment a few weeks ago, where rainloop webmail breaks with Caddy v2 and php 7.3, but works fine with v1 and php 7.3.
https://caddy.community/t/v2-rainloop-breaks-with-caddy-v2/6219
Now that #2959 and #2967 are done, the stage is set to finally fix this. I will be working on this next.
Hi @mholt. Great. :)
I just wanted to clarify that my testing in my response above seemed to show that PHP pages are successfully served as required when I don't include &{query}
in the target. I don't know whether or not you have more changes in mind, and I haven't yet properly read the two issues to which you linked above, but I thought I should re-state that I can get it working here as there's a lot of noise in my post.
The only thing I didn't get working was the serving of the static assets, as requests for those were also being sent to PHP.
@DavidOliver Great, thanks for the clarification. If you look at the latest commits, these in particular:
you'll see that I am actively working on improving rewrites and query string handling. I had to write a custom query string builder. :(
Next I will turn my attention to try_files
and php_fastcgi
directives to make sure they integrate with these improvements properly. I'll ping you here in this issue when I'm ready for you to test again!
@DavidOliver Thank you for your sponsorship! 💚
Okay, I've finished the planned maintenance on the rewrite handler, the try_files directive, and php_fastcgi.
The rewrite handler now only modifies parts of the URI that are specified in the rewrite. So if a path (like a filename) is specified, it will change the path portion of the URI. If a query string is specified, it will replace the existing query string.
This same logic passes thru to the try_files directive. Omitting a query string will implicitly preserve it, but specifying one will overwrite any existing one. (The reason for this is so that you can clear an existing query string by specifying an empty one with ?
). Thus, you can both preserve it and append to it by using ?{query}&a=b...
in your Caddyfile, for example. Doing this only appends your new key-values, it does not replace existing ones. The reason for this is mostly simplicity but also has a performance benefit as we don't need to decode and keep a map of the existing query string. Not a big deal, but seemed reasonable.
Can you pull down the latest on the v2 branch and give it a try when you have a chance?
I'm still finishing up some final revisions to the Caddyfile in general (related to how directives are ordered and handlers chained together), but I hope most of the work regarding rewriting is finished.
@mholt, that all sounds great!
I've compiled from commit 271b5af14894a8cca5fc6aa6f1c17823a1fb5ff3.
Based on your comment above, here's what I've got in my Caddyfile now (please let me know if this may need correcting in any way):
http://example.localhost
root * /path/to/web
try_files {path} /index.php?{query}&p={path}
php_fastcgi unix//run/php/php7.3-fpm.sock
file_server
However, this is resulting in a PHP-served 404 response because the 'p' parameter is set to the rewritten path of '/index.php' rather than the original path of '/blog'. Perhaps this will be fixed by the ordering work you mention? Or is there an "original path" variable that should instead be used in the rewrite/try_files directive?
I also don't yet know how to get static files served directly: for example, with the above config, requests for '/favicon.ico' are going to PHP.
Thanks.
@DavidOliver D'oh 🤦♂ Somehow, I omitted one or two test cases that directly relate to your situation. I've just pushed a fix for the p
parameter. The problem is that we were updating the path first, then building the query string which relied on the path, so it was only seeing the updated path. That's fixed now: updating the URI is done in isolation only after all the components have been built.
The favicon issue I'm still looking at... that is strange.
@DavidOliver Does the favicon file actually exist?
(Also, when you have a chance, feel free to verify that the 'p' parameter is fixed for you.)
It might be helpful at this point to run caddy adapt
with your Caddyfile and take a look at the routes and see why favicon is being served up by the PHP handler. You can always inject a handler like this:
{
"handler": "static_response",
"body": "{http.request.uri}\n"
}
and that will tell you what the URI is at that point in the chain.
@mholt, that's fixed it; my PHP CMS is now outputting the right HTML. :smiley_cat:
Yes, the static assets definitely exist on disk, and none of them are loaded as the requests for them are going to PHP. Here's the output of the adapt command:
{
"apps": {
"http": {
"servers": {
"srv0": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"host": [
"example.localhost"
]
}
],
"handle": [
{
"handler": "subroute",
"routes": [
{
"handle": [
{
"handler": "vars",
"root": "/m/projects/example/project/web"
}
]
},
{
"handle": [
{
"handler": "rewrite",
"uri": "{http.matchers.file.relative}"
}
],
"match": [
{
"file": {
"try_files": [
"{http.request.uri.path}"
]
}
}
]
},
{
"handle": [
{
"handler": "rewrite",
"uri": "{http.matchers.file.relative}?{http.request.uri.query}\u0026p={http.request.uri.path}"
}
],
"match": [
{
"file": {
"try_files": [
"/index.php"
]
}
}
]
},
{
"handle": [
{
"handler": "static_response",
"headers": {
"Location": [
"{http.request.uri.path}/"
]
},
"status_code": 308
}
],
"match": [
{
"file": {
"try_files": [
"{http.request.uri.path}/index.php"
]
},
"not": {
"path": [
"*/"
]
}
}
]
},
{
"handle": [
{
"handler": "rewrite",
"uri": "{http.matchers.file.relative}"
}
],
"match": [
{
"file": {
"try_files": [
"{http.request.uri.path}",
"{http.request.uri.path}/index.php",
"index.php"
]
}
}
]
},
{
"handle": [
{
"handler": "reverse_proxy",
"transport": {
"protocol": "fastcgi",
"split_path": ".php"
},
"upstreams": [
{
"dial": "unix//run/php/php7.3-fpm.sock"
}
]
}
],
"match": [
{
"path": [
"*.php"
]
}
]
},
{
"handle": [
{
"handler": "file_server",
"hide": [
"Caddyfile2"
]
}
]
}
]
}
],
"terminal": true
}
]
}
}
}
}
}
@DavidOliver Thanks, can you put this route:
{
"handle": [
{
"handler": "static_response",
"body": "{http.request.uri}\n"
}
]
},
right before the reverse_proxy route in your JSON, and then use the modified JSON as the config?
I would do it myself but I don't have the same site setup/folder so I can't as easily reproduce the problem.
You should see (especially via requests with curl
) what the rewritten URI is.
OR
Enable debug logging and see what the requests are being rewritten to. Set this to "DEBUG" for a log called "default": https://caddyserver.com/docs/json/logging/logs/level/
OR
do both :)
Thanks!
With that in the config before "reverse_proxy", curl outputs /index.php?p=%2Ffavicon.ico
for a request for http://example.localhost/favicon.ico
web$ ls -lah fav*
-rw-rw-r--+ 1 david david 1.2K Oct 28 20:02 favicon.ico
As I'm doing a custom rewrite rule for *.php files, should I be using reverse_proxy
with options rather than php_fastcgi
? If so, is the v2 wiki still up to date to refer to for the options and php_fastcgi
's defaults?
Okay - this is what's going on:
{
"handle": [
{
"handler": "rewrite",
"uri": "{http.matchers.file.relative}?{http.request.uri.query}\u0026p={http.request.uri.path}"
}
],
"match": [
{
"file": {
"try_files": [
"/index.php"
]
}
}
]
},
This rewrite handler is what's mucking things up. Since it only does a try_files
for /index.php
, this matcher will always be hit, because you always have a /index.php
file available. Basically, the matcher there instead needs to be written such that it will not match if a static file already exists on disk.
It think this would look something like this (please correct me if I'm wrong here @mholt)
@notstatic {
not {
try_files {path}
}
}
try_files @notstatic /index.php?{query}&p={path}
This is using the new matcher syntax that just hit the v2 branch a few days ago, so pull the latest to try that I guess!
Hmm, indeed, but I think the try_files
directive should do the negation automatically. That would make more sense... Lemme see what I can come up with.
@DavidOliver Thanks to the tip by @francislavoie, the solution became fairly obvious (and easy). I've just pushed it to 2466ed1.
One more try, would you please? :) Crossing my fingers that we've finally got it...
(to clarify, with that patch, you shouldn't need to change your Caddyfile like I suggested)
Yep, static assets are now being served. Many thanks! I'm looking forward to getting some PHP projects behind Caddy 2 in future. :slightly_smiling_face:
In case it's helpful for writing docs or anyone else wanting to use v2 for PHP apps, the following alone may be enough to have *.php
files processed by PHP and other, non-*.php
files served directly:
https://example.com
root * /path/to/public_html
php_fastcgi unix//run/php/php7.3-fpm.sock
file_server
If the PHP software requires particular URL rewrites, something like the following (which is for Craft CMS) can be added:
try_files {path} /index.php?{query}&p={path}
Yay! Thank you for the thorough follow-up! I'm counting this as a green light to cut beta 13 as soon as I get some docs written. Thank you for your patience and your invaluable debugging help here.
1. Which version of Caddy are you using?
v2.0 beta 9
2. What are you trying to do?
As per forum thread.
@mholt:
Ultimately I'm trying to use FastCGI for PHP 7.3, with Craft CMS 3, which, typically for PHP sites, requires the path to be passed to /index.php as a URL query paramter (
p
in this case).I'm using
/test.php
, which shows PHP request variables, to simplify the test:3. What is your Caddyfile?
4. How did you run Caddy?
Ubuntu 18.04.
5. Please paste any relevant HTTP request(s) here.
6. What did you expect to see?
The PHP Variables page.
7. What did you see instead (give full error messages and/or log)?
Caddy's 404 status response: 'File not found.'
8. Why is this a bug, and how do you think this should be fixed?
PHP is still a thing.
9. What are you doing to work around the problem in the meantime?
Using Caddy v1.
10. Please link to any related issues, pull requests, and/or discussion.
https://caddy.community/t/url-rewriting-for-craft-cms-3/6485/2
Bonus: What do you use Caddy for? Why did you choose Caddy?
I like that Caddy has a sane configuration syntax, handles obtaining SSL certs automatically and doesn't use OpenSSL for TLS - like Hiawatha which is no longer actively developed.
Thanks!