getgrav / grav

Modern, Crazy Fast, Ridiculously Easy and Amazingly Powerful Flat-File CMS powered by PHP, Markdown, Twig, and Symfony
https://getgrav.org
MIT License
14.57k stars 1.41k forks source link

Regex-based routes in site.yaml match but cause 502 error #2080

Open hughbris opened 6 years ago

hughbris commented 6 years ago

I am experiencing almost the same error as described by another person recently on Grav Forums.

My relevant site.yaml is:

routes:
  /blog/category/(.*): '/blog?category=$1'

I see "502 Bad Gateway" from nginx and the logs show:

2018/06/30 21:08:42 [error] 1321#0: *1727 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: grav.dev, request: "GET /blog/category/blog HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "grav.dev", referrer: "http://grav.dev/blog/random-thoughts"

I have also tried with a slash after "/blog" before the query. Also tried the regex in single quotes.

This is exactly as documented in Regex based aliases in the docs.

I know the regex is matching because when I tweak it incorrectly I see 404s.

rhukster commented 6 years ago

502 errors are usually related to Nginx -> PHP-FPM communication. Unfortunately i don't use Nginx day-to-day so am no expert at it. Maybe this will help?

https://www.datadoghq.com/blog/nginx-502-bad-gateway-errors-php-fpm/

masetto commented 6 years ago

I tried with PHP 5.6 + Apache and PHP 7.2 +Nginx and I have the same result: on Apache

Whoops \ Exception \ ErrorException (E_ERROR)
Maximum execution time of 90 seconds exceeded

on: Nginx "502 Bad Gateway".

Basically I wait for several seconds and then get this message.

Can you verify Regex-Based aliases on latest version of Grav please?

rhukster commented 6 years ago

Can you reach the before and after URLs directly? its only the 'redirect' that causes issues?

masetto commented 6 years ago

I can reach after URL directly (/blog?category=blablabla in the above example), not the before URL (/blog/category/blablabla).

rhukster commented 6 years ago

If you create a physical page in Grav with route at that URl /blog/category/blablabla can you reach it?

masetto commented 6 years ago

Yes, sure.

hughbris commented 6 years ago

More details from me:

I'm going to install one of those blog skeletons where they use regex-based routing to see if they work out of the box.

hughbris commented 6 years ago

The TLDR of this is that I have repeatable steps. However, please try taking in the confusing detail :)

I tested the official Quark-themed blog skeleton demo, which contains a dummy route:

  /another/one/(.*): '/blog/$1'

and you can see it works: https://demo.getgrav.org/blog-skeleton/another/one/hero-classes

I also tried that locally and it works, so the regex expression seems unlikely as a culprit.

But you can also witness that the tags with URL "params" do work: https://demo.getgrav.org/blog-skeleton/tag:photography

There is no route configured in the skeleton for that path, so I surmised that a plugin I don't know like taxonomylist or simplesearch had defined it. I tried disabling these two plugins locally and still got the Nginx 502 gateway error.

For accurate comparison across environments, I downloaded and installed the blog skeleton.

/another/test/hero-classes worked unsurprisingly, as did the /tag:photography path (so whatever else is routing this works on my local env). However, when I added a query in the redirect, I got the 502 error. So this is replicatable:

Doesn't clarify much for me, but I hope this helps a smarter person :)

rhukster commented 6 years ago

Oh! This is just a matter of escaping the special chars in the regex with backslashes.

This string in the config is used directly as a regex so it has to be a valid regex.

Pretty sure this came up before.

hughbris commented 6 years ago

Cool! Seems like we're close.

Can you help me with an example? On the right hand side, what am I escaping and for which parser? I've tried backslashing the slash, query, equals, and dollar.

masetto commented 6 years ago

Please help me how to escape this regex: /yetanother/test/(.*): '/blog?category=$1' I tried '/blog\?category=$1 with no results. Thank you.

hughbris commented 6 years ago

I finally tracked down the regex routing code in the source at around line 500 of Pages.php and did some logging, expecting this might be easy to figure out from there.

Just before $page = $this->dispatch($found, $all);, I inserted a line:

$this->grav['log']->info("\$pattern: $pattern; \$source_url: $source_url; \$found: $found");

I also inserted a line:

$this->grav['log']->info("dispatching: $route");

just after the first line of the method, around line 444.

My route regex is: /blog/category/(.*): '/blog?category=$1'

and I tried accessing '/blog/category/x'.

The log showed repeated lines of:

[2018-07-08 13:43:49] grav.INFO: dispatching: /blog?category=x [] []
[2018-07-08 13:43:49] grav.INFO: $pattern: #^\/blog\/category\/(.*)#; $source_url: /blog/category/x; $found: /blog?category=x [] []

So it's almost certainly looping.

$source_url seems to "reset" on every invocation to the requested URL path around line 474, so $found is not being considered on the recursive call when it's $route. This is where I get lost as to what it's trying to do. I've edited a few of those parameters but the logic has not become apparent.

BTW in the grav logs when I enter the "target" URL '/blog?category=x' directly, I see:

[2018-07-08 14:15:20] grav.INFO: dispatching: /blog [] []
[2018-07-08 14:15:20] grav.INFO: dispatching: /blog [] []

which is not surprising.

So not as revealing as I hoped but hopefully helpful to someone who knows this code better.

valtzu commented 6 years ago

Hello,

I'm pretty sure I'm having a similar issue. Route defined in site.yaml:

/abc/123: /123

When trying to reach /abc/123/?foo=bar, Grav goes into infinite dispatch loop. Here's the dispatched routes in order:

/abc/123
/123
/abc/123/
/123/?foo=bar
/123/?foo=bar
/123/?foo=bar
/123/?foo=bar
/123/?foo=bar
// looping this until out of memory or execution time

The reason seems to be the one that @hughbris mentioned: when making the comparison of source and target uris on line 501, Grav is always using original uri instead of the one that was passed to dispatch method, and therefore there is a possibility to call the dispatch method with the same $route that was used when invoking the dispatch in the first place.

I'm not sure if the root cause of my problem is the same as yours (as this may be related to trailing slash logic), however, the end result is the same and there definitely is some sort of bug.

Karhal commented 4 years ago

Hello, I'm experiencing the same issue here. Just to keep the thread up.

As described in https://github.com/getgrav/grav/issues/2080#issuecomment-402006855 the same route is dispatched every time a query parameter is present in the replacement string. Maybe the router needs some more logic especially for this kind of route.

My need was to plug an api client to inject custom content into some pages from the api. This content depends on the called page. I ended up by writing my own rule from a new plugin.

resources:
    "/foo/(?!(:*\?))([a-z0-9]+(?:-[a-z0-9]+)*)":
      page: '/foo/bar'
      endpoint : '/news/{uuid}'
public function onPluginsInitialized()
    {
        if ($this->isAdmin()) {
            return;
        }

        /** @var Uri $uri */
        $uri = $this->grav['uri'];
        $resources = $this->config->get('plugins.api-client.resources');

        foreach ($resources as $pattern => $config) {
            $pattern = '#^' . str_replace('/', '\/', ltrim($pattern, '^')) . '#';
            preg_match($pattern, $uri->path(), $matches);

            if (!empty($matches)) {
                $this->endpoint = str_replace('{uuid}', end($matches),$config['endpoint']);
                $this->page = $config['page'];
                $this->enable([
                    'onPageInitialized' => ['onPageInitialized', 0],
                ]);

                return;
            }
        }
    }

Then I do my own logic into onPageInitialized I know this is very specific to my needs and certainly could be improved but for now it's working for me and I would be happy if this helps someone

hughbris commented 4 years ago

These days I run Grav from a Docker image and I believe this will finally provide a simple means for @rhuk and other maintainers to reproduce this.

I've just tried the (common!) regex redirect pattern listed above for the first time since moving to Docker, and have confirmed that this bug still exists in 1.6.23.

To reproduce: install and run a container based on dsavell/grav, set up some basic pages using the regex pattern and hit them.

If you want me to craft a derivative with those routing test files set up for you, I am also happy to do that so that you can reproduce this bug.

(the uri.query bug #2090 is now safely out of the picture)