codeigniter4 / CodeIgniter4

Open Source PHP Framework (originally from EllisLab)
https://codeigniter.com/
MIT License
5.35k stars 1.9k forks source link

Bug: Cannot add route to controller in filename with dash/hyphen #4294

Closed sneakyimp closed 3 years ago

sneakyimp commented 3 years ago

Describe the bug CodeIgniter 4's "translate uri dashes" functionality does not work under certain circumstances. If you want dashes in your uri that map into dashes in some controller subdirectory name, then you cannot add a route targeting controllers in that subdirectory.

I have an earlier post on the Codeigniter forum relating to this where I am trying to support URLs with dashes in them. Long story short, I must use dashes in my controller subdirectory names if I want to have dashes in my url. For whatever reason, the CI4 auto-routing is unable to recognize when a url with a dash in the path is referring to a controller subdirectory with an underscore.

E.g., if I want this url to connect to the default Home controller in my Nested-directory subdirectory:

https://www.example.com/subdir/nested-directory

Then I have to use a dash in the file system directory name. That url works if I put my controller in this location:

app/Controllers/Subdir/Nested-directory/Home.php

Note that the properly working Home controller uses an underscore in its namespace:

namespace App\Controllers\Subdir\Nested_directory;

use App\Controllers\BaseController;

class Home extends BaseController {

    public function index()
    {
        die("this is " . __METHOD__);
    }
}

That same url does not work if I put it in a directory with an underscore:

app/Controllers/Subdir/Nested_directory/Home.php

Curiously, it the dash in the url I request does get properly routed to an underscore while CI4 is calculating the controller path. However, CI4 doesn't bother to check if the url is requesting the default Home controller in a subdirectory. The complaint it gives is:

Controller or its method is not found: \App\Controllers\Subdir\Nested_directory::index

So it almost works, is just doesn't bother checking for the default Home controller in a subdir. Additionally, if I change my url to have an underscore in it, then CI4 can locate the Home controller in the subdirectory named with an underscore. This url:

https://www.example.com/subdir/nested_directory

Connects correctly to the controller here:

app/Controllers/Subdir/Nested_directory/Home.php

Previously, it was enough to simply make sure I name my subdirectories with dashes so they match the urls, however this does not work when i must add a $routes->add directive mapping urls to a controller that happens to live in a subdirectory with a dash. This route directive works just fine in my Config/Routes file if Nested_directory contains an underscore:

$routes->add("foobar/([0-9]+)", "Subdir\Nested_directory\Home::index/$1");

This serves a request for https://www.example.com/foobar/123 with the correct Home controller.

But if if the Nested-directory subdirectory containing my Home controller contains a dash, meaning my Home controller file is

app/Controllers/Subdir/Nested-directory/Home.php

then this will not work:

$routes->add("foobar/([0-9]+)", "Subdir\Nested_directory\Home::index/$1");

This will also not work:

$routes->add("foobar/([0-9]+)", "Subdir\Nested-directory\Home::index/$1");

Please note that I need to have these dashes instead of underscores because that is recommended by Google. I also very much do not want to create a big, complicated Routes file where I must route all of my many controllers. The auto-routing of CI3 was working just fine. I think this problem could be remedied with a small code change.

Can anyone point me to a solution?

For reference, here is my Config/Routes.php file

// Create a new instance of our RouteCollection class.
$routes = Services::routes();

// Load the system's routing file first, so that the app and ENVIRONMENT
// can override as needed.
if (file_exists(SYSTEMPATH . 'Config/Routes.php'))
{
 require SYSTEMPATH . 'Config/Routes.php';
}

/**
 * --------------------------------------------------------------------
 * Router Setup
 * --------------------------------------------------------------------
 */
$routes->setDefaultNamespace('App\Controllers');
$routes->setDefaultController('Home');
$routes->setDefaultMethod('index');
$routes->setTranslateURIDashes(true);
$routes->set404Override();
$routes->setAutoRoute(true);

/**
 * --------------------------------------------------------------------
 * Route Definitions
 * --------------------------------------------------------------------
 */

// We get a performance increase by specifying the default
// route since we don't have to scan directories.
$routes->get('/', 'Home::index');

// neither of these work if the controller is in Nested-directory
//$routes->add("foobar/([0-9]+)", "Subdir\Nested-directory\Home::index/$1");
$routes->add("foobar/([0-9]+)", "Subdir\Nested_directory\Home::index/$1");

CodeIgniter 4 version 4.1.1

Affected module(s) system/Router ? not entirely sure.

Expected behavior, and steps to reproduce if appropriate 1) I would expect CI4 to auto route a request for example.com/subdir/nested-directory to any of these controllers (if they exist), ideally in the sequence here:

Controllers/Subdir/Nested_directory::index Controllers/Subdir/Nested-directory/Home::index Controllers/Subdir/Nested_directory/Home::index

2) I might also expect adding a route like this:

$routes->add("foobar/([0-9]+)", "Subdir\Nested-directory\Home::index/$1");

to locate the controller at Controllers/Subdir/Nested-directory/Home::index. Note this dash translation for added routes may not be necessary if the autorouting feature can properly translate uri dashes.

Context

sneakyimp commented 3 years ago

I've been digging into the System\Router\Router class and it looks like there are problems with the validateRequest method. Firstly, I don't think this filtering does anything. Can someone explain to me what it's supposed to accomplish?:

        $segments = array_filter($segments, function ($segment) {
            // @phpstan-ignore-next-line
            return ! empty($segment) || ($segment !== '0' || $segment !== 0);
        });
        $segments = array_values($segments);

I think that return condition contains a tautology. I tried this code and the array_filter function has no impact whatsoever on the array's contents:

$segments1 = ["foo", "bar", 0, "0", NULL, FALSE, ""];
$segments2 = array_filter($segments1, function ($segment) {
        return ! empty($segment) || ($segment !== '0' || $segment !== 0);
});
if ($segments1 == $segments2) {
        echo "they are equal\n";
} else {
        echo "they are NOT equal\n";
}

Output is they are equal.

Also problematic is line 599:

            if (! is_file(APPPATH . 'Controllers/' . $test . '.php') && $directoryOverride === false && is_dir(APPPATH . 'Controllers/' . $this->directory . ucfirst($segments[0])))
            {
                $this->setDirectory(array_shift($segments), true);
                continue;
            }

The is_file check is properly translating dashes to underscores, but the is_dir check is checking the original dashed url segment. For the url example.com/subdir/nested-directory when there is a controller at Controllers/Subir/Nested_directory/Home.php, the while loop goes through two iterations with these values and results

== ITERATION 1 ==
is_file(/var/www/ci/app/Controllers/Subdir.php) is FALSE
override: FALSE
is_dir(/var/www/ci/app/Controllers/Subdir) is TRUE
if statement passes, setDirectory and continue...

== ITERATION 2 ==
is_file(/var/www/ci/app/Controllers/Subdir/Nested_directory.php) is FALSE
override: FALSE
is_dir(/var/www/ci/app/Controllers/Subdir/Nested-directory) is FALSE
if statement fails, rest of while loop executes yielding return value of an array: ["nested-directory"]

I believe this function should be modified so that the is_dir check is checking the dash-translated paths if the user options have set translateURIDashes=true.

@kenjis this is the post I referred to on the Codeigniter forum post.

paulbalandan commented 3 years ago

I've been digging into the System\Router\Router class and it looks like there are problems with the validateRequest method. Firstly, I don't think this filtering does anything. Can someone explain to me what it's supposed to accomplish?:

      $segments = array_filter($segments, function ($segment) {
          // @phpstan-ignore-next-line
          return ! empty($segment) || ($segment !== '0' || $segment !== 0);
      });
      $segments = array_values($segments);

I think that return condition contains a tautology.

Considering that the errors reported by phpstan were silenced means that there is something fishy with the logic, and yes, it seems both sides of || are always true.

This should have been:

$segments = array_filter($segments, function ($segment) {
    return ! empty($segment) || ($segment === '0' || $segment === 0);
});
paulbalandan commented 3 years ago

https://3v4l.org/smY3D

sneakyimp commented 3 years ago

Thank you, @paulbalandan

Considering that the errors reported by phpstan were silenced means that there is something fishy with the logic, and yes, it seems both sides of || are always true.

This should have been:

$segments = array_filter($segments, function ($segment) {
  return ! empty($segment) || ($segment === '0' || $segment === 0);
});

I guess we have to allow zeros as segments because they might be parameters, but I'm having trouble imagining any circumstances where a segment would actually be a zero. The value $segments is passed in as a parameter from an explode statement in the autoRoute function. Also, the return value of the logic is ambiguous, and would benefit from some parenthetical grouping. May I use this?

$segments = array_filter($segments, function ($segment) {
    return ! (empty($segment) || ($segment === '0'));
});
paulbalandan commented 3 years ago

The new logic would leave out 0 and '0' which we want to retain for edge cases like http://example.com/foo/a/0/d/cdvfhgj.

cf. https://3v4l.org/tWjEi

paulbalandan commented 3 years ago

And while you're here, can you make the function static?

static function ($segment): bool

sneakyimp commented 3 years ago

The new logic would leave out 0 and '0' which we want to retain for edge cases like http://example.com/foo/a/0/d/cdvfhgj.

cf. https://3v4l.org/tWjEi

Ah yes the logic I proposed is incorrect -- the space between ! and empty is confusing, makes it look like the ! applies to the entire line. Please note as I just pointed out above that the supplied array of $segments comes from an explode statement in autoRoute, so I don't think any segment will ever be an actual integer 0. Could we use this logic?

$segments = array_filter($segments, function ($segment) {
    return ($segment || ($segment === '0'));
});

that seems much more readable and clear to me. See: https://3v4l.org/b1PAP

paulbalandan commented 3 years ago

OK. The easiest to understand, I think.

$segments = array_filter($segments, static function ($segment): bool {
        return (string) $segment !== '';
});
sneakyimp commented 3 years ago

OK. The easiest to understand, I think.

$segments = array_filter($segments, static function ($segment): bool {
        return (string) $segment !== '';
});

Oooh nice that looks pretty tidy. I'll run a performance comparison and see which is faster.

paulbalandan commented 3 years ago

Booleans and integers will never appear in the URL. so simply just remove the type cast. 😂 https://3v4l.org/ItTbt

sneakyimp commented 3 years ago

EDIT: your latest variation wins. I'll use it.

$segments = ["foo", "bar", 0, "0", NULL, FALSE, ""];

$iterations = 1000000;

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
        $segments1 = array_filter($segments, static function ($segment): bool {
                return $segment !== '';
        });
}
echo "#1 elapsed: " . (microtime(TRUE) - $start) . "\n";

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
        $segments1 = array_filter($segments, static function ($segment): bool {
                return ($segment || $segment === '0');
        });
}
echo "#2 elapsed: " . (microtime(TRUE) - $start) . "\n";

results:

#1 elapsed: 0.50187611579895
#2 elapsed: 0.53266596794128

May I keep the $segmentConverted variable name? I think it makes the code clearer. Also, may I remove the ucfirst from setDirectory? It should have no effect on existing code and should improve performance a little bit by removing the redundant ucfirst operation.

paulbalandan commented 3 years ago

OK. Sure and sure. Those changes are not materially affecting the code so I'm fine either way.

sneakyimp commented 3 years ago

OK. Sure and sure. Those changes are not materially affecting the code so I'm fine either way.

Hmmmm. I got to thinking about this. We should think hard about security in these validateRequest and setDirectory functions. Keep in mind the vital role of this function: it parses the user-supplied request URI to determine what code actually runs. It might be worth a slightly slower function to make sure that we eliminate any possible mischief.

For starters, we might make sure $segment contains only strings consisting of characters permitted in a file path -- just to be extra safe. I don't think the explode statement in autoRoute can possibly return a non-string value, but is there any way a user or errant developer might supply an array of non-strings to Router::validateRequest -- e.g. by a poorly defined route? Should we enforce a string type for our filter function via type hinting?

// note the string type hint for the filter function
$segments = array_filter($segments, static function (string $segment): bool {
        return $segment !== '';
});

this should throw an exception if there were any non-string type in $segments. Interestingly, it does not throw exceptions for FALSE and 0 but does for null: https://3v4l.org/fC7d8

If you remove the NULL from $segments, then the string-type-hinted function does permit a zero to pass into the filtered array. https://3v4l.org/S70i3 Does letting an integer slip through present any risk? Given that this scanning our file system for directory and file names, I don't think a zero or any other integer presents a significant risk, but directory names like .. or : might be a different matter entirely.

Note that simply adding type hinting in our filter function does introduce a 10-15% performance penalty:

$segments = ["foo", "bar", 0, "0", FALSE, ""];
//$segments = ["foo", "bar", 0, "0", ""];

$iterations = 1000000;

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
    $segments1 = array_filter($segments, static function (string $segment): bool {
            return $segment !== '';
    });
}
echo "#1 elapsed: " . (microtime(TRUE) - $start) . "\n";

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
    $segments1 = array_filter($segments, static function ($segment): bool {
            return $segment !== '';
    });
}
echo "#2 elapsed: " . (microtime(TRUE) - $start) . "\n";

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
    $segments1 = array_filter($segments, static function (string $segment): bool {
            return ($segment || $segment === '0');
    });
}
echo "#3 elapsed: " . (microtime(TRUE) - $start) . "\n";

$start = microtime(TRUE);
for($i=0; $i<$iterations; $i++) {
    $segments1 = array_filter($segments, static function ($segment): bool {
            return ($segment || $segment === '0');
    });
}
echo "#4 elapsed: " . (microtime(TRUE) - $start) . "\n";

results

#1 elapsed: 0.51697421073914
#2 elapsed: 0.44828510284424
#3 elapsed: 0.55848598480225
#4 elapsed: 0.48483991622925

As for eliminating mischief, I think we should do more. I do see some issues in our setDirectory function. In particular: 1) if 0 or '0' gets passed in, this sets $this->directory to null 2) my latest code modification applies ucfirst elsewhere, and I'd like to remove this to improve performance and because I think this function should not make any changes to the supplied value 3) setDirectory removes all periods -- a reasonable action, perhaps, but hardly adequate. I would prefer to have this function avoid modifying the supplied $dir parameter and instead reject any supplied dir value with an illegal directory name like . or .. and also reject any characters not permitted by the file system. Rather than modifying the supplied parameter, perhaps this function could return a boolean indicating if it was acceptable or not, and allow the calling scope (i.e., the validateRequest fn) to decide how to proceed.

I'm not entirely sure what criteria we should use to reject a filename or directory name, but we definitely need to prevent this autorouting behavior from being used to probe the file system or run arbitrary files (e.g., in the /tmp folder or a file upload location). I also think some more specific unit tests are needed.

For my reference later, Windows provides a list of forbidden file system chars and filenames.

sneakyimp commented 3 years ago

I've been taking a good hard look at the routing code, and would point out a few things:

Router::validateRequest() is a very poorly named function and its comments are not very helpful to describe its behavior. I'd like to change this function's name to scanControllers().

I do not understand the purpose of the $directoryOverride check in this function. I've been unable to identify any situation Router->directory would be non-empty before this function is called. The property directory is protected and I've found no other class extending this class. The only place in the entire project Router::setDirectory() gets called is inside this function and in system/Test/FeatureTestTrait.php to assign a null value. If we want to retain this behavior, it'd be more efficient to check for a non-empty directory value once rather than entering a loop where we check it over and over:

if (isset($this->directory)) {
    return $segments;
}

There are a couple of places in the Router class checking empty($this->directory). A directory value of '0' would satisfy that condition. This prompted me to think more closely about what valid directory segments actually are. I think any integer or string containing either periods or dashes would be an invalid directory segment and also an invalid namespace segment. It would be wise, for performance and security reasons, to stop scanning the file system in search of directories and controllers, once a segment is encountered that is not PSR-4-compliant directory/file name/namespace. I would very much like to hear the opinions of others on this point. The PHP docs don't seem to specify what characters are permitted in a namespace. The PSR-4 docs also don't seem to describe what characters are permitted. It would appear that multibyte characters are allowed. This script runs just fine with php 7.4:

namespace Sub\Φ\Myspace;

class Φoo {
  public function foo() {
        echo "woop";
  }
}

$v = new Φoo;
$v->foo();

Periods and dashes (and punctuation generally) are not permitted in namespace declarations, nor are any segments starting with or consisting entirely of numbers. These are invalid: namespace Foo\1\Bar namespace Foo\T.1\Bar namespace Foo\T-1\Bar

Both IncomingRequest::removeRelativeDirectory() and HTTP\URI::removeDotSegments() functions will remove any . or .. or multiple slashes, and this seems to effectively protect against attempts to use repeated parent directory (..) references to escape from the controllers directory. The Router class itself also removes any periods in the setDirectory method, but doesn't guard against dotted segments in the validateRequest function before checking the file system with is_file and is_dir functions. the PSR-4 validity check I suggest would help guard against such mischief.

I don't think the setDirectory function should be applying ucfirst to the supplied dir value or stripping periods. It could perhaps check the supplied $dir value for PSR-4 compliance and return true if the dir is accepted, false otherwise. if setDirectory rejected a segment and returned false, we could then branch accordingly in the validateRequest function.