PhileCMS / Phile

A flat file CMS with a swappable parser and template engine.
https://philecms.github.io/
Other
257 stars 49 forks source link

[BUGFIX] Use RecursiveRegexIterator with recursive regex #132

Closed STOWouters closed 10 years ago

STOWouters commented 10 years ago

Bugfix for those using PHP5.4, where the pages variable is always an empty array. This is because you have to use a recursive iterator when recursively iterating through the directories/files.

In addition, the regex must also have a trailing i, indicating that it will be used recursively.

Also see this user note.

Releases: 1.3.0

NeoBlack commented 10 years ago

+1 looks good for me

Schlaefer commented 10 years ago

:-1:

I'm using 5.4 and everything works fine here™. Do you use uppercase '.MD' file extensions? This would explain why the i modifier fixes it for you.

NeoBlack commented 10 years ago

:-1: @Schlaefer thx for your review, I think the i modifier is a good idea, because often editors create file extensions in upper case. currently RecursiveRegexIterator::GET_MATCH and RegexIterator::GET_MATCH has the same value, but this could change. I think we should use the contstants of the Iterator which is used and not mix RegexIterator with RecursiveRegexIterator::GET_MATCH. I have not see the different classes. this is the reason, why we defined at least two different reviewers ;)

@Stijn-Flipper please correct the mix of classes. and push it again.

STOWouters commented 10 years ago

I'm using 5.4 and everything works fine here™. Do you use uppercase '.MD' file extensions? This would explain why the i modifier fixes it for you.

No, I use lowercase .md extension, and it didn't work on PHP5.4.29. I've taken fresh look at it by reverting it back to before this PR. I've found out that the regex on fix #112 '/^.[^\.]*\\' . CONTENT_EXT . '/' didn't work as expected on that version. If I change it to regex '/^.*\\' . CONTENT_EXT . '$/i' then everything is fine. I seriously have no idea why it worked on yours and not on mine. There were several bugs with regex evaluation during PHP5.4, looking at the PHP changelog. I suppose something changed by accident and were reverted back later or so.

I'll push the changes and see whether this is better, thanks for the feedback :+1:

Schlaefer commented 10 years ago

In my experience it can take years to hit a bug in a mature x.y PHP release. My first assumption is that the flaw lies in my code, and I'm practically never wrong. ;)

'^.[^\.]*\\' explicitly allows any leading character (incl. dot) but no dots after that. So if your files have this schema my.file.md they are ignored too. Is that the issue? Try ^[^.].*\\ instead.

I don't believe we know what went wrong yet. Can you say "This place received this input and this variable here therefore should have this value, but instead it had that value?".

STOWouters commented 10 years ago

'^.[^.]\' explicitly allows any leading character (incl. dot) but no dots after that. So if your files have this schema my.file.md they are ignored too. Is that the issue? Try ^[^.].\ instead.

Ow, I thought they were both the same, the first one was supposed to eliminate files starting with a leading dot. Your second regex did also the trick.

I don't believe we know what went wrong yet. Can you say "This place received this input and this variable here therefore should have this value, but instead it had that value?".

Let's have some fun with var_dump then:

This is the code I used to debug it on lib/Phile/Utility.php:

        /* ... */

    public static function getFiles($directory, $fileNamePattern = '/^.*/') {
        $dir    = new \RecursiveDirectoryIterator($directory);
        $ite    = new \RecursiveIteratorIterator($dir);
        $files  = new \RegexIterator($ite, $fileNamePattern, \RegexIterator::GET_MATCH);

                var_dump($directory);
                var_dump($dir);
                var_dump($fileNamePattern);

        $result = array();
        foreach ($files as $file) {
            $result[] = (string)$file[0];
        }

                var_dump($result);
        return $result;
    }

        /* ... */

This is the expected output, as it actually did with PHP5.5:

$directory = string(38) "/home/stijn/Git/HacketyFlippy/content/" 

$dir = object(RecursiveDirectoryIterator)#34 (4) { ["pathName":"SplFileInfo":private]=> string(46) "/home/stijn/Git/HacketyFlippy/content/about.md" ["fileName":"SplFileInfo":private]=> string(8) "about.md" ["glob":"DirectoryIterator":private]=> bool(false) ["subPathName":"RecursiveDirectoryIterator":private]=> string(0) "" } 

$fileNamePattern = string(13) "/^.[^.]*\.md/" 

$result = array(12) { [0]=> string(46) "/home/stijn/Git/HacketyFlippy/content/about.md" [1]=> string(46) "/home/stijn/Git/HacketyFlippy/content/index.md" [2]=> string(55) "/home/stijn/Git/HacketyFlippy/content/projects/index.md" [3]=> string(59) "/home/stijn/Git/HacketyFlippy/content/projects/project/2.md" [4]=> string(59) "/home/stijn/Git/HacketyFlippy/content/projects/project/1.md" [5]=> string(44) "/home/stijn/Git/HacketyFlippy/content/404.md" [6]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/4.md" [7]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/2.md" [8]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/5.md" [9]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/1.md" [10]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/3.md" [11]=> string(51) "/home/stijn/Git/HacketyFlippy/content/blog/index.md" }

.. and this is what I got with PHP5.4 (on production server):

$directory = string(50) "/var/www/vhosts/hacketyflippy.be/httpdocs/content/" 

$dir = object(RecursiveDirectoryIterator)#33 (4) { ["pathName":"SplFileInfo":private]=> string(56) "/var/www/vhosts/hacketyflippy.be/httpdocs/content/404.md" ["fileName":"SplFileInfo":private]=> string(6) "404.md" ["glob":"DirectoryIterator":private]=> bool(false) ["subPathName":"RecursiveDirectoryIterator":private]=> string(0) "" } 

$fileNamePattern = string(13) "/^.[^.]*\.md/" 

$result = array(0) { }

If I change the regex to ^[^.].*\\ then everything works just fine.

NeoBlack commented 10 years ago

@Stijn-Flipper can you list the filenames of the md files in your content folder?

Schlaefer commented 10 years ago

Ah, the issue is that the Iterator filters the full pathname, not the basename of the file. So my regex was wrong too and the problem is the dot in "hacketyflippy.be" in the server path.

Try this as pattern:

$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*' . $DS . '[^\.]*\\' . CONTENT_EXT . '$/i';
NeoBlack commented 10 years ago

yeah, bad bug. But I am not sure, if your new RegExp work.

Some ideas for possible pathes:

Schlaefer commented 10 years ago

It's wrong too, because it omits the leading dot issue. This works:

$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*content' . $DS . '[^.].*\\' . CONTENT_EXT . '$/i';

This should too but doesn't:

$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/.*' . $DS . '[^.].*\\' . CONTENT_EXT . '$/i';

and I don't know why at the moment. Someone has to find a good RegEx. ;)

STOWouters commented 10 years ago

Looking at the RecursiveDirectoryIterator docs, can't we just simply use the FilesystemIterator::SKIP_DOTS flag? See filesystemiterator constants. That would ease the regex.

Then we have a simple regex:

$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*' . $DS . 'content' . $DS . '.*\\' . CONTENT_EXT . '$/i';

This should too but doesn't:

Shouldn't you have to escape the dot character on the [^.] part? Because a dot means 'any character'. If we use the FileSystemIterator::SKIP_DOTS flag, we don't have to take that into account.

@NeoBlack I don't think the markdown files/directories with a dot makes senses, because your URL would be something with a dot, like www.phile.com/foo.bar/foobar.foo and I don't think it's a legal URL syntax, is it?

NeoBlack commented 10 years ago

FileSystemIterator::SKIP_DOTS skip only the . and .. the urls with dots in path and filename are valid.

Schlaefer commented 10 years ago

simply use the FilesystemIterator::SKIP_DOTS flag

Great idea, but I can't get it working. Does it work for you? I admit I haven't used the DirectoryIterator directly. So I have to tinker with it some more.

Shouldn't you have to escape the dot character on the [^.] part? Because a dot means 'any character'.

In a character class […] it's usually not necessary to escape characters: The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.

$fileNamePattern = '/^.*' . $DS . 'content' . $DS . '.*\\' . CONTENT_EXT . '$/i';

I would like to avoid to hardcode the content directory in the RegEx pattern. Not a deal breaker, but not a good habit either.

STOWouters commented 10 years ago

I would like to avoid to hardcode the content directory in the RegEx pattern. Not a deal breaker, but not a good habit either.

There's a CONTENT_DIR constant, I'm gonna tinker around with some regexes.

STOWouters commented 10 years ago

Okay, this is what works on and tested with examples given by @NeoBlack :

// ignore dotfiles in content directory and escape special regex 
// characters (/ and .)
$content_dir = preg_quote(CONTENT_DIR, '/');
$content_ext = preg_quote(CONTENT_EXT, '.');
$files = Utility::getFiles($folder, '/^' . $content_dir . '[^.].*' . $content_ext . '$/i');

If everybody is happy with that, I'll push it.

NeoBlack commented 10 years ago

Hey, I have rewritten the logic with the Iterators, please review #134 I think this should fix all our problems and make the Utility::getFiles method more flexible

Schlaefer commented 10 years ago

Ok, I think the right RegEx is:

$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.+' . $DS . '[^.][^' . $DS . ']*\\' . CONTENT_EXT . '$/i';

but maybe #134 makes this obsolete. Will review it tomorrow.

NeoBlack commented 10 years ago

@Schlaefer @Stijn-Flipper I think my solution (#134) is a little bit more object oriented, if you both can confirm that it work, I would merge this into 1.3.0.

STOWouters commented 10 years ago

@NeoBlack Your solution #134 is much better and more elegant, so I'll close this PR.