Closed STOWouters closed 10 years ago
+1 looks good for me
:-1:
RegexIterator
(which is currently used) has a RegexIterator::GET_MATCH
and RecursiveRegexIterator
the RecursiveRegexIterator::GET_MATCH
; so there's nothing wrong with thati
is a regex modifier (case insensitive) and does not indicate recursion (not that it's a bad idea, see below)I'm using 5.4 and everything works fine here™. Do you use uppercase '.MD' file extensions? This would explain why the i
modifier fixes it for you.
:-1:
@Schlaefer thx for your review, I think the i
modifier is a good idea, because often editors create file extensions in upper case.
currently RecursiveRegexIterator::GET_MATCH
and RegexIterator::GET_MATCH
has the same value, but this could change. I think we should use the contstants of the Iterator which is used and not mix RegexIterator
with RecursiveRegexIterator::GET_MATCH
. I have not see the different classes.
this is the reason, why we defined at least two different reviewers ;)
@Stijn-Flipper please correct the mix of classes. and push it again.
I'm using 5.4 and everything works fine here™. Do you use uppercase '.MD' file extensions? This would explain why the i modifier fixes it for you.
No, I use lowercase .md
extension, and it didn't work on PHP5.4.29. I've taken fresh look at it by reverting it back to before this PR. I've found out that the regex on fix #112 '/^.[^\.]*\\' . CONTENT_EXT . '/'
didn't work as expected on that version. If I change it to regex '/^.*\\' . CONTENT_EXT . '$/i'
then everything is fine. I seriously have no idea why it worked on yours and not on mine. There were several bugs with regex evaluation during PHP5.4, looking at the PHP changelog. I suppose something changed by accident and were reverted back later or so.
I'll push the changes and see whether this is better, thanks for the feedback :+1:
In my experience it can take years to hit a bug in a mature x.y PHP release. My first assumption is that the flaw lies in my code, and I'm practically never wrong. ;)
'^.[^\.]*\\'
explicitly allows any leading character (incl. dot) but no dots after that. So if your files have this schema my.file.md
they are ignored too. Is that the issue? Try ^[^.].*\\
instead.
I don't believe we know what went wrong yet. Can you say "This place received this input and this variable here therefore should have this value, but instead it had that value?".
'^.[^.]\' explicitly allows any leading character (incl. dot) but no dots after that. So if your files have this schema my.file.md they are ignored too. Is that the issue? Try ^[^.].\ instead.
Ow, I thought they were both the same, the first one was supposed to eliminate files starting with a leading dot. Your second regex did also the trick.
I don't believe we know what went wrong yet. Can you say "This place received this input and this variable here therefore should have this value, but instead it had that value?".
Let's have some fun with var_dump
then:
This is the code I used to debug it on lib/Phile/Utility.php
:
/* ... */
public static function getFiles($directory, $fileNamePattern = '/^.*/') {
$dir = new \RecursiveDirectoryIterator($directory);
$ite = new \RecursiveIteratorIterator($dir);
$files = new \RegexIterator($ite, $fileNamePattern, \RegexIterator::GET_MATCH);
var_dump($directory);
var_dump($dir);
var_dump($fileNamePattern);
$result = array();
foreach ($files as $file) {
$result[] = (string)$file[0];
}
var_dump($result);
return $result;
}
/* ... */
This is the expected output, as it actually did with PHP5.5:
$directory = string(38) "/home/stijn/Git/HacketyFlippy/content/"
$dir = object(RecursiveDirectoryIterator)#34 (4) { ["pathName":"SplFileInfo":private]=> string(46) "/home/stijn/Git/HacketyFlippy/content/about.md" ["fileName":"SplFileInfo":private]=> string(8) "about.md" ["glob":"DirectoryIterator":private]=> bool(false) ["subPathName":"RecursiveDirectoryIterator":private]=> string(0) "" }
$fileNamePattern = string(13) "/^.[^.]*\.md/"
$result = array(12) { [0]=> string(46) "/home/stijn/Git/HacketyFlippy/content/about.md" [1]=> string(46) "/home/stijn/Git/HacketyFlippy/content/index.md" [2]=> string(55) "/home/stijn/Git/HacketyFlippy/content/projects/index.md" [3]=> string(59) "/home/stijn/Git/HacketyFlippy/content/projects/project/2.md" [4]=> string(59) "/home/stijn/Git/HacketyFlippy/content/projects/project/1.md" [5]=> string(44) "/home/stijn/Git/HacketyFlippy/content/404.md" [6]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/4.md" [7]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/2.md" [8]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/5.md" [9]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/1.md" [10]=> string(52) "/home/stijn/Git/HacketyFlippy/content/blog/post/3.md" [11]=> string(51) "/home/stijn/Git/HacketyFlippy/content/blog/index.md" }
.. and this is what I got with PHP5.4 (on production server):
$directory = string(50) "/var/www/vhosts/hacketyflippy.be/httpdocs/content/"
$dir = object(RecursiveDirectoryIterator)#33 (4) { ["pathName":"SplFileInfo":private]=> string(56) "/var/www/vhosts/hacketyflippy.be/httpdocs/content/404.md" ["fileName":"SplFileInfo":private]=> string(6) "404.md" ["glob":"DirectoryIterator":private]=> bool(false) ["subPathName":"RecursiveDirectoryIterator":private]=> string(0) "" }
$fileNamePattern = string(13) "/^.[^.]*\.md/"
$result = array(0) { }
If I change the regex to ^[^.].*\\
then everything works just fine.
@Stijn-Flipper can you list the filenames of the md files in your content folder?
Ah, the issue is that the Iterator filters the full pathname, not the basename of the file. So my regex was wrong too and the problem is the dot in "hacketyflippy.be" in the server path.
Try this as pattern:
$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*' . $DS . '[^\.]*\\' . CONTENT_EXT . '$/i';
yeah, bad bug. But I am not sure, if your new RegExp work.
Some ideas for possible pathes:
/var/www/htdocs/content/foo.md
/var/www/htdocs/content/foo/bar.md
/var/www/htdocs/content/foo.bar/bar.md
/var/www/htdocs/content/foo/bar.foo.md
/var/www/htdocs/content/foo/bar-foo.md
/var/www/htdocs/content/foo/bar_foo.md
/var/www.example.com/htdocs/content/foo/bar.md
/var/www.example.com/htdocs/content/foo.bar/bar.md
/var/www.example.com/htdocs/content/foo/bar.foo.md
It's wrong too, because it omits the leading dot issue. This works:
$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*content' . $DS . '[^.].*\\' . CONTENT_EXT . '$/i';
This should too but doesn't:
$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/.*' . $DS . '[^.].*\\' . CONTENT_EXT . '$/i';
and I don't know why at the moment. Someone has to find a good RegEx. ;)
Looking at the RecursiveDirectoryIterator
docs, can't we just simply use the FilesystemIterator::SKIP_DOTS
flag? See filesystemiterator constants. That would ease the regex.
Then we have a simple regex:
$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.*' . $DS . 'content' . $DS . '.*\\' . CONTENT_EXT . '$/i';
This should too but doesn't:
Shouldn't you have to escape the dot character on the [^.]
part? Because a dot means 'any character'. If we use the FileSystemIterator::SKIP_DOTS
flag, we don't have to take that into account.
@NeoBlack I don't think the markdown files/directories with a dot makes senses, because your URL would be something with a dot, like www.phile.com/foo.bar/foobar.foo
and I don't think it's a legal URL syntax, is it?
FileSystemIterator::SKIP_DOTS
skip only the .
and ..
the urls with dots in path and filename are valid.
simply use the FilesystemIterator::SKIP_DOTS flag
Great idea, but I can't get it working. Does it work for you? I admit I haven't used the DirectoryIterator directly. So I have to tinker with it some more.
Shouldn't you have to escape the dot character on the [^.] part? Because a dot means 'any character'.
In a character class […]
it's usually not necessary to escape characters: The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
$fileNamePattern = '/^.*' . $DS . 'content' . $DS . '.*\\' . CONTENT_EXT . '$/i';
I would like to avoid to hardcode the content
directory in the RegEx pattern. Not a deal breaker, but not a good habit either.
I would like to avoid to hardcode the content directory in the RegEx pattern. Not a deal breaker, but not a good habit either.
There's a CONTENT_DIR
constant, I'm gonna tinker around with some regexes.
Okay, this is what works on and tested with examples given by @NeoBlack :
// ignore dotfiles in content directory and escape special regex
// characters (/ and .)
$content_dir = preg_quote(CONTENT_DIR, '/');
$content_ext = preg_quote(CONTENT_EXT, '.');
$files = Utility::getFiles($folder, '/^' . $content_dir . '[^.].*' . $content_ext . '$/i');
If everybody is happy with that, I'll push it.
Hey, I have rewritten the logic with the Iterators, please review #134 I think this should fix all our problems and make the Utility::getFiles
method more flexible
Ok, I think the right RegEx is:
$DS = preg_quote(DIRECTORY_SEPARATOR, '/');
$fileNamePattern = '/^.+' . $DS . '[^.][^' . $DS . ']*\\' . CONTENT_EXT . '$/i';
but maybe #134 makes this obsolete. Will review it tomorrow.
@Schlaefer @Stijn-Flipper I think my solution (#134) is a little bit more object oriented, if you both can confirm that it work, I would merge this into 1.3.0.
@NeoBlack Your solution #134 is much better and more elegant, so I'll close this PR.
Bugfix for those using PHP5.4, where the
pages
variable is always an empty array. This is because you have to use a recursive iterator when recursively iterating through the directories/files.In addition, the regex must also have a trailing
i
, indicating that it will be used recursively.Also see this user note.
Releases: 1.3.0