Closed hdgarrood closed 7 years ago
It seems the tests are failing because compile
usually optimises patterns like [a]
to simply a
, but not always; for instance:
> compile "[a]"
compile "a"
> compile "[.]"
compile "[.]"
However, commonDirectory (compile "[.]/*")
does seem to make this optimisation somehow:
> commonDirectory (compile "[.]/*")
("./",compile "*")
so we end up with compile "[.]/*" /= compile "./*"
. Perhaps compile "[.]"
should always end up equal to compile "."
?
This comment in System.FilePath.Glob.Base
seems to indicate the behaviour of compile "[.]"
is intended:
optimizeCharRange :: Token -> Token
optimizeCharRange (CharRange b_ rs) = fin b_ . go . sortCharRange $ rs
where
-- [/] is interesting, it actually matches nothing at all
-- [.] can be Literalized though, just don't make it into an ExtSeparator so
-- that it doesn't match a leading dot
fin True [Left c] | not (isPathSeparator c) = Literal c
fin True [Right r] | r == (minBound,maxBound) = NonPathSeparator
fin b x = CharRange b x
[...]
I'm not really sure what should happen here.
Yes, the compile
behaviour is intended. E.g. man 7 glob
:
If a filename starts with a '.', this character must be matched explicitly.
Or more clearly in the Tcl manpages of all places, man n glob
:
On Unix, as with csh, a “.” at the beginning of a file's name or just after a “/” must be matched explicitly or with a {} construct, unless the
-types hidden
flag is given (since “.” at the beginning of a file's name indicates that it is hidden). On other platforms, files beginning with a “.” are handled no differently to any others, except the special directories “.” and “..” which must be matched explicitly [...]
...I'd rather not worry about the apparent platform-specificity of this until somebody complains.
The fact that commonDirectory
turns [.]
into .
is a bug, though. EDIT: hang on, I misread. EDIT 2: I was right the first time.
I think commonDirectory
is correct in turning [.]
into .
, since it only does this for the first component of the tuple it returns, which is a FilePath
, not a Pattern
, and therefore I would expect that to be a literal directory path:
> commonDirectory (compile "[.]/*/[.]")
("./",compile "*/[.]")
(since I would expect to be able to do things like doesDirectoryExist
and getDirectoryContents
on it)
A funny edge case indeed. The pattern "[.]/*"
cannot match anything, just as the pattern "[.]"
does not match the .
directory. So I do think commonDirectory
is wrong here.
Oh, I see! So what should commonDirectory (compile "[.]/*")
return? Perhaps ("", compile "[.]/*")
?
I.e. I think the following should be true:
commonDirectory (compile "[.]/*") == ("", compile "[.]/*")
Maybe optimize
should optimize any pattern starting with [.]
to [.]
, and that could be the canonical black hole pattern that doesn't match anything.
Should compile "[.]foo"
match .foo
?
Oh I see, we're going off man 7 glob
, not the Tcl one. edit: or rather, what both of them say, assuming we're on Unix.
No, the Tcl manpage is in agreement: 'a “.” at the beginning of a file's name or just after a “/” must be matched explicitly or with a {} construct'. [.]
isn't matching explicitly.
Ok, so if I modify compile
to make it work as it does now, except that if [.]
occurs at the beginning or directly after a path separator inside a pattern, the whole lot gets optimised to the black-hole pattern [.]
? and hopefully that should solve this issue?
Seems like that would indeed be a roundabout way of fixing it, since commonDirectory (compile "[.]")
does not result in (".", compile "")
like I thought it might.
But I'd at least like a note in separate
that it relies on that optimization happening and will otherwise mishandle this situation.
Having thought about it a little more, I think separate
seems to be okay right now:
> separate (compile "[.]/*")
[Dir 1 (compile "[.]"),Any (compile "*")] -- this is what I would expect
I think fromConst
in commonDirectory
is the cause of the issue, since if we have Literal '.'
or LongLiteral 1 "."
that indicates that the . was matched non-explicitly and therefore should not match if it's directly after a path separator or at the beginning, correct? But currently it will simply appear as "." in the FilePath
component of the returned tuple? So presumably fromConst
needs to be slightly modified to take care of that? That does seem more sensible than my previous suggestion.
Good analysis. I think I'd rather amend splitP
than fromConst
though but I'm not sure, feel free to tackle it however you see fits best.
Yes of course, amending splitP
did turn out to be easier. I've also added some more test cases to the tests so that if these tests do ever start failing again it won't be nondeterministic.
Thanks!
I modified
unseparate
based on the following observations:Int
argument to theDir
andAnyDir
constructors specifies the number of path separators directly following thePattern
AnyDirectory
token represents the string**/
in aPattern
; therefore, we subtract 1 from the argument when unseparating. Otherwise, we end up with one too many separators.f
so that it producesn
path separators rather thann+1
as it was before.The
unseparate
function is used in one other place, inmatchTypedAndGo
when we encounter anAnyDir
pattern. The current test suite still passes, but I'm not really sure if that's enough to conclude that I haven't accidentally broken it; do you have any thoughts on that?