Deewiant / glob

Haskell library for glob matching
https://deewiant.iki.fi/projects/glob/
Other
25 stars 8 forks source link

Fix `commonDirectory`, fixes #18 #19

Closed hdgarrood closed 7 years ago

hdgarrood commented 7 years ago

I modified unseparate based on the following observations:

The unseparate function is used in one other place, in matchTypedAndGo when we encounter an AnyDir pattern. The current test suite still passes, but I'm not really sure if that's enough to conclude that I haven't accidentally broken it; do you have any thoughts on that?

hdgarrood commented 7 years ago

It seems the tests are failing because compile usually optimises patterns like [a] to simply a, but not always; for instance:

> compile "[a]"
compile "a"

> compile "[.]"
compile "[.]"

However, commonDirectory (compile "[.]/*") does seem to make this optimisation somehow:

> commonDirectory (compile "[.]/*")
("./",compile "*")

so we end up with compile "[.]/*" /= compile "./*". Perhaps compile "[.]" should always end up equal to compile "."?

hdgarrood commented 7 years ago

This comment in System.FilePath.Glob.Base seems to indicate the behaviour of compile "[.]" is intended:

optimizeCharRange :: Token -> Token
optimizeCharRange (CharRange b_ rs) = fin b_ . go . sortCharRange $ rs
 where
   -- [/] is interesting, it actually matches nothing at all
   -- [.] can be Literalized though, just don't make it into an ExtSeparator so
   --     that it doesn't match a leading dot
   fin True [Left  c] | not (isPathSeparator c) = Literal c
   fin True [Right r] | r == (minBound,maxBound) = NonPathSeparator
   fin b x = CharRange b x
   [...]

I'm not really sure what should happen here.

Deewiant commented 7 years ago

Yes, the compile behaviour is intended. E.g. man 7 glob:

If a filename starts with a '.', this character must be matched explicitly.

Or more clearly in the Tcl manpages of all places, man n glob:

On Unix, as with csh, a “.” at the beginning of a file's name or just after a “/” must be matched explicitly or with a {} construct, unless the -types hidden flag is given (since “.” at the beginning of a file's name indicates that it is hidden). On other platforms, files beginning with a “.” are handled no differently to any others, except the special directories “.” and “..” which must be matched explicitly [...]

...I'd rather not worry about the apparent platform-specificity of this until somebody complains.

The fact that commonDirectory turns [.] into . is a bug, though. EDIT: hang on, I misread. EDIT 2: I was right the first time.

hdgarrood commented 7 years ago

I think commonDirectory is correct in turning [.] into ., since it only does this for the first component of the tuple it returns, which is a FilePath, not a Pattern, and therefore I would expect that to be a literal directory path:

> commonDirectory (compile "[.]/*/[.]")
("./",compile "*/[.]")
hdgarrood commented 7 years ago

(since I would expect to be able to do things like doesDirectoryExist and getDirectoryContents on it)

Deewiant commented 7 years ago

A funny edge case indeed. The pattern "[.]/*" cannot match anything, just as the pattern "[.]" does not match the . directory. So I do think commonDirectory is wrong here.

hdgarrood commented 7 years ago

Oh, I see! So what should commonDirectory (compile "[.]/*") return? Perhaps ("", compile "[.]/*")?

Deewiant commented 7 years ago

I.e. I think the following should be true:

commonDirectory (compile "[.]/*") == ("", compile "[.]/*")

Maybe optimize should optimize any pattern starting with [.] to [.], and that could be the canonical black hole pattern that doesn't match anything.

hdgarrood commented 7 years ago

Should compile "[.]foo" match .foo?

hdgarrood commented 7 years ago

Oh I see, we're going off man 7 glob, not the Tcl one. edit: or rather, what both of them say, assuming we're on Unix.

Deewiant commented 7 years ago

No, the Tcl manpage is in agreement: 'a “.” at the beginning of a file's name or just after a “/” must be matched explicitly or with a {} construct'. [.] isn't matching explicitly.

hdgarrood commented 7 years ago

Ok, so if I modify compile to make it work as it does now, except that if [.] occurs at the beginning or directly after a path separator inside a pattern, the whole lot gets optimised to the black-hole pattern [.]? and hopefully that should solve this issue?

Deewiant commented 7 years ago

Seems like that would indeed be a roundabout way of fixing it, since commonDirectory (compile "[.]") does not result in (".", compile "") like I thought it might.

But I'd at least like a note in separate that it relies on that optimization happening and will otherwise mishandle this situation.

hdgarrood commented 7 years ago

Having thought about it a little more, I think separate seems to be okay right now:

> separate (compile "[.]/*")
[Dir 1 (compile "[.]"),Any (compile "*")] -- this is what I would expect

I think fromConst in commonDirectory is the cause of the issue, since if we have Literal '.' or LongLiteral 1 "." that indicates that the . was matched non-explicitly and therefore should not match if it's directly after a path separator or at the beginning, correct? But currently it will simply appear as "." in the FilePath component of the returned tuple? So presumably fromConst needs to be slightly modified to take care of that? That does seem more sensible than my previous suggestion.

Deewiant commented 7 years ago

Good analysis. I think I'd rather amend splitP than fromConst though but I'm not sure, feel free to tackle it however you see fits best.

hdgarrood commented 7 years ago

Yes of course, amending splitP did turn out to be easier. I've also added some more test cases to the tests so that if these tests do ever start failing again it won't be nondeterministic.

Deewiant commented 7 years ago

Thanks!