Deewiant / glob

Haskell library for glob matching
https://deewiant.iki.fi/projects/glob/
Other
25 stars 8 forks source link

Fix ** implicitly matching dot directories when set to False #30

Closed travv0 closed 5 years ago

travv0 commented 5 years ago

fixes #29

There was a check for null xs that was preventing it from checking directories past the first one.

travv0 commented 5 years ago

Actually, with this change "foo/**/.baz/asdf" no longer matches "foo/bar/.baz/asdf". I'll look at it some more.

travv0 commented 5 years ago

I'm pretty new at Haskell so I'm guessing this could be more efficient/easier to read, but basically the problem was that it was using init to get the paths to check for dots, so if any of them were hidden besides the last file/directory, even if they were matched explicitly and not by the **, it was returning False. Please let me know what I can do to improve this.

Edit: To add to this, the code was originally only checking past the first directory if the glob ended in "**/", which was the cause of the original problem.

travv0 commented 5 years ago

Thanks for the feedback! I'll take a look at this stuff soon.

On Tue, Dec 18, 2018 at 2:59 PM Matti Niemenmaa notifications@github.com wrote:

@Deewiant requested changes on this pull request.

Yeah, this was definitely not formulated as well as it could've been. (I was pretty new at Haskell when I wrote the whole library and haven't managed to take the time to clean things up at all.) Thanks for the PR and especially the test cases!

I think the approach itself is correct, just found some minor complaints.

In System/FilePath/Glob/Match.hs https://github.com/Deewiant/glob/pull/30#discussion_r242698211:

  • in if null xs && not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) (init parts) && matches
  • else matches +match' o (AnyDirectory:xs) path =
  • if not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) matchedDirs && hasMatch
  • else hasMatch
  • where parts = pathParts (dropWhile isPathSeparator path)
  • matches = map (match' o xs) parts
  • hasMatch = or matches
  • matchIndex = fromMaybe 0 (elemIndex True matches)
  • (matchedDirs, _) = splitAt matchIndex parts

Since you're not using the other half of the split, this could be simplified using take.

In System/FilePath/Glob/Match.hs https://github.com/Deewiant/glob/pull/30#discussion_r242699009:

@@ -122,14 +124,17 @@ match' o (OpenRange lo hi :xs) path = match' o again@(AnyNonPathSeparator:xs) path@(c:cs) = match' o xs path || (not (isPathSeparator c) && match' o again cs)

-match' o again@(AnyDirectory:xs) path =

  • let parts = pathParts (dropWhile isPathSeparator path)
  • matches = any (match' o xs) parts || any (match' o again) (tail parts)
  • in if null xs && not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) (init parts) && matches
  • else matches +match' o (AnyDirectory:xs) path =
  • if not (matchDotsImplicitly o)

Flipping the then/else cases and removing this not would be nicer since it avoids a double negation.

In System/FilePath/Glob/Match.hs https://github.com/Deewiant/glob/pull/30#discussion_r242700086:

@@ -122,14 +124,17 @@ match' o (OpenRange lo hi :xs) path = match' o again@(AnyNonPathSeparator:xs) path@(c:cs) = match' o xs path || (not (isPathSeparator c) && match' o again cs)

-match' o again@(AnyDirectory:xs) path =

  • let parts = pathParts (dropWhile isPathSeparator path)
  • matches = any (match' o xs) parts || any (match' o again) (tail parts)
  • in if null xs && not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) (init parts) && matches
  • else matches +match' o (AnyDirectory:xs) path =
  • if not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .

This comment is a bit out of date now, since it talks about "remaining bits" but it's now concerning "the directories matched by **/" or something along those lines.

In System/FilePath/Glob/Match.hs https://github.com/Deewiant/glob/pull/30#discussion_r242701631:

  • matches = any (match' o xs) parts || any (match' o again) (tail parts)
  • in if null xs && not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) (init parts) && matches
  • else matches +match' o (AnyDirectory:xs) path =
  • if not (matchDotsImplicitly o)
  • -- **/ shouldn't match foo/.bar, so check that remaining bits don't
  • -- start with .
  • then all (not.isExtSeparator.head) matchedDirs && hasMatch
  • else hasMatch
  • where parts = pathParts (dropWhile isPathSeparator path)
  • matches = map (match' o xs) parts
  • hasMatch = or matches
  • matchIndex = fromMaybe 0 (elemIndex True matches)

As a tiny performance thing, I think referencing matches twice like this is likely to prevent the intermediate list from getting optimized away. I'm not entirely sure though, feel free to examine the compiler output if you want to check. An alternative would be:

matchIndex = elemIndex True matches hasMatch = isJust matchIndex (matchedDirs, _) = splitAt (fromMaybe 0 matchIndex) parts

At that point you could also go about things more directly and get rid of matches, using findIndex (or whatever it's called) instead of elemIndex on top of map.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Deewiant/glob/pull/30#pullrequestreview-186272739, or mute the thread https://github.com/notifications/unsubscribe-auth/ADcEEpM00yTFqPpfXmdUNLGwf21U95enks5u6VcVgaJpZM4ZXa2E .

travv0 commented 5 years ago

I've fixed those issues. I did have a question about how short-circuiting works in Haskell. Would

all (not.isExtSeparator.head) matchedDirs && hasMatch

be more efficient if written as

hasMatch && all (not.isExtSeparator.head) matchedDirs

?

Deewiant commented 5 years ago

It would be better like that now, yes. Previously the isExtSeparator check had no dependency on any recursive match results, so I wrote it that way around on the assumption that it's much quicker than the recursive calls involved in hasMatch (which was called matches prior to this PR). Currently you need to compute matchIndex for either side of the &&, so might as well do the cheaper hasMatch check first.

(Whether this actually ends up being more efficient for the user depends on how often hasMatch is true or false...)