dazinator / DotNet.Glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.
MIT License
363 stars 27 forks source link

Fixes for #51 & #52 #53

Closed cocowalla closed 6 years ago

cocowalla commented 6 years ago
dazinator commented 6 years ago

Thanks for this, i'll have a look through and merge it soon :-)

dazinator commented 6 years ago

I have had a look at this now - thanks for also sorting out some styling issues, variable declarations etc.

There are a couple of things I'd like to amend in this PR, so I'll merge it to a feature branch and make my amendments there.

Basically a couple of thoughts that came to mind whilst I was reviewing this were:

  1. Think we should move the new tests Can_Escape_Special_Characters to be in GlobTests.cs
  2. Wondered how we would handle the following: match any path that doesn't contain a ?

It feels like to me that the escape sequence, i.e [?] is basically the same as the existing character list token, just with a single character in it. The tokeniser already parses character list tokens, so I was wondering rather than have this additional IsEscapeSequence check:


if (reader.IsEscapeSequence)
                    {
                        tokens.Add(ReadEscapedCharacter(reader));
                    }
                    else if (reader.IsBeginningOfRangeOrList)
                    {
                        tokens.Add(ReadRangeOrListToken(reader));
                    }

I think it might be better to extend the CharacterListToken that get's parsed in ReadRangeOrListToken

                if (reader.IsBeginningOfRangeOrList)
                {
                    tokens.Add(ReadRangeOrListToken(reader));
                }

The reason I am thinking of that is because character lists handle negation, so in answer to point 2 above when this works correctly, you should then be able to do something like this:

// match files that don't have a ? in the filename after bar

/foo/bar[!?].baz

We will still have to handle those edge cases i.e

/foo/bar[!]].baz etc

And if you wanted to match files that didn't have an ! after bar then this would need to work too:

/foo/bar[!!].baz etc

Anyway I'll play around with it a bit in my feature branch and see what happens - thank you for adding the tests BTW.

cocowalla commented 6 years ago

When I added this, I realised that the existing square bracket handling might work as-is, but there was a reason why I thought it wouldn't.. can't remember off the top of my head why though! I really should have added a comment at the time!

I'll take another look at this tomorrow and see if it comes back to me, but one other reason is just to be explicit about escape sequence handling, rather than it being a happy side-effect of something else.

dazinator commented 6 years ago

Haha :-) Well I've merged this to: https://github.com/dazinator/DotNet.Glob/tree/feature/escaping I have left the logic alone, but I have added 3 more failing test cases to demonstrate the cases above that I think we should get working to make this feature complete. I still think extending the char list token is the best route to go, but I haven't sat down and attempted it - so if there is a reason that's fine, the main thing is just that those remaining tests light up and then I think we can call this feature complete.

cocowalla commented 6 years ago

So, I took another look, and I think the reasons I separated it from the existing square bracket handling were:

  1. To be explicit, rather than lumping it in with square bracket patterns
  2. The existing handling doesn't work as-is for ? or *. For example, "[?]" would result in a pattern than matches any single character, rather than a literal ?. I figured this would result in the existing handling becoming somewhat convoluted versus adding explicit escape sequence handling

But I don't feel strongly either way, and of course this is your baby :)

Regarding negation of special characters, e.g. [!!], I see that as something separate, not related to escape sequence handling (at least, my implementation only considers a single literal character). Also, for the test cases you added, I'm a bit confused, e.g. [InlineData(@"/foo/bar[!!].baz", @"/foo/bar7.baz")]

Shouldn't this be (which actually passes): [InlineData(@"/foo/bar[!!].baz", @"/foo/bar[!!].baz")]

dazinator commented 6 years ago

Hey, thanks for spending more time thinking on this it's greatly appreciated :-)

I understand what you have said above. Hopefully I can clarify regarding your questions.

So my thinking with /foo/bar[!!].baz is that I wanted to express: don't match on the literal character !. So in other words.. match any other character than ! that follows bar - i.e it's a negation.:-)

The existing tests you added covered the ability to match on an escaped character, but I could't see any for not matching the same escaped character. I am thinking that this is a valid use case and I appreciate it is an addition / extension to the scope of what you have done so far. So my thinking is either we extend the present escape sequence token parsing that you added to deal with negation, or we fix up char list which already has a notion of negation - and im fine with either approach because really this logic only runs at pattern parse time.. and the most perf sensitive logic is actually pattern match time..ie you would likely only parse a pattern once, and then match against thousands of strings.