Closed cocowalla closed 6 years ago
Thanks for this, i'll have a look through and merge it soon :-)
I have had a look at this now - thanks for also sorting out some styling issues, variable declarations etc.
There are a couple of things I'd like to amend in this PR, so I'll merge it to a feature branch and make my amendments there.
Basically a couple of thoughts that came to mind whilst I was reviewing this were:
Can_Escape_Special_Characters
to be in GlobTests.cs
It feels like to me that the escape sequence, i.e [?] is basically the same as the existing character list token, just with a single character in it. The tokeniser already parses character list tokens, so I was wondering rather than have this additional IsEscapeSequence
check:
if (reader.IsEscapeSequence)
{
tokens.Add(ReadEscapedCharacter(reader));
}
else if (reader.IsBeginningOfRangeOrList)
{
tokens.Add(ReadRangeOrListToken(reader));
}
I think it might be better to extend the CharacterListToken that get's parsed in ReadRangeOrListToken
if (reader.IsBeginningOfRangeOrList)
{
tokens.Add(ReadRangeOrListToken(reader));
}
The reason I am thinking of that is because character lists handle negation, so in answer to point 2 above when this works correctly, you should then be able to do something like this:
// match files that don't have a ? in the filename after bar
/foo/bar[!?].baz
We will still have to handle those edge cases i.e
/foo/bar[!]].baz
etc
And if you wanted to match files that didn't have an ! after bar then this would need to work too:
/foo/bar[!!].baz
etc
Anyway I'll play around with it a bit in my feature branch and see what happens - thank you for adding the tests BTW.
When I added this, I realised that the existing square bracket handling might work as-is, but there was a reason why I thought it wouldn't.. can't remember off the top of my head why though! I really should have added a comment at the time!
I'll take another look at this tomorrow and see if it comes back to me, but one other reason is just to be explicit about escape sequence handling, rather than it being a happy side-effect of something else.
Haha :-) Well I've merged this to: https://github.com/dazinator/DotNet.Glob/tree/feature/escaping I have left the logic alone, but I have added 3 more failing test cases to demonstrate the cases above that I think we should get working to make this feature complete. I still think extending the char list token is the best route to go, but I haven't sat down and attempted it - so if there is a reason that's fine, the main thing is just that those remaining tests light up and then I think we can call this feature complete.
So, I took another look, and I think the reasons I separated it from the existing square bracket handling were:
?
or *
. For example, "[?]"
would result in a pattern than matches any single character, rather than a literal ?
. I figured this would result in the existing handling becoming somewhat convoluted versus adding explicit escape sequence handlingBut I don't feel strongly either way, and of course this is your baby :)
Regarding negation of special characters, e.g. [!!]
, I see that as something separate, not related to escape sequence handling (at least, my implementation only considers a single literal character). Also, for the test cases you added, I'm a bit confused, e.g.
[InlineData(@"/foo/bar[!!].baz", @"/foo/bar7.baz")]
Shouldn't this be (which actually passes):
[InlineData(@"/foo/bar[!!].baz", @"/foo/bar[!!].baz")]
Hey, thanks for spending more time thinking on this it's greatly appreciated :-)
I understand what you have said above. Hopefully I can clarify regarding your questions.
So my thinking with /foo/bar[!!].baz is that I wanted to express: don't match on the literal character !. So in other words.. match any other character than ! that follows bar
- i.e it's a negation.:-)
The existing tests you added covered the ability to match on an escaped character, but I could't see any for not matching the same escaped character. I am thinking that this is a valid use case and I appreciate it is an addition / extension to the scope of what you have done so far. So my thinking is either we extend the present escape sequence token parsing that you added to deal with negation, or we fix up char list which already has a notion of negation - and im fine with either approach because really this logic only runs at pattern parse time.. and the most perf sensitive logic is actually pattern match time..ie you would likely only parse a pattern once, and then match against thousands of strings.