dazinator / DotNet.Glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.
MIT License
363 stars, 27 forks

Single asterisk behaviour #25

Closed: cocowalla closed this issue 7 years ago

cocowalla commented 7 years ago

I seem to get odd results when using the single asterisk wildcard *. For example, let's take the following string:

HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12

The pattern *ave 12 matches, as expected, but *ave*2 (which uses two asterisks) does not, and neither does Shock* 12 (which uses a single asterisk, but not as the first character in the pattern).

Is this the expected behaviour? If so, what's the rationale behind it?

dazinator commented 7 years ago

OK, so this is what you should be seeing, by design:

- Shock* 12 should not match HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12.
- *Shock* 12 should not match HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12.
- **\Shock* 12 should match HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12.
- *ave*2 should not match HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12.
- **\*ave*2 should match HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12.

I have added unit tests to verify that the above is correct. The successful matches are here, and the failed matches are here.

Basically, the pattern is matched against the whole string, starting with the first token. A * matches any number of characters (including none) within a single segment only.

So * will not match hi\there, because * only matches within a single segment; in that example it could only cover the first segment, hi. By contrast, ** behaves the same as * except that it can also match across any number of segments, so ** will match hi\there.

* can be used anywhere in a pattern. For example, *foo*baz will match foobarbaz. **, however, must be the only thing within its segment: you can't have patterns like **foo or baz**\foo. A pattern such as foo\**\baz is fine, though, and would match both foo\bar\baz and foo\bar\bat\baz.
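To make the segment rules concrete, here is a minimal Python sketch of the semantics as described in this comment. It is not DotNet.Glob's implementation or API: the names glob_match and segment_matches are hypothetical, and the sketch assumes \ is the only separator, matching is case-sensitive, and only literals, * and ** are supported (no ?, [...] or escaping).

```python
import re
from typing import List


def segment_matches(pattern_seg: str, subject_seg: str) -> bool:
    """'*' matches any run of characters (including none) inside ONE segment."""
    # Escape the literal pieces and join them with '.*'; the subject segment
    # never contains the separator, so '.*' cannot leak across segments.
    regex = ".*".join(re.escape(part) for part in pattern_seg.split("*"))
    return re.fullmatch(regex, subject_seg, flags=re.DOTALL) is not None


def glob_match(pattern: str, subject: str, sep: str = "\\") -> bool:
    """Anchored match: the WHOLE subject must match the WHOLE pattern."""
    pattern_segs = pattern.split(sep)
    for seg in pattern_segs:
        # Assumed rule from the thread: '**' must be alone in its segment.
        if "**" in seg and seg != "**":
            raise ValueError(f"'**' must be the only thing in its segment: {seg!r}")
    return _match(pattern_segs, subject.split(sep))


def _match(pats: List[str], subs: List[str]) -> bool:
    if not pats:
        return not subs  # pattern used up: the subject must be used up too
    head, rest = pats[0], pats[1:]
    if head == "**":
        # '**' swallows zero or more whole segments; try every split point.
        return any(_match(rest, subs[i:]) for i in range(len(subs) + 1))
    if not subs:
        return False
    return segment_matches(head, subs[0]) and _match(rest, subs[1:])


KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12"
print(glob_match("Shock* 12", KEY))        # False: first segment is HKEY_LOCAL_MACHINE
print(glob_match(r"**\Shock* 12", KEY))    # True
print(glob_match("*ave*2", KEY))           # False: '*' cannot cross '\'
print(glob_match(r"**\*ave*2", KEY))       # True
print(glob_match(r"foo\**\baz", r"foo\bar\bat\baz"))  # True
```

Trying every split point for ** keeps the sketch short, but it can backtrack heavily when a pattern has several ** segments; a production matcher would likely avoid that.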

If you are not seeing this behaviour, chances are you are using an older version of DotNet.Glob, as some issues around this were fixed in later releases. Which version of DotNet.Glob are you currently using?

dazinator commented 7 years ago

I'm closing this for now, but feel free to comment further if you need more clarification. It might also be worth checking that you are using at least version 1.5.0.

cocowalla commented 7 years ago

I'm using the 1.6 preview from 06 May. I'll try upgrading to the latest preview later and see if it still behaves the same.

cocowalla commented 7 years ago

Firstly, thanks for taking the time to write such a detailed description.

* will match any number of (including none) any characters within a single segment only.

Given this, shouldn't Shock* 12 match Shockwave 12 from the last segment of HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12? (Especially since *ave 12 does - or is this meant to not match?)

As another example, *12 does match, but S*12 does not.

dazinator commented 7 years ago

*ave 12 is not meant to match. I think one of the unit tests I pointed to should cover that case.

dazinator commented 7 years ago

Given this, shouldn't Shock* 12 match Shockwave 12 from the last segment of HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12?

Shock* 12 should match the string Shockwave 12, but it shouldn't match the string HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12. The pattern Shock* 12 means the string must start with the literal Shock, then have zero or more characters, then the literal 12. The string being matched starts with H, so the match fails straight away. If you want to match the end of a string and don't care about the beginning, the pattern should start with **\, as that matches any number (including zero) of characters, any number of segments deep. So **\Shock* 12 should match.

cocowalla commented 7 years ago

OK, I understand now; * is only really useful for matching filenames (or the last path segment for registry keys) if you're matching a path that doesn't contain directory separator characters, since IsMatch matches on the entire subject string. I think what I want to do then is: when glob patterns don't contain any directory separator characters, match on Path.GetFileName(subject), otherwise match on the full path. But I get that isn't standard glob functionality, so I'll implement it in my application's code.
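The application-side workaround described above might look roughly like the following Python sketch. match_name_or_path is a hypothetical helper, ntpath.basename stands in for .NET's Path.GetFileName, and is_match is any whole-subject glob matcher you pass in (in the original C# application that would presumably be DotNet.Glob's IsMatch; that wiring is an assumption).

```python
import ntpath
from fnmatch import fnmatchcase
from typing import Callable


def match_name_or_path(pattern: str, subject: str,
                       is_match: Callable[[str, str], bool]) -> bool:
    """Hypothetical helper: match the last segment when the pattern has no
    separators, otherwise match the full subject. is_match(pattern, subject)
    is an assumed stand-in for a whole-string glob matcher."""
    has_separator = "\\" in pattern or "/" in pattern
    # ntpath.basename returns the text after the last '\' or '/',
    # playing the role of .NET's Path.GetFileName in this sketch.
    target = subject if has_separator else ntpath.basename(subject)
    return is_match(pattern, target)


key = r"HKEY_LOCAL_MACHINE\SOFTWARE\Adobe\Shockwave 12"
# fnmatchcase is only a stand-in here: on a single segment its '*' behaves like
# the single-segment '*' in this thread (it also accepts '?' and '[...]').
# Note its argument order is (name, pattern).
print(match_name_or_path("Shock* 12", key, lambda p, s: fnmatchcase(s, p)))  # True
```

Keeping the decision in a thin wrapper leaves the glob matcher itself standard, which matches the point above that this is application behaviour rather than glob functionality.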

BTW, *ave 12 definitely matches in the preview version on NuGet.

dazinator commented 7 years ago

BTW, *ave 12 definitely matches in the preview version on NuGet

Hmm, OK, thanks for letting me know. I'll try to replicate it. Just to be clear: you are saying that with the latest prerelease NuGet package, the pattern *ave 12 matches that HKEY_LOCAL_MACHINE Shockwave registry key? If that's correct, I'll see if I can reproduce this.

cocowalla commented 7 years ago

Yes (I just double-checked to be sure)

dazinator commented 7 years ago

OK, I have replicated this bug. It will be fixed in the next release.

dazinator commented 7 years ago

@cocowalla - this is now fixed if you upgrade to: https://www.nuget.org/packages/DotNet.Glob/1.6.0