dazinator / DotNet.Glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.
MIT License
363 stars, 27 forks

Fixed match: abc** should match abc. #68

Closed Kuinox closed 5 years ago

Kuinox commented 5 years ago

Hi, this PR tries to fix the fact that abc** currently does not match abc, although it should. I also changed the wording for the ** wildcard in the readme to leave no doubt that ** can match 0 or more directories.

dazinator commented 5 years ago

Thanks for the PR, I'll take a look at this this week ;-)

dazinator commented 5 years ago

Thank you for the PR, however I am not seeing the issue as you see it. The pattern abc** should not match, because, as the readme states, if you use ** it should be the only content of a segment (i.e. nothing else can appear alongside it except a path separator). So to me, the issue here is that the parser should probably throw an exception if you try to use a pattern like abc**. If you want to match anything starting with abc in 0 or more directories, use something like **/abc*. If you want to match in the root directory only, use abc*.
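To make the difference between the two suggested patterns concrete, here is a minimal Python sketch of segment-wise matching. It is not DotNet.Glob's implementation: the names `match_segments` and `glob_match` are hypothetical, `/` is assumed to be the only separator, and Python's `fnmatchcase` stands in for matching within a single segment.

```python
from fnmatch import fnmatchcase


def match_segments(pattern_segs, path_segs):
    """Match path segments against pattern segments.

    A pattern segment that is exactly '**' matches zero or more whole
    path segments. Any other segment matches exactly one path segment.
    """
    if not pattern_segs:
        return not path_segs
    head, rest = pattern_segs[0], pattern_segs[1:]
    if head == "**":
        # Let '**' consume 0, 1, 2, ... leading path segments.
        return any(
            match_segments(rest, path_segs[i:])
            for i in range(len(path_segs) + 1)
        )
    if not path_segs:
        return False
    # A single '*' never crosses a separator, because each segment
    # is matched on its own.
    return fnmatchcase(path_segs[0], head) and match_segments(rest, path_segs[1:])


def glob_match(pattern, path):
    """Hypothetical helper: split on '/' and match segment by segment."""
    return match_segments(pattern.split("/"), path.split("/"))
```

Under these assumptions, `glob_match("**/abc*", "x/y/abcdef")` and `glob_match("**/abc*", "abc")` both succeed, while `glob_match("abc*", "x/abc")` fails because `abc*` is confined to the root segment.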

dazinator commented 5 years ago

I'm closing this for now, but if you think I have missed something, I'm happy to continue the discussion.

Kuinox commented 5 years ago

I expected the behavior to be the same as on Linux, where ls -l **abc** finds the file named abc in the current directory (so abc** matches abc). Git has the same behavior: "abc" matches a file named "abc" at the root.
Currently, the pattern abc matches abc, but abc** does not.

dazinator commented 5 years ago

Ok I understand the issue more clearly now, thank you.

There are many variants of globbing. I try to think carefully about dotnet glob - I don't want to implement something just to make it the same behaviour as an XYZ implementation, unless I am convinced that behaviour is justified and general purpose.

In this case, the example ls -l **abc** - as you say, the double star is only matching within the current directory component. But this is already what a single star does. Therefore I assume ls and git are basically interpreting the ** as two separate instances of a single * character. So ls -l **abc** can be simplified to ls -l *abc*. I haven't checked this, so please let me know if my assumption there is incorrect.
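The assumed equivalence can be illustrated with a small Python sketch. It uses Python's `fnmatch` module as a stand-in for shell-style matching, so it says nothing definitive about ls or git themselves; `collapse_stars` and `same_result` are hypothetical helper names, and the file names are assumed to contain no path separator.

```python
import re
from fnmatch import fnmatchcase


def collapse_stars(pattern):
    """Replace every run of consecutive '*' with a single '*'."""
    return re.sub(r"\*+", "*", pattern)


def same_result(name, pattern):
    """Check that a pattern and its collapsed form agree on one name."""
    return fnmatchcase(name, pattern) == fnmatchcase(name, collapse_stars(pattern))
```

For a matcher that treats each star independently, `**abc**` and `*abc*` accept exactly the same names, which is why the longer form adds nothing within a single directory component.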

However in dotnet glob, ** has a very specific meaning. It is not evaluated the same as two star characters. Rather it means match 0 or many directory segments. As such it is meant to be used only as the sole content of a segment. **abc should not be valid as far as dotnet glob is concerned. That's where I see the bug in dotnet glob right now - you shouldn't be able to match **abc with abc because it should throw an exception stating ** must be the only content of a segment.
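The check described here could look roughly like the following Python sketch. This is only an illustration of the rule, not the DotNet.Glob parser: `validate_double_star` is a hypothetical name, `/` is assumed to be the only separator, and the exception type and message are made up for the example.

```python
def validate_double_star(pattern):
    """Raise ValueError if '**' shares a segment with any other character.

    Returns the pattern unchanged when every '**' is the sole content
    of its segment.
    """
    for segment in pattern.split("/"):
        if "**" in segment and segment != "**":
            raise ValueError(
                f"'**' must be the only content of a segment, got {segment!r}"
            )
    return pattern
```

With this rule, `**/abc*` and `a/**/b` pass validation, while `abc**` and `**abc` are rejected before any matching takes place.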

Separate from that bug is the question of whether to add support to dotnet glob so that you can use ** with other tokens in a segment, for example abc** or abc/**foo**. Given that you can also just use a single star in these cases to mean the same thing, adding support for this seems like it:

  1. May encourage you to write needlessly complex glob patterns.
  2. Confuses the meaning of ** to mean two different things in two different contexts. I.e. when it's used as the sole content of a segment it takes on the 0 or many directory behaviour; if used with other tokens in a segment it would mean the same as using two single * tokens, which simplifies anyway to one single * token.

This may be what git and ls enable but I'm not sold on the value of this as a feature.