dazinator / DotNet.Glob

A fast globbing library for .NET / .NETStandard applications. Outperforms Regex.
MIT License
363 stars 27 forks source link

New option for parsing behaviour of invalid ** #70

Closed dazinator closed 4 years ago

dazinator commented 5 years ago

As per discussion on #68

The pattern **abc should not match abc it should throw an exception by default, This is because as per the read me, if the ** token is used it must be the only content of a segment.

These patterns are a few examples that should all throw:

**abc foo/bar** foo**/ /foo**bar/

Update as result of discussion below: However there are use cases where its desirable to not throw, but instead, parse such invalid usage of double star, to mean two single stars instead. In this situation, instead of throwing when failing to parse an wildcard directory token, it would instead, parse two wildcard tokens (which do not match past current path seperator). However I will make this behaviour "opt in" via setting an option on GlobOptions, as it subverts the behaviour of ** to be interpreted as two single stars, where one single star would suffice, and where using two stars should not otherwise be valid in a glob pattern.

olivier-spinelli commented 5 years ago

This would be annoying: we use DotNet.Glob with patterns that comes directly from the user... The fact that users are bad syntax generator is well known: this would lead to awful try/catch around the Parse...

And even if TryParse avoids the try/catch, what sould we say to the user? "Bad Pattern, please fix it..."?

The way we use the library (user input driven) requires a more Postel's law approach. This may be done with something like "SafeParse" or "AutofixParse". What do you think?

dazinator commented 5 years ago

Yeah I see your point, I was also thinking through this scenario from a user perspective myself, but I veered on the side of handling the exception and showing a message to the user like (invalid ** used in expression etc) so they know they aren't going to get the results they may expect.. but..

I can also see it the other way, where you dont want an exception, I.e you want to search based on a "best endeavour" basis from the user and they aren't interested in knowing their ** wasnt interpreted as a proper directory wildcard token.

I propose to allow dotnet glob to cater for both scenarios through something like:

  1. Add a new option "ParseInvalidDoubleStarAsRepeatingStar" (not great name I know!)
  2. When that option is enabled, when parsing a glob pattern, if a double star is encountered that isnt the only content of a segment e.g /foo**bar then rather than throw an exception, it will treat them as single stars, and therefore parse two single star tokens. Single stars only match up to the next path seperator or end of the string (whichever is first) - so you'd get the same match behaviour if the user had entered a single start there.

By default the setting would be false so you'd get an exception- however in your application, you'd set that option to true. Would this work better for you?

olivier-spinelli commented 5 years ago

Thanks to consider my concern :) One remark: I may have an app in which Globs must be "Parse"d since they are coming from a configuration file, a database (or whatever) and others coming from users that I'd like to "SotfParse". A singleton option will make this scenarion complicated to handle. What about:

?

dazinator commented 5 years ago

Hmm, not sure about that! :-)

A singleton option will make this scenarion complicated to handle.

Only the default options are provided at a global level as a singleton. You can still pass in your own options instance on a per glob basis - so in the scenario you mention, you'd set your default options, then construct a glob with your tailored options for any niche scenarios in your application.

I'd prefer to keep a single parse method, and control its behaviour with options, rather than introduce branches in the code at this time. Would this suffice for your needs?

olivier-spinelli commented 5 years ago

Yes of course, thats's perfectly fine!

StefanBertels commented 4 years ago

My two cents: I don't like global default options. They make it harder to get reliable results -- finally you create side-effects this way. There should be a fixed default and this should be some "strict" mode. Everything else should be tailored locally (when calling / building a specific glob parser) and should clearly be opt-in. Silently or accidentally doing SoftParse things is a bad idea for a general library like this.

I recently started using https://github.com/louthy/language-ext which brings a ton of great (functional programming) features to C#. One of the great features is a parser library that can be used to build custom parsers for any grammar. I use this to get rid of regex.

LanguageExt parsers return a ParserResult which is either the (valid) result or an error object with details. This error object can be used to display a useful message to users. The fun part is that the parser code you write is very modular and fun to write. If someone needs a non-strict grammar he could write his own parser based on pre-built sub-parsers this way.

You even could build a (negative) sub-parser checking for foo** and giving a useful specific message for this case to help users.

Perhaps you do not want to add this dependency to your library, but maybe you get an idea how to go further.

There is a small example here: https://github.com/louthy/language-ext/wiki/Thinking-Functionally:-Combinators (see lower part of the page).

dazinator commented 4 years ago

Hi!

I don't like global default options. They make it harder to get reliable results

I disagree with you here. You can always take control when constructing a glob parser and specify your specific options at that level but Global defaults make it easier to setup consistent application wide behaviour. You dont have to amend the global default options so it is an opt in feature.

There should be a fixed default and this should be some "strict" mode. Everything else should be tailored locally (when calling / building a specific glob parser) and should clearly be opt-in.

The global default options is this. If you want them to be fixed, don't adjust them. Everything else is opt-in, when constructing a glob pass in your own options instance to not rely on application defaults.

For background, the global defaults only have a few options like case sensitivity. If your application runs on a platform with a case sensitive file system and you are primarily globbing against file paths, setting the default case sensitivity options at application level is useful imho.

I'll look into Louthy. However I've written this library to use very low level primitives to keep things optimised. The grammar for globbing is very simplistic. Adopting a parser library seems like it could would most likely be overkill and have potential negative repercussions to the performance, and size of dotnet.glob. With Blazor now in the mix (wasm mode) keeping dependency size down is as important as ever. Unless it can be shown it gives better performance then I don't think I have any reason to adopt this additional dependency.

I'm going to resolve this thread as its more an opinion, but happy to continue the discussion if you have objective examples or objective analysis of specific dotnet.glob code you'd like to go over.

StefanBertels commented 4 years ago

Hi Darell,

thank you for responding my comment.

to be clear: I just gave my two cents, so it's perfectly fine to disagree and of course it's your decision. So there is no need to vindicate anything. I'm thankful that you make this library available as open source.

You're right that dotnet.glob allows all theses usages (opt-in). I just wanted to mention that using some global config/singleton option IMHO might result in code where some people (like me) see disadvantages like side-effects / hidden behavior / possible copy&paste errors. I've seen this with e.g. Newtonsoft.Json and I created things like this in my own code multiple times, too... I think this sometimes results in "bad" (near-sighted) defaults like non-strict behavior (because you can "easily" change it, example). Nowadays I think allowing global/singleton config is mostly an anti-pattern. But: just my 2 cents.

Regarding LanguageExt: You probably won't use it if you care about size. :-) The library is really huge (but worth every byte). Additionally I don't know how fast it is. Performance isn't the main goal of LanguageExt, but Louthy is aware that performance matters.

Kind regards

dazinator commented 4 years ago

Thanks for clarifying and the feedback. I understand what you mean. At the moment this is a considered tradeoff that I am happy with, but I'll monitor the situation and should it become a cause of problems I'll reconsider the design for a future breaking change. I don't foresee that being necessary but you never know! :-)