Subscriptions expressions: More powerful pattern matching on string-typed properties

mlongob commented 8 months ago

Is there an existing proposal for this?

[X] I have searched the existing proposals

Is your feature request related to a problem?

There is currently a single useful operator for string properties (== and !=). For use-cases that need more powerful pattern matching, the following operations would be useful:

Checking that a string contains another string
Checking that a string starts with another string or a character
Doing pattern matching on a sub-section of an URI (example: /var/log/foobar/*/*.log.)

Describe the solution you'd like

A regex operator would solve many of those problems. We can use the following library to compile regexes without introducing additional dependencies to the broker: https://github.com/bloomberg/bde/blob/d6674e1df52b62d66942ca318882f225d26104bf/groups/bdl/bdlpcre/bdlpcre_regex.h#L669

Alternatives you considered

No response

678098 commented 8 months ago

When we worked on subscriptions, we tried to limit expressions complexity and ensure that it's impossible to build an expression which takes too long to evaluate.

This is why we have constraints like this in the expression evaluator:

https://github.com/bloomberg/blazingmq/blob/cf089077b08916349cbff9cee64da44f8eb329da/src/groups/bmq/bmqeval/bmqeval_simpleevaluator.h#L494-L499

RegEx engine looks like a ready tool to do exactly what we might need to do for a more powerful pattern matching. However, it's usage goes against our initial design and expected usage scenario, because it allows to reduce BlazingMQ performance drastically.

In this article, you can find an example of a short RegEx which takes seconds to evaluate on a short string, and it can be even worse:

https://blog.codinghorror.com/regex-performance/

So, here are some questions to think:

Do we want to lift restrictions on expression complexity?
Do we really need all the features a full RegEx engine provides?
Can we assume that clients will never (intentionally or unintentionally) construct an ineffective RegEx?
Is it possible to use a limited RegEx engine, e.g. disable some features which make possible to overload the engine?
Can we accurately predict that the provided RegEx is inefficient when we first see this RegEx during configuration?
Should we measure expression evaluation time and raise an ALARM if it's too long?
What is the performance of the used RegEx engine in "normal" cases in comparison with already implemented operands in ExpressionEvaluator?
Do we really need something other than * star or . dot support for pattern matching? Is it difficult to implement a fast matching with only these 2?

pniedzielski commented 7 months ago

In this article, you can find an example of a short RegEx which takes seconds to evaluate on a short string, and it can be even worse:

https://blog.codinghorror.com/regex-performance/

Note that this is not a property of regular expressions, but rather a property of backtracking regular expression engines. Strings can be checked against regular expressions (without lookahead assertions) in linear time. PCRE is a backtracking regex matcher, but there are others that don't suffer from this; see for example https://github.com/google/re2/wiki/WhyRE2, designed for similar use cases to ours.

bloomberg / blazingmq