curious-odd-man / RgxGen

Regex: generate matching and non matching strings based on regex pattern.
Apache License 2.0

Lookaround does not work when it influences other part of pattern #63

Open Pigeon-Barry opened 3 years ago

Pigeon-Barry commented 3 years ago

There is a general issue with lookaround patterns: whenever a lookaround should influence another part of the pattern (that is, restrict the values that the other part can produce), generation does not work correctly.

For example:

(?!B)[AB]

In this pattern the lookahead part (?!B) influences the [AB] part by limiting the set of valid values that [AB] can produce. This should be supported.
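To make the expected behavior concrete, here is a minimal sketch that checks the example pattern with the JDK's own `java.util.regex` engine. The class and method names are only illustrative and are not part of RgxGen.

```java
import java.util.regex.Pattern;

final class LookaheadInfluenceDemo {
    // The example pattern from the issue text.
    static final Pattern PATTERN = Pattern.compile("(?!B)[AB]");

    // True only if the whole candidate matches the pattern.
    static boolean isValid(String candidate) {
        return PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println("A -> " + isValid("A")); // A -> true
        System.out.println("B -> " + isValid("B")); // B -> false
    }
}
```

So although [AB] on its own allows two values, the lookahead leaves "A" as the only string a generator should ever produce for this pattern.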

Original request text:

**Describe the bug**

When using

```
new RgxGen("^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$").generate();
```

the following string is generated: `'">MO281733'`, which does not conform to the regular expression.

**To Reproduce**

Steps to reproduce the behavior:

1. With regex pattern `'^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$'`
2. Use code/API - Code
3. See error: the invalid string `'">MO281733'` is returned

**Expected behavior**

I expect a string such as 'AA222222D' to be returned, as it is valid against the regex. However, this is not the case.

**Environment**

- OS: Windows 10
- JDK/JRE version: java version "14.0.1" 2020-04-14, Java(TM) SE Runtime Environment (build 14.0.1+7), Java HotSpot(TM) 64-Bit Server VM (build 14.0.1+7, mixed mode, sharing)
- RgxGen version or commit id:

```
com.github.curious-odd-man rgxgen 1.3
```
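The mismatch can be confirmed independently of RgxGen. This is a small sketch, assuming only the JDK's `java.util.regex`; the class name `ReportedPatternCheck` and the helper `conforms` are made up for illustration.

```java
import java.util.regex.Pattern;

final class ReportedPatternCheck {
    // The pattern exactly as given in the report.
    static final Pattern REPORTED = Pattern.compile(
            "^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$");

    // True if the whole value matches the reported pattern.
    static boolean conforms(String value) {
        return REPORTED.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(conforms("\">MO281733")); // false: the generated string as reported
        System.out.println(conforms("MO281733"));    // false: 'O' is excluded as the second letter
        System.out.println(conforms("AA222222D"));   // true: the expected kind of value
    }
}
```

Even without the leading `">`, the prefix `MO` is ruled out by the negative lookahead, which suggests the lookahead was not taken into account when the letters were generated.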
curious-odd-man commented 3 years ago

I can partially solve this issue and throw an exception in the cases where I cannot handle the lookaround.

My idea is to handle those patterns where the text matched by the lookaround is shorter than or equal in length to the part that it influences. For example, in (?!BG)[A-Z]{2} the part under the negative lookahead is 2 characters long, and the influenced part, [A-Z]{2}, is also 2 characters long. I can handle this by regenerating the [A-Z]{2} part until it satisfies the restriction. In the same way I could handle (?!B)[A-Z]{2} or (?!.X)[A-Z]{2}.
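The retry idea could look roughly like the sketch below. It is not RgxGen's implementation: `randomUpperPair` is a hypothetical stand-in for whatever generates the [A-Z]{2} node, and `maxAttempts` is an assumed safety limit.

```java
import java.util.Random;
import java.util.regex.Pattern;

final class RetryInfluencedPart {
    // Hypothetical stand-in for the generator of the influenced part [A-Z]{2}.
    static String randomUpperPair(Random random) {
        char first = (char) ('A' + random.nextInt(26));
        char second = (char) ('A' + random.nextInt(26));
        return "" + first + second;
    }

    // Regenerates the influenced part until the body of the negative lookahead
    // (for example "BG", "B" or ".X") no longer matches at its start.
    static String generate(String negativeLookaheadBody, Random random, int maxAttempts) {
        Pattern forbidden = Pattern.compile(negativeLookaheadBody);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = randomUpperPair(random);
            // lookingAt() anchors only at the start, like a lookahead at this position.
            if (!forbidden.matcher(candidate).lookingAt()) {
                return candidate;
            }
        }
        throw new IllegalStateException("No valid value found in " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(generate("BG", random, 100)); // any pair except "BG"
        System.out.println(generate(".X", random, 100)); // any pair whose second letter is not 'X'
    }
}
```

Checking the candidate in isolation is only sound because the lookahead never reaches past the influenced part, which is exactly the length condition described above.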

Funnily enough, I could also handle this pattern: [image]. A pattern like (?!X+)[A-Z]{2}[CDE], though, could be hard to handle.

curious-odd-man commented 3 years ago

Or I could really go the easy way first: generate the text and then verify that it satisfies all the lookarounds. If it does not, regenerate; if it does, return it to the user. It is brute force, but it is the easiest to implement. I can think about performance improvements for special cases later.
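A minimal sketch of that generate-then-verify loop follows, assuming the final check is done with `java.util.regex`. The names `GenerateThenVerify`, `generateValid`, `rawGenerator` and `maxAttempts` are hypothetical; `rawGenerator` stands for any generator that ignores lookarounds.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

final class GenerateThenVerify {
    // Keeps asking the lookaround-unaware generator for candidates and returns
    // the first one that the full pattern (lookarounds included) accepts.
    static String generateValid(Supplier<String> rawGenerator, Pattern fullPattern, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = rawGenerator.get();
            if (fullPattern.matcher(candidate).matches()) {
                return candidate;
            }
        }
        // Assumed policy: give up instead of looping forever on unsatisfiable patterns.
        throw new IllegalStateException("No matching value found in " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?!B)[AB]");
        // Toy generator that alternates between "B" and "A" without looking at the lookahead.
        int[] calls = {0};
        String value = generateValid(() -> (calls[0]++ % 2 == 0) ? "B" : "A", pattern, 10);
        System.out.println(value); // A
    }
}
```

The attempt limit matters: when the lookaround rejects most or all candidates, an unbounded loop would be slow or never finish, which is where the special-case optimizations would come in.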