Open curious-odd-man opened 3 years ago
Hello @spacether!
I've been thinking about this feature you've requested and I would like to get more of your input on it.
Do you have a specific use case where you require this feature, or is it just an idea?
I doubt that I should add this feature.
When I started this library, my idea was to generate matching/not matching texts only with characters that are present in the pattern. E.g. for pattern `^abc`, only generate the text `abc` (in contrast to [`abc`, `abcd`, `abcde`, ...]). By the logic that you've described, I would need to prepend a newline character that is not in the pattern initially (similar to the `^abc` example above), which is not consistent with the initial idea.
Besides, without the multiline flag, the pattern `\n^a` cannot match anything.
vs
Currently it is always allowed to put newlines in RgxGen, which means that all generated patterns are always multiline. If I implement a separate flag for multiline, I will have to add unnecessary complications for the case when multiline is OFF and there are newlines in the pattern.
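The remark about `\n^a` can be checked with any regex engine that has a multiline flag. Below is a minimal sketch using Python's `re` module as a stand-in; it does not use RgxGen, and the helper name `matching` and the candidate list are made up for illustration.

```python
import re

PATTERN = r"\n^a"
# Hypothetical candidate texts, chosen only for this illustration.
CANDIDATES = ["a", "\na", "\n\na", "b\na"]


def matching(pattern, candidates, flags=0):
    """Return the candidates in which the pattern is found somewhere."""
    regex = re.compile(pattern, flags)
    return [text for text in candidates if regex.search(text)]


# Without MULTILINE, `^` only matches at the very start of the input,
# so it can never follow a consumed `\n`.
without_flag = matching(PATTERN, CANDIDATES)

# With MULTILINE, `^` also matches right after each newline.
with_flag = matching(PATTERN, CANDIDATES, re.MULTILINE)

print(without_flag)
print(with_flag)
```

Without the flag no candidate matches, while with the flag every candidate containing a newline directly before `a` does.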
So in general I don't feel like I should add a flag for multiline and generate characters that are not in a pattern, both to stay consistent with the initial idea and to keep things simple overall.
Please let me know what you think and whether there is some specific use case where you need to explicitly prohibit multiline generation and/or allow multiline without mentioning the line separator character.
Hey there. Our users have not asked for this feature yet. With time I expect the request to come up. Doesn't your same logic apply to regex parens group matching? Not all of the regex will be the group match, so for `a(bcd)`, `a` is necessary and the group match is `bcd`?
Given enough time, I expect some users will definitely want to generate multiline = False regexes long term. What do you think? What if we kept the ticket open or closed and only implemented it if there were a certain number of plus 1 emojis on it?
One use case could be string validation of single line input data like first name, last name, address line 1 etc where newline characters should not be allowed. All use cases that I can imagine involve the presence or absence of the newline character because this flag is about multilines.
For the use case that you described, from my understanding users will need to have 2 regexes:
`^\w+$`
`^\W*$` with multiline = true, or `\n+^\W*$\n+` at the current state (no multiline flag support)
I believe the first negative case is not suitable, because it might not contain newlines and thus probably does not cover all possible cases. On the other hand, the second negative case is better, as it explicitly requires trailing/leading newline characters.
Besides that for proper testing of the case I would have several negative patterns:
This composition of negative patterns has better coverage and easier trackability in case of errors. In any case I don't see how multiline flag fits here or how it can help. Please correct me if I'm wrong :)
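To make the difference between the two negative patterns concrete, here is a small sketch using Python's `re` module (not RgxGen). The sample strings and the helper `full` are assumptions made up for this illustration, and the multiline flag is always on to mirror the "always multiline" behaviour described earlier.

```python
import re

POSITIVE = r"^\w+$"
NEGATIVE_PLAIN = r"^\W*$"
NEGATIVE_NEWLINES = r"\n+^\W*$\n+"


def full(pattern, text):
    """True if the whole text matches the pattern in multiline mode."""
    return re.fullmatch(pattern, text, re.MULTILINE) is not None


results = {
    # The plain negative pattern is satisfied by text without any newline.
    "plain_without_newline": full(NEGATIVE_PLAIN, "!!"),
    # The explicit variant rejects that text ...
    "explicit_without_newline": full(NEGATIVE_NEWLINES, "!!"),
    # ... and only accepts text wrapped in newlines.
    "explicit_with_newlines": full(NEGATIVE_NEWLINES, "\n!!\n"),
    # The positive pattern accepts a single word and rejects the wrapped text.
    "positive_word": full(POSITIVE, "John"),
    "positive_wrapped": full(POSITIVE, "\n!!\n"),
}

for name, value in results.items():
    print(name, value)
```

Only the pattern with explicit `\n+` forces newline characters into every matching text, which is the coverage argument made above.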
As for your query about the `a(bcd)` pattern: this pattern, same as `a(b)cd` and `abcd` and any other variation, will produce the same result: only the text `abcd`. So the group in this pattern does not have any effect at all.
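The point about groups can be illustrated from the matching side. This is a sketch with Python's `re` module, not RgxGen's generator, and the sample texts are hypothetical.

```python
import re

VARIANTS = ["a(bcd)", "a(b)cd", "abcd"]
# Hypothetical sample texts; the last one contains literal parentheses.
SAMPLES = ["abcd", "abc", "abcde", "a(bcd)"]


def accepted(pattern, samples):
    """Return the samples that the pattern matches in full."""
    return [text for text in samples if re.fullmatch(pattern, text)]


# Grouping changes what is captured, not which texts are matched.
results = {pattern: accepted(pattern, SAMPLES) for pattern in VARIANTS}

for pattern, texts in results.items():
    print(pattern, texts)
```

All three variants accept exactly the same single text, so a generator restricted to matching texts has nothing different to produce for them.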
I will keep this ticket open. Probably will implement it some time later. Will assume it is lowest priority for now.
Let me know if this feature will show some demand.
Here is a sample of what could be generated with MULTILINE=True/False.
For pattern = `^a`
When MULTILINE is FALSE, generated values = [`a`]
When MULTILINE is TRUE, generated values = [`a`, `\na`, `\n\na`, ...]
Note, however: for pattern `x$^y` the value `x\ny` does not match, whereas for pattern `x$\n^y` the same value does match.
TODO:
`m` flag.
Feature initially requested by @spacether in #57
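The sample values and the note above can be cross-checked against an existing regex engine. The sketch below uses Python's `re` module with `re.MULTILINE` standing in for the proposed flag; it says nothing about how RgxGen would implement it, and the helper `found` is a name made up for this example.

```python
import re


def found(pattern, text, multiline):
    """True if the pattern is found anywhere in the text."""
    flags = re.MULTILINE if multiline else 0
    return re.search(pattern, text, flags) is not None


values = ["a", "\na", "\n\na"]

# Values that pattern ^a matches with the flag off and on.
single_line = [v for v in values if found(r"^a", v, multiline=False)]
multi_line = [v for v in values if found(r"^a", v, multiline=True)]

# The note: `$` and `^` are zero-width, so the newline must be in the pattern.
no_newline_in_pattern = found(r"x$^y", "x\ny", multiline=True)
newline_in_pattern = found(r"x$\n^y", "x\ny", multiline=True)

print(single_line, multi_line)
print(no_newline_in_pattern, newline_in_pattern)
```

With the flag off only `a` matches; with it on, all three values match, and `x\ny` matches only the pattern that spells out `\n` between `$` and `^`.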