fent / randexp.js

Create random strings that match a given regular expression.
http://fent.github.io/randexp.js/
MIT License
1.82k stars, 91 forks

Cyrillic characters don't work #93

Open husainshabbir opened 4 years ago

husainshabbir commented 4 years ago

Regular expressions with Cyrillic characters (e.g. [А-Я]{1,5}[а-я]{5,10}) don't work in the latest version. The last version in which they worked is 0.4.6.

This reproduces the issue: https://codesandbox.io/s/randexp-cyrillic-issue-2kcou

fent commented 4 years ago

The default range for sets includes only printable ASCII characters: https://github.com/fent/randexp.js#default-range

You can change it with something like the following:

RandExp.prototype.defaultRange.add(0, 65535);

or, for a single instance:

let randexp = new RandExp(/regex/);
randexp.defaultRange.add(0, 65535);

defaultRange was added so that the "any" character (.) wouldn't generate characters most randexp users wouldn't expect. However, it's applied to all sets, even custom sets (/[a-f]/) and negated sets (/[^\D]/).

Whether or not it should be applied to custom sets is debatable; it does seem like unexpected behavior.
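
To make the behaviour described above concrete, here is a minimal sketch in plain JavaScript. It does not use randexp and is not its source code: the helper `intersect` and the constant `PRINTABLE_ASCII` are hypothetical names, and the sketch assumes that a set's candidate code points are clipped to the default range before a character is chosen.

```javascript
// Illustrative model only; this is not randexp's implementation.
// A range is an inclusive pair of code points: [low, high].
const PRINTABLE_ASCII = [[32, 126]]; // the default range described in this thread

// Hypothetical helper: clip every range in `set` to the ranges in `allowed`.
function intersect(set, allowed) {
  const out = [];
  for (const [lo, hi] of set) {
    for (const [allowedLo, allowedHi] of allowed) {
      const low = Math.max(lo, allowedLo);
      const high = Math.min(hi, allowedHi);
      if (low <= high) out.push([low, high]);
    }
  }
  return out;
}

// [А-Я] spans U+0410..U+042F, which lies entirely outside printable ASCII.
const cyrillicUpper = [[0x0410, 0x042f]];

// With the default range, nothing is left to choose from.
const clipped = intersect(cyrillicUpper, PRINTABLE_ASCII);

// Modelling defaultRange.add(0, 65535): the allowed ranges now cover the BMP,
// so the Cyrillic range survives the clipping step.
const widened = intersect(cyrillicUpper, [...PRINTABLE_ASCII, [0, 65535]]);
```

In this model `clipped` is empty and `widened` still contains the full [А-Я] range, which is one way to read why the expression from the original report stops producing Cyrillic output until the default range is extended.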

michaelficarra commented 4 years ago

I can understand the default range being used for any "open" sets, such as . and negated character classes, but "closed" sets should not be restricted by the default range, in my opinion. I would consider the current behaviour a bug.

1valdis commented 4 years ago

IMO this should be left as it is. People only need a minute of time to check the docs to understand what's going on.

It makes no sense if I explicitly specify a character range on Randexp and then see that my string does not follow the range I specified. A regular expression may come from anywhere; a Randexp instance is what I control and use, and I want my generated string to stay within its range.

fent commented 4 years ago

I'm leaning towards @michaelficarra's view: the default range should be respected for predefined sets, but custom non-negated sets like those in the OP (e.g. [А-Я]{1,5}[а-я]{5,10}) could ignore the default range.

1valdis commented 4 years ago

Then why is defaultRange even needed, if some constructs in the regexp could "override" it? As it stands, I'm sure that the generated string will contain only characters in the defined range, no matter what's in the regexp. So for me this override of the range by the regexp feels more unintuitive than the OP issue.

michaelficarra commented 4 years ago

@1valdis That's ridiculous. If defaultRange is restricted to a through f, and I provide the regexp x, should it not produce anything? How about [x]? Or [xyz]? Or [x-z]? defaultRange should only affect "open" sets like . or [^a].
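
The two positions in this thread can be compared with a small sketch in plain JavaScript. Everything here is hypothetical and independent of randexp: the `{ ranges, open }` shape for a set, the helper `clip`, and the two policy functions are assumptions made only to illustrate the a-f versus [x-z] example above.

```javascript
// Illustrative model only; names and data shapes are not randexp's API.
// A set is { ranges: [[low, high], ...], open: boolean }, where `open`
// marks `.` and negated classes such as [^a].

// Hypothetical helper: clip inclusive code point ranges to `allowed`.
function clip(ranges, allowed) {
  const out = [];
  for (const [lo, hi] of ranges) {
    for (const [allowedLo, allowedHi] of allowed) {
      const low = Math.max(lo, allowedLo);
      const high = Math.min(hi, allowedHi);
      if (low <= high) out.push([low, high]);
    }
  }
  return out;
}

// Policy A (the behaviour reported in this thread): every set is clipped.
function candidatesClipAll(set, defaultRange) {
  return clip(set.ranges, defaultRange);
}

// Policy B (the proposal): only open sets are clipped; closed sets stay as written.
function candidatesClipOpenOnly(set, defaultRange) {
  return set.open ? clip(set.ranges, defaultRange) : set.ranges;
}

const aToF = [[97, 102]];                                    // defaultRange limited to a-f
const xToZ = { ranges: [[120, 122]], open: false };          // [x-z]
const notA = { ranges: [[0, 96], [98, 65535]], open: true }; // [^a]

const policyA = candidatesClipAll(xToZ, aToF);          // nothing left for [x-z]
const policyB = candidatesClipOpenOnly(xToZ, aToF);     // x-z kept as written
const policyBOpen = candidatesClipOpenOnly(notA, aToF); // [^a] narrowed to b-f
```

Under policy A the closed set [x-z] has no candidates at all, while under policy B it keeps x through z and only the open set [^a] is narrowed to b through f. Which of the two is preferable is exactly what the thread is debating.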

1valdis commented 4 years ago

@michaelficarra if it was restricted by someone to a-f, then it was done on purpose. For a-f the reason is easy to guess: letters for hexadecimal numbers. If some y or z turns up, then it's gonna blow up. The regexp itself is not always something you write into code and control. The randexp.js instance, however, is. I believe explaining that as "the default range of generated characters applies to the whole regexp" is also simpler and more consistent than "the default range applies only to 'open' sets and negated groups, but not to predefined ranges". And I don't understand what the problem is with one line of code, randexp.defaultRange.add(0, 65535);, if you want Chinese, Russian and others.