BeRo1985 / flre

FLRE - Fast Light Regular Expressions - A fast light regular expression library
GNU Lesser General Public License v2.1

Unicode case-insensitiveness #68

Open benibela opened 3 years ago

benibela commented 3 years ago

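Unicode defines upper/lower case mappings for certain symbols beyond the ASCII letters, and case-insensitive matching should honor them.

For example, all of these should find a match (they involve the Unicode Kelvin sign, U+212A, decimal 8490, which looks nearly identical to an ASCII K):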


  f := TFLRE.Create('k', [rfIGNORECASE]);
  writeln(f.Find('K'));
  f := TFLRE.Create('K', [rfIGNORECASE]);
  writeln(f.Find('K'));
  f := TFLRE.Create('[a-z]', [rfIGNORECASE]);
  writeln(f.Find('K'));
  f := TFLRE.Create('K', [rfIGNORECASE]);
  writeln(f.Find('k'));
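
To make the expectation concrete, here is a minimal sketch of the comparison a case-insensitive matcher would need, written without FLRE so it says nothing about the library's internals. `FoldCodePoint` and `SameFolded` are hypothetical helpers introduced only for this illustration, and the table covers just the ASCII letters plus two code points whose Unicode simple case folding lands in ASCII (U+212A folds to `k`, U+017F folds to `s`).

```pascal
program KelvinFold;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of FLRE: simple case folding for the
// handful of code points relevant to this report.
function FoldCodePoint(CodePoint: Cardinal): Cardinal;
begin
  case CodePoint of
    Ord('A')..Ord('Z'): Result := CodePoint + 32; // ASCII upper -> lower
    $212A: Result := Ord('k');                    // KELVIN SIGN -> 'k'
    $017F: Result := Ord('s');                    // LATIN SMALL LETTER LONG S -> 's'
  else
    Result := CodePoint;                          // everything else unchanged
  end;
end;

// Hypothetical helper: two code points match case-insensitively
// when their folded forms are equal.
function SameFolded(A, B: Cardinal): Boolean;
begin
  Result := FoldCodePoint(A) = FoldCodePoint(B);
end;

var
  Folded: Cardinal;
begin
  // Pattern 'k' against the Kelvin sign.
  writeln(SameFolded(Ord('k'), $212A));
  // Pattern 'K' (ASCII) against the Kelvin sign.
  writeln(SameFolded(Ord('K'), $212A));
  // Pattern [a-z] against the Kelvin sign: test the folded code point.
  Folded := FoldCodePoint($212A);
  writeln((Folded >= Ord('a')) and (Folded <= Ord('z')));
  // Pattern Kelvin sign against 'k'.
  writeln(SameFolded($212A, Ord('k')));
end.
```

All four comparisons succeed under this folding, which is the behavior the snippets above expect from `rfIGNORECASE`. A real implementation would take the full mapping from the Unicode case folding data rather than a hand-written table.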

benibela commented 3 years ago

I forgot the [rfUTF8] flag.

But the first three still fail with it:

 f := TFLRE.Create('k', [rfIGNORECASE,rfUTF8]);
 writeln(f.Find('K'));
 f := TFLRE.Create('K', [rfIGNORECASE,rfUTF8]);
 writeln(f.Find('K'));
 f := TFLRE.Create('[a-z]', [rfIGNORECASE,rfUTF8]);
 writeln(f.Find('K'));
 f := TFLRE.Create('K', [rfIGNORECASE,rfUTF8]);
 writeln(f.Find('k'));

Also, rfIGNORECASE is a bad name, since it collides with sysutils.rfIGNORECASE.

Perhaps UTF8Find needs to be used?

 f := TFLRE.Create('k', [rfIGNORECASE,rfUTF8]);
 writeln(f.UTF8Find('K'));
 f := TFLRE.Create('K', [rfIGNORECASE,rfUTF8]);
 writeln(f.UTF8Find('K'));
 f := TFLRE.Create('[a-z]', [rfIGNORECASE,rfUTF8]);
 writeln(f.UTF8Find('K'));
 f := TFLRE.Create('K', [rfIGNORECASE,rfUTF8]);
 writeln(f.UTF8Find('k'));

But that gives the same output.