andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License

Proposal: remove OLD unicode support code #209

Closed. Alexey-T closed this 4 years ago

Alexey-T commented 4 years ago

The old Unicode support uses {$IFDEF UnicodeWordDetection}:

{$IFDEF UnicodeWordDetection}
  {$IFDEF FPC}
  function IsUnicodeWordChar(AChar: WideChar): boolean; inline;
  var
    NType: byte;
  begin
    if Ord(AChar) >= LOW_SURROGATE_BEGIN then
      Exit(False);
    NType := GetProps(Ord(AChar))^.Category;
    Result := (NType <= UGC_OtherNumber);
  end;
  {$ELSE}
  function IsUnicodeWordChar(AChar: WideChar): boolean; inline;
  begin
    Result := System.Character.IsLetterOrDigit(AChar);
  end;
  {$ENDIF}
{$ENDIF}

and TRegExpr.IsWordChar uses it too:

function TRegExpr.IsWordChar(AChar: REChar): boolean;
.......
  {$IFDEF UnicodeWordDetection}
  if not Result and (Ord(AChar) >= 128) and UseUnicodeWordDetection then
    Result := IsUnicodeWordChar(AChar);
  {$ENDIF}
end;

This is messy and hard to maintain!

The new Unicode code uses {$IFDEF FastUnicodeData}:

function TRegExpr.IsWordChar(AChar: REChar): boolean;
begin
  // bit 7 in value: is word char
  Result := CharCategoryArray[Ord(AChar)] and 128 <> 0;
end;

It's easy to maintain: it uses our new unit regexpr_unicodedata, and we can change that unit to add more data.
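To illustrate the table-lookup idea, here is a minimal standalone sketch. It assumes one flag byte per UTF-16 code unit, with bit 7 meaning "word char" as in the comment above. DemoCategoryArray, WordCharFlag, FillDemoData and DemoIsWordChar are hypothetical names made up for this example, and the sample data is illustrative only. The real contents and layout of regexpr_unicodedata may differ.

```pascal
program WordCharFlagDemo;

{$IFDEF FPC}{$mode objfpc}{$ENDIF}

const
  // Hypothetical name for the flag: bit 7 marks a word character.
  WordCharFlag = 128;

var
  // Hypothetical stand-in for CharCategoryArray: one byte per UTF-16 code unit.
  // Global variables start zeroed, so every char is "not a word char" by default.
  DemoCategoryArray: array[0..$FFFF] of Byte;

procedure FillDemoData;
var
  i: Integer;
begin
  // Illustrative data only: mark ASCII digits, letters and '_' as word chars.
  for i := Ord('0') to Ord('9') do
    DemoCategoryArray[i] := WordCharFlag;
  for i := Ord('A') to Ord('Z') do
    DemoCategoryArray[i] := WordCharFlag;
  for i := Ord('a') to Ord('z') do
    DemoCategoryArray[i] := WordCharFlag;
  DemoCategoryArray[Ord('_')] := WordCharFlag;
  // One non-ASCII sample: U+0416 CYRILLIC CAPITAL LETTER ZHE.
  DemoCategoryArray[$0416] := WordCharFlag;
end;

function DemoIsWordChar(AChar: WideChar): Boolean;
begin
  // One array read plus one bit test; no per-compiler branches.
  Result := (DemoCategoryArray[Ord(AChar)] and WordCharFlag) <> 0;
end;

begin
  FillDemoData;
  WriteLn(DemoIsWordChar('a'));             // ASCII letter, flagged above
  WriteLn(DemoIsWordChar(WideChar($0416))); // non-ASCII letter, flagged above
  WriteLn(DemoIsWordChar(' '));             // space, left unflagged
end.
```

With this approach, supporting more characters means changing the table data rather than adding code paths, which is why a single implementation can serve both FPC and Delphi.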

Shall we remove the old {$IFDEF UnicodeWordDetection} and its code? @andgineer

andgineer commented 4 years ago

+

Alexey-T commented 4 years ago

OK, doing it.