BeRo1985 / flre

FLRE - Fast Light Regular Expressions - A fast light regular expression library
GNU Lesser General Public License v2.1

Stable Unicode Interface #32

Closed benibela closed 6 years ago

benibela commented 8 years ago

Currently, these two functions

function UnicodeGetUpperCaseDeltaFromTable(c:longword):longint; {$ifdef caninline}inline;{$endif}
var Index:longword;
begin
 if c<=$10ffff then begin
  Index:=c shr FLREUnicodeUpperCaseDeltaArrayBlockBits;
  result:=FLREUnicodeUpperCaseDeltaArrayBlockData[FLREUnicodeUpperCaseDeltaArrayIndexBlockData[FLREUnicodeUpperCaseDeltaArrayIndexIndexData[Index shr FLREUnicodeUpperCaseDeltaArrayIndexBlockBits],Index and FLREUnicodeUpperCaseDeltaArrayIndexBlockMask],c and FLREUnicodeUpperCaseDeltaArrayBlockMask];
 end else begin
  result:=0;
 end;
end;

function UnicodeGetLowerCaseDeltaFromTable(c:longword):longint; {$ifdef caninline}inline;{$endif}
var Index:longword;
begin
 if c<=$10ffff then begin
  Index:=c shr FLREUnicodeLowerCaseDeltaArrayBlockBits;
  result:=FLREUnicodeLowerCaseDeltaArrayBlockData[FLREUnicodeLowerCaseDeltaArrayIndexBlockData[FLREUnicodeLowerCaseDeltaArrayIndexIndexData[Index shr FLREUnicodeLowerCaseDeltaArrayIndexBlockBits],Index and FLREUnicodeLowerCaseDeltaArrayIndexBlockMask],c and FLREUnicodeLowerCaseDeltaArrayBlockMask];
 end else begin
  result:=0;
 end;
end;

can be used to convert a code point to upper (or lower) case: codePoint + UnicodeGetUpperCaseDeltaFromTable(codePoint) for upper case, and likewise with UnicodeGetLowerCaseDeltaFromTable for lower case.
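To illustrate the delta-table idea, here is a minimal sketch. It does not use FLRE's actual tables: the names BlockBits, BlockMask, BlockIndex, BlockData, GetUpperCaseDelta and ToUpperCodePoint are made up for this example, the table covers ASCII only, and it uses two lookup stages instead of the three in the functions above.

```pascal
program CaseDeltaSketch;
{$mode objfpc}

const
  // Hypothetical layout, not FLRE's real tables: 16 code points per block,
  // covering only ASCII ($00..$7f).
  BlockBits = 4;
  BlockMask = (1 shl BlockBits) - 1;
  MaxCodePoint = $7f;

  // First stage: which data block holds the deltas for each 16-code-point range.
  // Ranges without lower-case letters all share data block 0.
  BlockIndex: array[0..7] of byte = (0, 0, 0, 0, 0, 0, 1, 2);

  // Second stage: the delta to add to a code point to get its upper-case form.
  BlockData: array[0..2, 0..15] of longint = (
    // Block 0: no case mapping
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    // Block 1: $60..$6f, backtick followed by a..o
    (0, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32),
    // Block 2: $70..$7f, p..z followed by five non-letters
    (-32, -32, -32, -32, -32, -32, -32, -32, -32, -32, -32, 0, 0, 0, 0, 0)
  );

// Looks up the upper-case delta; code points outside the table get delta 0.
function GetUpperCaseDelta(c: longword): longint;
begin
  if c <= MaxCodePoint then
    Result := BlockData[BlockIndex[c shr BlockBits], c and BlockMask]
  else
    Result := 0;
end;

// Applies the delta, i.e. codePoint + delta(codePoint).
function ToUpperCodePoint(c: longword): longword;
begin
  Result := longword(longint(c) + GetUpperCaseDelta(c));
end;

begin
  WriteLn(ToUpperCodePoint(Ord('a')));  // 65, i.e. 'A'
  WriteLn(ToUpperCodePoint(Ord('z')));  // 90, i.e. 'Z'
  WriteLn(ToUpperCodePoint(Ord('{')));  // 123, unchanged
  WriteLn(ToUpperCodePoint($10ffff));   // 1114111, outside the sketch table, unchanged
end.
```

Storing deltas rather than target code points lets every range without case mappings share one all-zero block, which is what keeps such tables small.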

But they are not public.

Will they exist in future versions? Will the table arithmetic continue to work?

At the moment I am using theo's utf8tools/utf8proc to convert case, but it is a waste to import a few hundred KB of Unicode tables from his library when the same tables are also in FLRE.

BeRo1985 commented 8 years ago

I have now taken the first step toward this: https://github.com/BeRo1985/pucu

I'll update the FLRE code to use PUCU sometime this month, but I can't yet say exactly when, because at the moment I'm working on two fronts: my primary job at Viprinet, and an Object Pascal-based, Unity-style game engine for my secondary job. I needed PUCU for the game engine now (for processing Unicode text files), which is why I wrote it now. Sometime this month there should also be a point where I'll need FLRE for the game engine's toolchain.

Life is sometimes unfair, but unfortunately you cannot live without money, so other, less fun things take priority. :)

benibela commented 8 years ago

I would not use the functions; I have my own:

https://github.com/benibela/internettools/blob/master/data/bbutils.pas#L724-L749

https://github.com/benibela/internettools/blob/master/data/bbunicodeinfo.pas

https://github.com/benibela/internettools/blob/master/data/bbnormalizeunicode.pas

I just do not like having two copies of similar tables lying around.

pyscripter commented 6 years ago

Please close this issue since PUCU is now included.