I have now run my real grep-like program with the real (test) data through the profiler.
FLRE actually spends nearly three times as many instruction cycles in InitializeFLRE/Unicode (179M) as the entire program needs with Sorokin's lib to read its input, generate its output, and exit (62M).
Fixed by removing InitializeUnicode from InitializeFLRE and replacing it with an external helper tool, FLREBuildUnicode, which generates constant arrays from this data. :)
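
For readers unfamiliar with the approach, here is a minimal sketch of the general idea: an offline generator computes a lookup table once and emits it as source text, so the library pays no start-up cost for it. This is not FLREBuildUnicode's actual code. The function names, the `DemoCategories` identifier, and the choice of table (Unicode general categories for the first 256 code points) are assumptions made for illustration only.

```python
import unicodedata


def build_category_table(limit=256):
    """Return (sorted category names, per-code-point indices into that list)."""
    names = sorted({unicodedata.category(chr(cp)) for cp in range(limit)})
    index = {name: i for i, name in enumerate(names)}
    return names, [index[unicodedata.category(chr(cp))] for cp in range(limit)]


def emit_pascal_const(name, values, per_line=16):
    """Format values as Pascal source text declaring a typed constant byte array."""
    rows = [
        ",".join(str(v) for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    body = ",\n    ".join(rows)
    return f"const {name}: array[0..{len(values) - 1}] of Byte = (\n    {body}\n  );\n"


if __name__ == "__main__":
    # Hypothetical usage: print the generated declaration for inclusion in a unit.
    _, table = build_category_table()
    print(emit_pascal_const("DemoCategories", table))
```

The generated declaration would be compiled into the library, so the table lives in the binary's constant data and nothing has to be computed when the program starts.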
I was looking at the wrong function in #15, sorry.