Closed wismill closed 4 months ago
Results (Linux x64, 8 × AMD Ryzen 5 2500U, GHC 9.2.7):
All
Unicode.Char.General.Names
name
String: OK (2.61s)
40.4 ms ± 318 μs, 155 MB allocated, 1.6 KB copied, 35 MB peak memory, 36% less than baseline
correctedName
String: OK (0.68s)
44.7 ms ± 744 μs, 155 MB allocated, 1.7 KB copied, 35 MB peak memory, 34% less than baseline
nameOrAlias
String: OK (20.75s)
40.6 ms ± 504 μs, 155 MB allocated, 1.5 KB copied, 35 MB peak memory, 36% less than baseline
nameAliasesByType
String: OK (6.83s)
109 ms ± 1.4 ms, 518 MB allocated, 3.0 KB copied, 35 MB peak memory, 21% less than baseline
nameAliasesWithTypes
String: OK (1.19s)
18.8 ms ± 302 μs, 52 MB allocated, 343 B copied, 35 MB peak memory, 58% less than baseline
nameAliases
String: OK (1.97s)
15.4 ms ± 182 μs, 51 MB allocated, 285 B copied, 35 MB peak memory, 9% less than baseline
How does it compare to ICU?
About 5 times faster. However, I did not check the implementation closely. It does not rely on a reliable library such as text-icu: I added the cbits first to enable testing. The benchmark gives us a rough idea; otherwise we would have nothing to compare to!
@harendra-kumar Any idea how to allow failure of GHC-head in the CI? The only results I found are ongoing discussions about implementing the feature, but no temporary fix.
How could we mark it as always succeeding? We should then always check the details.
Use ignore_error: true, like here. Note that you will have to add ignore_error: false to all other CIs.
I updated the CI. We should probably use fail-fast: true, shouldn’t we?
It will save resources by cancelling all CI jobs on the first error, but it will not show you the errors in all CIs.
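For reference, a minimal sketch of how the two settings discussed above might fit together in a GitHub Actions workflow. It assumes the ignore_error matrix key is wired to the job-level continue-on-error; the job name, matrix entries and steps are illustrative and not taken from this repository's actual workflow.

```yaml
# Hypothetical workflow fragment: names and steps are placeholders.
jobs:
  build:
    name: ${{ matrix.name }}
    runs-on: ubuntu-latest
    # A job whose matrix entry sets ignore_error: true may fail
    # without failing the whole workflow run.
    continue-on-error: ${{ matrix.ignore_error }}
    strategy:
      # true:  cancel the remaining matrix jobs on the first failure (saves resources).
      # false: let every job finish, so errors from all CIs are visible.
      fail-fast: false
      matrix:
        include:
          - name: ghc-head
            ignore_error: true
          # Every other entry needs an explicit value as well.
          - name: ghc-9.2.7
            ignore_error: false
    steps:
      - uses: actions/checkout@v4
      - run: echo "build and test with ${{ matrix.name }}"
```

With this layout the GHC-head job still shows its own failure in the job details, which is why those details should be checked even when the overall run is green.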
I added 2 optional APIs: ByteString and Text. The ByteString one is slightly less performant than Text, which I did not expect. I guess it is good enough for a first version.
Latest benchmarks:
All
Unicode.Char.General.Names
name
String
unicode-data: OK (5.28s)
41.1 ms ± 597 μs, 155 MB allocated, 2.7 KB copied, 35 MB peak memory
icu: OK (1.43s)
204 ms ± 7.3 ms, 524 MB allocated, 8.4 KB copied, 51 MB peak memory, 4.96x
ByteString
unicode-data: OK (0.82s)
26.3 ms ± 496 μs, 68 MB allocated, 1.0 KB copied, 51 MB peak memory
Text
unicode-data: OK (1.55s)
24.4 ms ± 685 μs, 64 MB allocated, 875 B copied, 51 MB peak memory
icu: OK (4.02s)
129 ms ± 2.1 ms, 277 MB allocated, 3.7 KB copied, 51 MB peak memory, 5.30x
correctedName
String
unicode-data: OK (0.38s)
54.7 ms ± 2.1 ms, 151 MB allocated, 3.1 KB copied, 51 MB peak memory
icu: OK (4.76s)
316 ms ± 6.2 ms, 729 MB allocated, 9.6 KB copied, 52 MB peak memory, 5.78x
ByteString
unicode-data: OK (0.95s)
30.4 ms ± 848 μs, 68 MB allocated, 957 B copied, 52 MB peak memory
Text
unicode-data: OK (3.66s)
28.8 ms ± 872 μs, 64 MB allocated, 741 B copied, 52 MB peak memory
icu: OK (1.71s)
245 ms ± 8.9 ms, 475 MB allocated, 6.2 KB copied, 52 MB peak memory, 8.50x
nameAliasesByType
String
unicode-data: OK (1.06s)
149 ms ± 4.9 ms, 514 MB allocated, 6.3 KB copied, 52 MB peak memory
ByteString
unicode-data: OK (2.28s)
150 ms ± 3.7 ms, 516 MB allocated, 5.7 KB copied, 52 MB peak memory
Text
unicode-data: OK (0.45s)
148 ms ± 3.6 ms, 508 MB allocated, 9.3 KB copied, 52 MB peak memory
nameAliasesWithTypes
String
unicode-data: OK (1.50s)
23.7 ms ± 713 μs, 52 MB allocated, 577 B copied, 52 MB peak memory
ByteString
unicode-data: OK (0.36s)
23.3 ms ± 774 μs, 51 MB allocated, 821 B copied, 52 MB peak memory
Text
unicode-data: OK (1.47s)
23.2 ms ± 329 μs, 52 MB allocated, 558 B copied, 52 MB peak memory
nameAliases
String
unicode-data: OK (1.21s)
18.8 ms ± 367 μs, 51 MB allocated, 531 B copied, 52 MB peak memory
ByteString
unicode-data: OK (4.66s)
18.3 ms ± 497 μs, 51 MB allocated, 452 B copied, 52 MB peak memory
Text
unicode-data: OK (2.41s)
18.8 ms ± 721 μs, 51 MB allocated, 460 B copied, 52 MB peak memory
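The suffixes such as 4.96x on the icu lines appear to be the ratio of the icu mean time to the mean time of the corresponding unicode-data benchmark. A minimal sketch, assuming that reading, recomputes the ratios from the rounded means printed above. `slowdown` is a hypothetical helper, not part of any library, and because the inputs are rounded the results may differ from the reported figures in the last digit.

```haskell
module Main (main) where

-- | Ratio of a candidate's mean time to the reference's mean time.
-- Both arguments must use the same unit (here: milliseconds).
slowdown :: Double -> Double -> Double
slowdown reference candidate = candidate / reference

main :: IO ()
main = do
  -- Mean times copied from the listing above, rounded as printed.
  print (slowdown 41.1 204) -- name, String: icu vs unicode-data
  print (slowdown 24.4 129) -- name, Text
  print (slowdown 54.7 316) -- correctedName, String
  print (slowdown 28.8 245) -- correctedName, Text
```

All four ratios land between roughly 5 and 8.5, consistent with the earlier "about 5 times faster" estimate for name.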
@harendra-kumar I am satisfied with the current state, although there are some points that surprise me:
- Text is faster than ByteString for name, nameOrAlias and correctedName, but only when requiring templates. The ByteString API allocates much more than Text on templates.
- ByteString < Text < String for nameAliases and nameAliasesWithTypes (expected), but the 3 perform almost equally for nameAliasesByType.
Anyway, the perf is already good and I spent a good amount of time on this, so if there is no suggestion to improve these, I think we can merge as it is.
Note: this time the benchmarks are run on a smaller set of characters: name, nameOrAlias and correctedName; so the results cannot be compared with the previous ones.
Latest benchmarks:
All
Unicode.Char.General.Names
name
String
unicode-data: OK (1.37s)
189 ms ± 3.5 ms, 813 MB allocated, 9.3 KB copied, 89 MB peak memory, 23% less than baseline
icu: OK (4.66s)
654 ms ± 9.3 ms, 2.1 GB allocated, 15 KB copied, 119 MB peak memory, 3.46x
ByteString
unicode-data: OK (1.74s)
52.8 ms ± 364 μs, 157 MB allocated, 991 B copied, 119 MB peak memory
Text
unicode-data: OK (1.39s)
41.3 ms ± 1.2 ms, 125 MB allocated, 805 B copied, 120 MB peak memory
icu: OK (6.07s)
194 ms ± 2.1 ms, 380 MB allocated, 2.0 KB copied, 120 MB peak memory, 4.69x
All
Unicode.Char.General.Names
correctedName
String
unicode-data: OK (0.61s)
196 ms ± 4.7 ms, 804 MB allocated, 12 KB copied, 89 MB peak memory, 21% less than baseline
icu: OK (2.35s)
768 ms ± 27 ms, 2.3 GB allocated, 18 KB copied, 112 MB peak memory, 3.92x
ByteString
unicode-data: OK (1.79s)
55.7 ms ± 914 μs, 157 MB allocated, 994 B copied, 112 MB peak memory
Text
unicode-data: OK (0.37s)
43.1 ms ± 1.5 ms, 118 MB allocated, 1.4 KB copied, 119 MB peak memory
icu: OK (0.90s)
294 ms ± 6.3 ms, 568 MB allocated, 5.4 KB copied, 119 MB peak memory, 6.84x
All
Unicode.Char.General.Names
nameOrAlias
String
unicode-data: OK (1.41s)
194 ms ± 3.0 ms, 813 MB allocated, 8.7 KB copied, 89 MB peak memory, 23% less than baseline
ByteString
unicode-data: OK (3.65s)
56.0 ms ± 1.2 ms, 158 MB allocated, 783 B copied, 116 MB peak memory
Text
unicode-data: OK (3.05s)
46.7 ms ± 1.6 ms, 125 MB allocated, 625 B copied, 120 MB peak memory
All
Unicode.Char.General.Names
nameAliasesWithTypes
String
unicode-data: OK (2.28s)
147 ms ± 2.0 ms, 643 MB allocated, 7.6 KB copied, 87 MB peak memory, 16% less than baseline
ByteString
unicode-data: OK (2.16s)
66.6 ms ± 2.3 ms, 248 MB allocated, 1.4 KB copied, 101 MB peak memory
Text
unicode-data: OK (1.52s)
96.7 ms ± 2.1 ms, 286 MB allocated, 1.5 KB copied, 101 MB peak memory
All
Unicode.Char.General.Names
nameAliasesByType
String
unicode-data: OK (1.00s)
134 ms ± 2.8 ms, 541 MB allocated, 6.2 KB copied, 87 MB peak memory, 47% less than baseline
ByteString
unicode-data: OK (2.03s)
131 ms ± 1.3 ms, 541 MB allocated, 5.7 KB copied, 111 MB peak memory
Text
unicode-data: OK (0.42s)
129 ms ± 3.9 ms, 541 MB allocated, 7.2 KB copied, 111 MB peak memory
All
Unicode.Char.General.Names
nameAliases
String
unicode-data: OK (1.89s)
121 ms ± 3.1 ms, 541 MB allocated, 5.8 KB copied, 87 MB peak memory, 11% less than baseline
ByteString
unicode-data: OK (0.63s)
37.5 ms ± 1.4 ms, 146 MB allocated, 1.0 KB copied, 107 MB peak memory
Text
unicode-data: OK (1.04s)
66.4 ms ± 2.1 ms, 186 MB allocated, 1.1 KB copied, 107 MB peak memory
Well, I gave it a final try: going low level with primops proved to be more efficient than FFI.
The downside is that if primops or ByteString internals change, we will have to adapt. But for ByteString internals, we rely on package bounds.
This low-level approach could be applied to the Text API as well, but let’s call it a day.
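To give an idea of what "low level with primops" can look like, here is a minimal sketch that reads bytes straight out of a static `Addr#` literal with `indexCharOffAddr#`, with no foreign call involved. This is not the code from this PR: `Table`, `namesTable` and `unpackAt` are hypothetical names, and the packed table layout is an assumption made purely for illustration.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Addr#, Char (C#), Int (I#), indexCharOffAddr#)

-- | Lifted wrapper, so that the unlifted 'Addr#' literal can be bound at top level.
data Table = Table Addr#

-- | Hypothetical packed table: two names stored back to back, without separators.
namesTable :: Table
namesTable = Table "LATIN SMALL LETTER ADIGIT ZERO"#

-- | Read @len@ 8-bit characters starting at byte offset @off@.
-- There is no bounds checking: the caller must pass valid offsets.
unpackAt :: Table -> Int -> Int -> String
unpackAt (Table addr#) off len = map charAt [off .. off + len - 1]
  where
    charAt :: Int -> Char
    charAt (I# i#) = C# (indexCharOffAddr# addr# i#)

main :: IO ()
main = do
  putStrLn (unpackAt namesTable 0 20)  -- prints "LATIN SMALL LETTER A"
  putStrLn (unpackAt namesTable 20 10) -- prints "DIGIT ZERO"
```

The trade-off is the one described above: the code depends on GHC primitives whose types can change between compiler versions, so it has to be guarded by dependency bounds.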
New benchmark results, which show that the ByteString and Text APIs are now much closer:
All
Unicode.Char.General.Names
name
String
unicode-data: OK (1.31s)
182 ms ± 3.2 ms, 813 MB allocated, 8.5 KB copied, 91 MB peak memory, 27% less than baseline
ByteString
unicode-data: OK (0.69s)
40.7 ms ± 1.0 ms, 127 MB allocated, 922 B copied, 119 MB peak memory
Text
unicode-data: OK (0.64s)
37.5 ms ± 915 μs, 122 MB allocated, 877 B copied, 119 MB peak memory
All
Unicode.Char.General.Names
correctedName
String
unicode-data: OK (2.85s)
186 ms ± 697 μs, 815 MB allocated, 8.3 KB copied, 91 MB peak memory, 28% less than baseline
ByteString
unicode-data: OK (0.72s)
43.5 ms ± 1.4 ms, 127 MB allocated, 920 B copied, 115 MB peak memory
Text
unicode-data: OK (0.70s)
41.8 ms ± 938 μs, 122 MB allocated, 877 B copied, 115 MB peak memory
All
Unicode.Char.General.Names
nameOrAlias
String
unicode-data: OK (1.35s)
187 ms ± 1.4 ms, 813 MB allocated, 8.7 KB copied, 91 MB peak memory, 24% less than baseline
ByteString
unicode-data: OK (0.75s)
47.0 ms ± 944 μs, 127 MB allocated, 925 B copied, 117 MB peak memory
Text
unicode-data: OK (0.74s)
44.0 ms ± 1.4 ms, 122 MB allocated, 877 B copied, 117 MB peak memory
Rebased
We should use text-builder-linear for Text and ByteString. I am going to leave this for the moment, because this repo really needs some love. I will open an issue so that we do not forget about it.
Follow-up of #107.
Added comparison to ICU.