OpenRiak / riak_kv

Riak Key/Value Store

Unnecessary re-compile in riak_kv_wm_object #30

Open martinsumner opened 5 months ago

martinsumner commented 5 months ago

To process a PUT via riak_kv_wm_object, there are three regular expressions compiled:

Links are a deprecated feature, yet the expressions are still compiled even if the links are empty.

Index field splitting is required, but would be more efficiently done with string:split (although there are some subtle differences with output in this case if the input is not a binary).

martinsumner commented 5 months ago

The majority of the time is in compiling the regex not applying it.
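If compilation dominates, one option is to compile the expression once and reuse it. The following is a minimal sketch of that idea, not the riak_kv implementation: the module name, the cache key, and the use of persistent_term as the cache are all assumptions, as is the pattern (a comma followed by one whitespace character).

```erlang
-module(index_split).
-export([split_terms/1]).

%% Hypothetical key under which the compiled pattern is cached.
-define(RE_KEY, {?MODULE, index_term_re}).

%% Split a comma/space separated set of index terms, compiling the
%% regular expression only on first use.
split_terms(Terms) ->
    re:split(Terms, compiled_re(), [{return, binary}]).

%% Return the cached compiled pattern, compiling and storing it if absent.
%% The pattern ",\\s" is an assumption for illustration.
compiled_re() ->
    case persistent_term:get(?RE_KEY, undefined) of
        undefined ->
            {ok, RE} = re:compile(",\\s"),
            ok = persistent_term:put(?RE_KEY, RE),
            RE;
        RE ->
            RE
    end.
```

A compiled pattern is only safe to reuse within the same node and OTP release, so a cache like this should not be persisted or shared across nodes.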

string:split(Terms, ", ", all) is marginally quicker than re:split(Terms, ",\s", [{return, binary}]) if Terms is already a binary. Otherwise an iolist_to_binary/1 call is required first, and this will be slower.

There may be subtle functional differences between string:split and re:split.
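One such difference is the type of the result when the input is a list rather than a binary. The sketch below illustrates it under assumed inputs: the module name, the sample terms, and the literal ", " pattern are illustrative and not taken from riak_kv.

```erlang
-module(split_compare).
-export([demo/0]).

%% Run both split calls on binary and list (string) input so the
%% result types can be compared. Inputs and pattern are illustrative.
demo() ->
    Bin = <<"a, b, c">>,
    Str = "a, b, c",
    {ok, RE} = re:compile(", "),
    #{re_on_binary => re:split(Bin, RE, [{return, binary}]),
      re_on_list => re:split(Str, RE, [{return, binary}]),
      string_on_binary => string:split(Bin, ", ", all),
      string_on_list => string:split(Str, ", ", all)}.
```

With {return, binary}, re:split returns binaries for either input type, whereas string:split returns parts of the same type as its input, so list input yields lists unless it is converted first.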

martinsumner commented 5 months ago

Overall, re:split may be better than string:split:

60> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13094
61> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13073
62> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(Term, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
13160
63> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
70358
64> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
72131
65> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermList, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
70717
66> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
45961
67> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
46929
68> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> re:split(TermListB, RE0, [{return, binary}]) end)) end, lists:seq(1, 10000))).
47076
69> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8858
70> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8020
71> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(Term, ", ", all) end)) end, lists:seq(1, 10000))).
8301
72> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
92594
73> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
82369
74> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(iolist_to_binary(TermList), ", ", all) end)) end, lists:seq(1, 10000))).
95715
75> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
71398
76> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
70223
77> lists:sum(lists:map(fun(_I) -> element(1, timer:tc(fun() -> string:split(TermListB, ", ", all) end)) end, lists:seq(1, 10000))).
70935

Term is a single IndexTerm. TermList is 10 comma/space-separated terms. TermListB is TermList as a binary (i.e. iolist_to_binary(TermList)).
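The measurements above can be reproduced with less repetition by wrapping the timing loop in a helper. This is a sketch under stated assumptions: the module name, the sample term value, the flat-string form of TermList, and the ",\\s" pattern for RE0 are all guesses, since the original shell bindings are not shown.

```erlang
-module(split_bench).
-export([total_micros/2, run/0]).

%% Sum the elapsed time, in microseconds, of N calls to Fun.
total_micros(Fun, N) ->
    lists:sum([element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)]).

%% Run the six cases from the transcript with assumed inputs.
run() ->
    N = 10000,
    Term = <<"term0001">>,                               % assumed single term
    TermList = lists:flatten(
                 lists:join(", ", lists:duplicate(10, "term0001"))),
    TermListB = iolist_to_binary(TermList),
    {ok, RE0} = re:compile(",\\s"),                      % assumed pattern
    Opts = [{return, binary}],
    Cases =
        [{re_term, fun() -> re:split(Term, RE0, Opts) end},
         {re_list, fun() -> re:split(TermList, RE0, Opts) end},
         {re_binary, fun() -> re:split(TermListB, RE0, Opts) end},
         {string_term, fun() -> string:split(Term, ", ", all) end},
         {string_list,
          fun() -> string:split(iolist_to_binary(TermList), ", ", all) end},
         {string_binary, fun() -> string:split(TermListB, ", ", all) end}],
    [{Name, total_micros(Fun, N)} || {Name, Fun} <- Cases].
```

Calling split_bench:run() returns a list of {Case, TotalMicroseconds} pairs. Absolute numbers will vary with the machine and OTP release, so only the relative ordering is worth comparing.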