dschuhmacher closed this 5 months ago
It appears to me that the best solution is to make the parsed SVG path part of the kanjivec format, rather than the original d-string. This avoids repeated parsing and removes the 70 ms completely from `kanjidist`.
Implemented in kanjistat v0.13.0
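The parse-once idea can be illustrated with a small C++ sketch. It is not the kanjistat implementation: `Stroke`, `parse_path_stub`, `compare` and the call counter are hypothetical names, and the "parser" is a stub that only records that it was called. The point is that the parsed path is computed when the object is built and merely read afterwards.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Counts parser invocations so the saving is observable.
static int parse_calls = 0;

// Stand-in for a real path parser (assumption): it records the call and
// returns a dummy value, the number of characters in the d-string.
std::vector<double> parse_path_stub(const std::string& d) {
    ++parse_calls;
    return {static_cast<double>(d.size())};
}

// Hypothetical stroke record that keeps the parsed path next to the d-string.
struct Stroke {
    std::string d;               // original d-string
    std::vector<double> parsed;  // parsed once, at construction
    explicit Stroke(std::string d_string)
        : d(std::move(d_string)), parsed(parse_path_stub(d)) {}
};

// A comparison that only reads the stored result; no parsing happens here.
double compare(const Stroke& a, const Stroke& b) {
    return a.parsed[0] - b.parsed[0];
}
```

With two such strokes, calling `compare` any number of times leaves `parse_calls` at 2, whereas parsing inside the comparison would scale with the number of comparisons.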
Profiling the `kanjidistmat` computation of 壇 vs 増,垣,槽 at `seg_depth` 4 with `approx="pcweighted"` and `density=30` reveals that 380 out of 430 ms are spent in `component_cost`. Of these, 200 ms are spent in `unbalanced` (there is nothing we can do about that), but 70 ms are spent in `parse_svg_path` (which seems a lot), hardly anything else in `points_from_svg` (which seems surprising), but still 110 ms in the rest of `component_cost`.

A larger comparison of completely random kanji gives similar percentages (but allots the full runtime to `component_cost` and only about 10 percent to `parse_svg_path`, probably due to fewer strokes in the kanji).

There is not that much to be gained. But it seems that improving somewhat on `parse_svg_path` should be easily(?) possible (at most 10 of the 70 ms are in the `gsub`) by using more basic string operations (package `stringi`??) and maybe implementing it in C++. Also, the percentage of time taken "by the other commands" (except for `unbalanced` and `points_from_svg`) in `component_cost` seems rather too large.
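As a rough illustration of the "basic string operations in C++" idea, the sketch below tokenizes a d-string in one left-to-right scan using `std::strtod` and no regular expressions. It is only a sketch under assumptions: `PathSegment` and `tokenize_svg_path` are hypothetical names, not the kanjistat API; the scan stops silently at malformed input; and `std::strtod` is assumed to run under the default "C" numeric locale.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// One path command and its numeric arguments, e.g. {'M', {52.25, 14}}.
struct PathSegment {
    char command;
    std::vector<double> args;
};

// Hypothetical helper (not the kanjistat function): splits an SVG path
// d-string into commands and numbers in a single pass.
std::vector<PathSegment> tokenize_svg_path(const std::string& d) {
    std::vector<PathSegment> segments;
    const char* p = d.c_str();
    const char* end = p + d.size();
    while (p < end) {
        unsigned char ch = static_cast<unsigned char>(*p);
        if (std::isspace(ch) || ch == ',') {
            ++p;  // whitespace and commas are pure separators
        } else if (std::isalpha(ch)) {
            segments.push_back(PathSegment{*p, {}});  // a letter starts a command
            ++p;
        } else {
            char* next = nullptr;
            double value = std::strtod(p, &next);  // handles signs like "2.28-0.52"
            if (next == p || segments.empty()) break;  // malformed input: stop
            segments.back().args.push_back(value);
            p = next;
        }
    }
    return segments;
}
```

For a d-string such as `M52.25,14c0.25,2.28-0.52,3.59-1.8,5.62`, this yields an `M` segment with two arguments and a `c` segment with six. Whether it would actually beat the current `gsub`-based approach would have to be measured by profiling.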