glossarist / iev-data


Math conversions are really slow #144

Closed by skalee 3 years ago

skalee commented 3 years ago

Profiling revealed that TermBuilder#mathml_to_asciimath and TermBuilder#html_to_asciimath, which are called from several places, take 51% of the total concept generation time (see attachment), mostly because of their use of Nokogiri (41% of the total).

I suspect that even a simple regular-expression test for the presence of MathML/HTML tags, so that content without math is not processed at all, would make a huge difference. Replacing Nokogiri with something else could help as well.
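A minimal sketch of such a pre-check, assuming that text containing no tags and no character entities needs no conversion at all. `MARKUP_PATTERN`, `markup?` and `convert_if_markup` are hypothetical names, not part of TermBuilder, and the block passed in stands in for `mathml_to_asciimath` / `html_to_asciimath`. If the real converters also normalise plain text in some other way, this guard would skip that too.

```ruby
# Hypothetical guard, not the actual iev-data code: skip the expensive
# Nokogiri-based conversion when a string contains no tags or entities.
MARKUP_PATTERN = /<\/?[A-Za-z][\w:.-]*[\s\/>]|&(?:#\d+|#x\h+|\w+);/

# True when the text contains something that looks like an HTML/MathML tag
# (e.g. "<math>", "</sub>", "<br/>") or a character entity (e.g. "&amp;").
def markup?(text)
  MARKUP_PATTERN.match?(text)
end

# Returns plain text unchanged; otherwise hands it to the converter block.
def convert_if_markup(text)
  return text unless markup?(text)

  yield text
end

# Usage, with a stand-in lambda in place of the real converter:
calls = 0
fake_converter = ->(t) { calls += 1; "converted(#{t})" }

puts convert_if_markup("electric current", &fake_converter)
puts convert_if_markup("I<sub>n</sub>", &fake_converter)
puts calls
```

The pattern errs towards matching: a false positive such as "a <b c" only costs one ordinary conversion, whereas a false negative would silently skip one.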

[Attachment: Screenshot 2021-03-13 at 18 38 05]

ronaldtse commented 3 years ago

@skalee, we have a copy of the 'fake math conversion' code here: https://github.com/metanorma/stepmod-utils/blob/728bd50bf609afd6c7ef0a6848f45a8419a57819/lib/stepmod/utils/html_to_asciimath.rb

It is probably time to extract this 'fake math conversion' functionality into a separate gem under the Plurimath umbrella. Can you help with that? Thanks.

skalee commented 3 years ago

@ronaldtse Sure. Please add me to the Plurimath organization, then. What should the gem be called? Fake Math? HTML Math? Also, can I assume that the code in stepmod-utils is more up to date and feature-complete?

ronaldtse commented 3 years ago

@skalee Done. The fake math handling in stepmod-utils may be more up to date, since some additional issues may have been handled there. Can you confirm, @w00lf?

Maybe we can call the gem "html2math"?

skalee commented 3 years ago

Since I haven't added anything to iev-data yet, that's probably true. Stupid question on my part.

ronaldtse commented 3 years ago

The only changes were made in metanorma/stepmod-utils@d8f3e17ac86a8392f6d41653c234fa13f5d8f10f.

skalee commented 3 years ago

The converter should be moved to a separate gem (see #149), and any further performance improvements should be made there. Processing is already twice as fast thanks to #148.