Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0
1.79k stars 273 forks

Speed up skipwrap function #312

francoisfreitag closed 4 years ago

francoisfreitag commented 4 years ago

Short-circuit the evaluation when wrap_links is True (the default). Checking a boolean is much cheaper than searching a paragraph for a link match, so the flag is tested first and the search is skipped whenever it cannot change the result.

Any link present in the paragraph causes the wrapping to be skipped. Instead of searching for all matches, stop at the first match.
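
A rough sketch of the two ideas above, for illustration only. The pattern `LINK_RE` and the helper names `has_link_slow` and `has_link_fast` are hypothetical and simplified; they are not the regex or the function bodies used in html2text. The sketch assumes the check only needs a yes/no answer about whether a link is present.

```python
import re

# Hypothetical, simplified Markdown link pattern (an assumption for this
# sketch; it is NOT the pattern html2text actually uses).
LINK_RE = re.compile(r"\[[^\]]*\]\([^)]*\)")


def has_link_slow(para: str, wrap_links: bool) -> bool:
    # Runs the regex over the whole paragraph and collects every match,
    # even when wrap_links is True and the result is discarded anyway.
    return len(LINK_RE.findall(para)) > 0 and not wrap_links


def has_link_fast(para: str, wrap_links: bool) -> bool:
    # Tests the cheap boolean first, so the regex never runs when
    # wrap_links is True. When it does run, search() stops at the
    # first match instead of collecting all of them.
    return not wrap_links and LINK_RE.search(para) is not None
```

Both helpers return the same answer for the same input; only the amount of work differs. With wrap_links set to True the second version does no regex work at all, which is where most of the saving would come from on inputs with many `[]` and `()`.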


Suggesting the PR after encountering a catastrophic case at work today. html2text took 3800 seconds to generate the plain-text version of a 20 MB HTML document that contained many [] and (). With this patch applied, it processes the input in less than 10 seconds.

coveralls commented 4 years ago

Coverage remained the same at 97.875% when pulling 68caaf0827024c09a9b513ec6b9eccb846d16063 on francoisfreitag:wraplink into 2d2c7023e6498611e567fb68727ca4628c187b77 on Alir3z4:master.

jdufresne commented 4 years ago

Nice catch. Thanks!