An unofficial mirror of our repo for the `mwparserfromhtml` package, a Python library for working with the HTML dumps. Since this is only a mirror, do NOT open pull requests here.
In GitLab by @geohci on Aug 30, 2022, 24:21
Example: for the en:Cabbage article, the second paragraph of plaintext skipping transclusion is

`A cabbage generally weighs between .`

because the HTML is actually

`<p id="mwHg">A cabbage generally weighs between <span about="#mwt15" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"convert","href":"./Template:Convert"},"params":{"1":{"wt":"500"},"2":{"wt":"to"},"3":{"wt":"1000"},"4":{"wt":"g"},"5":{"wt":"lbs"},"sigfig":{"wt":"1"}},"i":0}}]}' id="mwHw">500 to 1,000 grams (1 to 2</span><span typeof="mw:Entity" about="#mwt15"> </span><span about="#mwt15">lb)</span>.`

and the wikitext is

`A cabbage generally weighs between {{convert|500|to|1000|g|lbs|sigfig=1}}.`
Maybe we can have an option that only excludes transclusion when it happens inside certain types of elements instead of being the parent element?
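
A rough sketch of what such an option could look like, using only the standard-library `html.parser` rather than the actual `mwparserfromhtml` internals. The names `PlaintextExtractor`, `plaintext`, and `keep_inline_in` are hypothetical and not part of the package. It assumes every element sharing the `about` id of a `typeof="mw:Transclusion"` element belongs to that transclusion, and it reads the proposal as: keep transclusions nested inside the listed tags, drop them when they are the top-level element. The `data-mw` attribute is omitted from the sample for brevity.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class PlaintextExtractor(HTMLParser):
    """Collect text, skipping transclusions unless nested in an allowed tag."""

    def __init__(self, keep_inline_in=()):
        super().__init__(convert_charrefs=True)
        self.keep_inline_in = set(keep_inline_in)  # hypothetical option
        self.stack = []                # (tag, skipped) for each open element
        self.transclusion_ids = set()  # `about` ids seen on mw:Transclusion
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        about = attrs.get("about")
        if "mw:Transclusion" in (attrs.get("typeof") or "").split():
            self.transclusion_ids.add(about)
        is_transclusion = about is not None and about in self.transclusion_ids
        parent_skipped = bool(self.stack) and self.stack[-1][1]
        inside_kept = any(t in self.keep_inline_in for t, _ in self.stack)
        skipped = parent_skipped or (is_transclusion and not inside_kept)
        self.stack.append((tag, skipped))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text unless the innermost open element is being skipped.
        if not (self.stack and self.stack[-1][1]):
            self.chunks.append(data)


def plaintext(html, keep_inline_in=()):
    parser = PlaintextExtractor(keep_inline_in)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The Cabbage paragraph from above, with data-mw left out.
HTML = (
    '<p id="mwHg">A cabbage generally weighs between '
    '<span about="#mwt15" typeof="mw:Transclusion" id="mwHw">'
    "500 to 1,000 grams (1 to 2</span>"
    '<span typeof="mw:Entity" about="#mwt15"> </span>'
    '<span about="#mwt15">lb)</span>.</p>'
)

print(plaintext(HTML))                        # current behaviour
print(plaintext(HTML, keep_inline_in={"p"}))  # keeps the converted weight
```

With `keep_inline_in={"p"}`, a transclusion that is itself the outer element (for example a templated table) would still be dropped, while one sitting inside a paragraph would survive.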