Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0
1.74k stars 266 forks

Support the superscript and subscript tags #407

Closed cowboysync closed 5 months ago

cowboysync commented 5 months ago

Update `__init__.py` to support the superscript and subscript tags

Alir3z4 commented 5 months ago

Thanks for the fix, could you please add some tests?

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (e375689) 97.23% compared to head (a13f493) 97.26%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #407      +/-   ##
==========================================
+ Coverage   97.23%   97.26%   +0.02%
==========================================
  Files          11       11
  Lines        1120     1132      +12
==========================================
+ Hits         1089     1101      +12
  Misses         31       31
```

| [Flag](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | Coverage Δ | |
|---|---|---|
| [unittests-3.10](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | `97.26% <100.00%> (+0.02%)` | :arrow_up: |
| [unittests-3.11](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | `97.26% <100.00%> (+0.02%)` | :arrow_up: |
| [unittests-3.12](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | `97.26% <100.00%> (+0.02%)` | :arrow_up: |
| [unittests-3.8](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | `97.26% <100.00%> (+0.02%)` | :arrow_up: |
| [unittests-3.9](https://app.codecov.io/gh/Alir3z4/html2text/pull/407/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand) | `97.26% <100.00%> (+0.02%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Alireza+Savand#carryforward-flags-in-the-pull-request-comment) to find out more.


Alir3z4 commented 5 months ago

I'd highly recommend keeping the default output of html2text as close to plain text as possible while remaining compatible with HTML.

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

While supporting superscript and subscript is a very nice addition, considering the above, it would be much better to have the feature behind a flag. (Worth noting: many Markdown parsers and renderers don't support them.)

For instance, the --images-with-size flag preserves images along with their sizes, but it is off by default.

If we can have the same flexibility here, we can get this merged and included in the next release.
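For illustration, an opt-in command-line flag of this kind could be sketched with `argparse` as below. This is only a sketch under assumptions: the flag name `--include-sup-sub`, the `build_parser` helper, and the program name are hypothetical and are not confirmed anywhere in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one opt-in flag (the flag name is hypothetical)."""
    parser = argparse.ArgumentParser(prog="html2text-sketch")
    # A store_true flag defaults to False, so the existing output stays
    # unchanged unless the user explicitly asks for the new behavior.
    parser.add_argument(
        "--include-sup-sub",
        dest="include_sup_sub",
        action="store_true",
        default=False,
        help="keep <sup> and <sub> tags in the output (illustrative only)",
    )
    return parser


# Parsing an explicit argument list avoids depending on sys.argv.
default_args = build_parser().parse_args([])
opt_in_args = build_parser().parse_args(["--include-sup-sub"])
```

With no arguments the option stays off, which mirrors how --images-with-size is described above: available, but disabled unless requested.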

Alir3z4 commented 5 months ago

CI failed to pass the tests.

Since sup/sub tags are ignored by default, the file-based tests (html->md files) won't work: the test runner runs HTML2Text with the default configuration.

The test needs to be changed to a Python file, like how it's done for newlines on multiple calls. You can create a new file in the tests directory, add a function to it, and call the HTML2Text class with HTML2Text(ignore_sup_sub=False) ... (of course, you can delete the current html->md test files).
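A rough sketch of what such a Python test could look like follows. To stay self-contained it uses a stand-in converter built on the standard library's `html.parser`, not html2text itself; `TinyConverter` and its `convert` method are hypothetical, and `ignore_sup_sub` is borrowed from the comment above, so the option name in the merged code may differ.

```python
from html.parser import HTMLParser


class TinyConverter(HTMLParser):
    """Stand-in converter for illustration; NOT html2text's implementation."""

    def __init__(self, ignore_sup_sub: bool = True) -> None:
        super().__init__()
        # Assumed option name, taken from the review comment.
        self.ignore_sup_sub = ignore_sup_sub
        self._out = []

    def _emit_tag(self, tag: str, closing: bool) -> None:
        # Pass <sup>/<sub> through only when the option is switched off;
        # every other tag is dropped and only its text is kept.
        if tag in ("sup", "sub") and not self.ignore_sup_sub:
            self._out.append(f"</{tag}>" if closing else f"<{tag}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, closing=False)

    def handle_endtag(self, tag):
        self._emit_tag(tag, closing=True)

    def handle_data(self, data):
        self._out.append(data)

    def convert(self, html: str) -> str:
        self.reset()  # clear parser state so the instance can be reused
        self._out = []
        self.feed(html)
        self.close()
        return "".join(self._out)


def test_sup_sub_ignored_by_default():
    # Default configuration: the tags are dropped, the text is kept.
    assert TinyConverter().convert("H<sub>2</sub>O") == "H2O"


def test_sup_sub_kept_when_enabled():
    # A non-default configuration has to be set explicitly in Python,
    # which is why a file-based html->md fixture cannot cover it.
    converter = TinyConverter(ignore_sup_sub=False)
    assert converter.convert("x<sup>2</sup>") == "x<sup>2</sup>"
```

The second test is the case a fixture-driven runner cannot express, because that runner always constructs the converter with its defaults.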

Alir3z4 commented 5 months ago

Merged via https://github.com/Alir3z4/html2text/pull/408 https://github.com/Alir3z4/html2text/commit/42278c67f5470d65b7bd470b47255dc8b833d3c2

Alir3z4 commented 5 months ago

Thanks for the great contribution. I did some code cleanup to align the code with the rest of the code base and updated the changelog and documentation files.