Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0
1.81k stars 272 forks source link

`None` link title assert hit #326

Open equaeghe opened 4 years ago

equaeghe commented 4 years ago

There is the following in the code:

https://github.com/Alir3z4/html2text/blob/24aac8629f6a50ab262727815107384e0682cf42/html2text/__init__.py#L781

Of course someone managed to produce html with links of the form <a title href="example.org">…</a> that trigger this assert. I think better than having the assert would be to replace

https://github.com/Alir3z4/html2text/blob/24aac8629f6a50ab262727815107384e0682cf42/html2text/__init__.py#L780

with

link_attrs_title = link.attrs.get("title", None)
if link_attrs_title is not None:

N.B.: The same type of pattern fixable in the way can be found on three other occasions:

Alir3z4 commented 4 years ago

Thanks for pointing these out.

Would like to know what @jdufresne thinks on this as well.

@equaeghe A pull request that fixes these is welcome.

equaeghe commented 4 years ago

@equaeghe A pull request that fixes these is welcome.

OK, see https://github.com/Alir3z4/html2text/pull/327. It seems there were some problems detected by the CI, but it wasn't really clear to me what the issue was. I fixed some style issues that were found, getting one extra test to be successful. But for the others, I'll need guidance. (Are they even all due to my changes?)

equaeghe commented 4 years ago

Did you have time to look at my pull request. I again hit an issue fixed by it. Namely, someone managed to put a naked alt in <img … alt …>, which causes alt=None, which causes an exception, because alt is assumed to be a str or bytes.

jdufresne commented 4 years ago

Would like to know what @jdufresne thinks on this as well.

The general idea makes sense to me once the CI issues are hammered out.

equaeghe commented 4 years ago

The general idea makes sense to me once the CI issues are hammered out.

I'll have a look at them again. I think I now understand better how the CI works and where to find the errors.

equaeghe commented 4 years ago

I'll have a look at them again. I think I now understand better how the CI works and where to find the errors.

I managed to eliminate all but one of the issues:

https://travis-ci.org/github/Alir3z4/html2text/jobs/726262620

I need assistance with this one @Alir3z4 @jdufresne .

equaeghe commented 3 years ago

I tried again, but am still hitting an issue I can't seem to solve:

https://travis-ci.org/github/Alir3z4/html2text/jobs/737085613

I really need assistance with this one @Alir3z4 @jdufresne .

Alir3z4 commented 3 years ago

Sorry, I missed this

The error is coming from mypy

html2text/__init__.py:540: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str")
Found 1 error in 1 file (checked 7 source files)
ERROR: InvocationError for command /home/travis/build/Alir3z4/html2text/.tox/mypy/bin/mypy --strict html2text (exited with code 1)

Seems to be type issues on

html2text/__init__.py:540: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str")
equaeghe commented 3 years ago

Seems to be type issues on

html2text/__init__.py:540: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str")

Ah, now I get it! Mypy does not infer that attrs.get("href", "") cannot be None. I changed it to attrs.get("href") or "" to convince Mypy. (Bit of a shame, the former seems a more proper approach.) Now that test is passed as well. There's just the inclomplete coverage, but I have the impression that is not due to my patch.

Can you check if you can merge the pull request?

equaeghe commented 3 years ago

@jdufresne , @Alir3z4: Bump.