Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.77k stars 862 forks source link

Monkeypatching html.parser fails when importing with zipimporter #1132

Open remyroy opened 3 years ago

remyroy commented 3 years ago

When trying to monkeypatch a copy of html.parser, markdown fails when html.parser comes from a zip file and an instance of zipimporter is used as the loader.

To reproduce

  1. Download an embeddable package version of Python like the latest Windows embeddable package (64-bit) (Python 3.9.5)
  2. Unzip it
  3. Copy the markdown package into the python folder
  4. Run python
  5. Import markdown
  6. Notice the following traceback:
$ python
Python 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32
>>> import markdown
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rémy Roy\Projects\CDDA-Game-Launcher\build\test\markdown\__init__.py", line 29, in <module>
    from .core import Markdown, markdown, markdownFromFile  # noqa: E402
  File "C:\Users\Rémy Roy\Projects\CDDA-Game-Launcher\build\test\markdown\core.py", line 27, in <module>
    from .preprocessors import build_preprocessors
  File "C:\Users\Rémy Roy\Projects\CDDA-Game-Launcher\build\test\markdown\preprocessors.py", line 29, in <module>
    from .htmlparser import HTMLExtractor
  File "C:\Users\Rémy Roy\Projects\CDDA-Game-Launcher\build\test\markdown\htmlparser.py", line 31, in <module>
    spec.loader.exec_module(htmlparser)
AttributeError: 'zipimporter' object has no attribute 'exec_module'

The main issue is that the zipimporter loader does not have an exec_module method.

remyroy commented 3 years ago

This is related to the work of @waylan in #803

waylan commented 3 years ago

This puts us in a weird jam. We need to monkeypatch the html parser. However, we should never be changing the behavior of the default parser. For example, a user of Python-Markdown could import and run markdown and in the same script import and use an instance of the html parser from the standard lib. In that case, the user should not get our modified behavior. To avoid that, we specifically import the lib in a way that Python sees it as a separate package than the normally imported package. Is there a way to accomplish that which also works with zipimporter? If not, then we can't support zipimporter.

remyroy commented 3 years ago

I'm not familiar enough with the importing internals. My guess is that there should be way to support that with the zipimporter since it's just the same source code in a zip file.

waylan commented 2 years ago

I just did some digging on this issue and it appears that this has been fixed upstream (in Python's zipimport module) in https://bugs.python.org/issue42131 (see the changes in d2e94bb0). It would appear that the changes are available in Python 3.10.

We could close this as an upstream issue. However, that does not address the issue for Python < 3.10. If someone was to provide a patch for that it would be considered. Therefore, I will leave this open for the time being.

Alexey-T commented 2 years ago

@waylan

Unfortunately, I am not aware of any workarounds.

i suggest this workaround. you get patched version of zipimport and put it near the MD module. you rename it a little - eg to zipimport_fixed. you use renamed module instead. use relative import please, from . import zipimport_fixed

@veksha FYI

waylan commented 2 years ago

@Alexey-T we have certainly done that in the past. However, we don't call the zipimport lib at all. However, packages which use embeddable Python do. Therefore, the package which explicitly imports zipimport would need to take the steps you recommend.

By way of explanation, we use the new standard API defined in PEP 451 as found in the importlib module of the Python Standard Library. However, embeddable Python replaces calls to importlib with zipimport, which up until Python 3.10 still only defined the old API from PEP 302. Therefore, we don't have any control over which lib from the standard library is called.

I'm sure there is still some workaround that would address the issue, but it is not immediately clear what that would be and it would likely require extensive testing on a system we don't use (and are not familiar with). Therefore, it is not a high priority issue for us. However, if someone who is familiar with and uses a relevant system were to work up and submit a fix with tests, then we would be willing to review it.