Fix reloader for modules run as scripts

bottlepy / bottle

bottle.py is a fast and simple micro-framework for python web-applications.

http://bottlepy.org/

MIT License

8.39k stars 1.46k forks source link

Fix reloader for modules run as scripts #1336

Open lmmarsano opened 3 years ago

lmmarsano commented 3 years ago

Modules run as scripts with python -m flag crash for run(reloader=True), eg,

$ python -m servertestpackage.module wsgiref 8000 --reload
Traceback (most recent call last):
  File "/home/LOM0227/project/bottle/test/servertestpackage/module.py", line 3, in <module>
    import servertestpackage.absolute
ModuleNotFoundError: No module named 'servertestpackage'

bottle.py generates its own command missing -m and using the module's file path to spawn the child process, which breaks the import system. To see the failures, checkout the previous commit in this branch & run the tests.

git checkout @^
python -m unittest test.test_server

Through some guesswork, these commits attempt to reproduce the original command, so that doesn't happen.

I'm unsure this is the best way to get the original command. Is there a better way?

defnull commented 3 years ago

I don't see a bug here that needs fixing. Let me quote https://docs.python.org/3/tutorial/modules.html#intra-package-references

Note that relative imports are based on the name of the current module. Since the name of the main module is always __main__, modules intended for use as the main module of a Python application must always use absolute imports.

So, relative imports are supposed to not work. And if servertestpackage cannot be found as a module, then the import path is not correct? I'm not sure if I understand the problem.

lmmarsano commented 3 years ago

Are you saying that code that runs fine without reloader should crash with reloader? These crashes are caused by the reloader not giving child processes the same command it received to start the main module.

The specs are more authoritative PEP 366, and that bit of advice in the tutorial only applies to unsupported versions of Python. Since then, __package__ was invented to support explicit relative imports for the -m switch.

This will allow relative imports to work correctly from main modules executed with the -m switch.

Moreover, the semantics of -m differ from running a script directly. -m imports the containing package & inserts the current directory (instead of the directory containing the main module/script) into sys.path (PEP 338). Therefore, a correct reloader shouldn't treat loading the main module with or without -m interchangeably. However, https://github.com/bottlepy/bottle/blob/f9b1849db4dd724e36a93a1032be592193fba581/bottle.py#L3671 does: it runs a child process without -m using the main module's file path, resulting in these failures.

defnull commented 3 years ago

Are you saying that code that runs fine without reloader should crash with reloader?

No, I was saying that "I'm not sure if I understand the problem.". Thanks for the clarifying explanation.

So, it looks like python is lying to us when called with -m, because sys.executable and sys.args do not contain the actual executable or arguments. I did not know that. If I understand that correctly, this is not an issue for single-file modules, because python3 -m module and python3 module.py both work as expected. It is only an issue for packages with a __main__.py because: a) if called directly, the __package__ variable is not set and relative imports do not work and b) sys.path is initialized differently.

$ python3 -mpkg
__name__ =  __main__
__package__ =  pkg
sys.path =  ['/tmp', ...]
sys.executable =  /usr/bin/python3
sys.argv =  ['/tmp/pkg/__main__.py']

$ python3 /tmp/pkg/__main__.py
__name__ =  __main__
__package__ =  None
sys.path =  ['/tmp/pkg',  ...]
sys.executable =  /usr/bin/python3
sys.argv =  ['/tmp/pkg/__main__.py']

In short: If we detect a -m invocation (if __name__ == '__main__' and __package__ is not None) then we should replace sys.argv[0] with ["-m", __package__] to get the expected behavior. Correct?

defnull commented 3 years ago

Nah, that would have been too easy. Bottle needs to check __package__ of the __main__ module if it is not itself the main module. So your more complex workaround is really necessary. Hmm.

defnull commented 3 years ago

Sooo... would that work?

args = [sys.executable] + sys.argv
if getattr(sys.modules.get('__main__'), '__package__', None):
    # If a package was loaded with `python -m`, then `sys.argv`
    # is wrong and needs fixing in some cases. See #1336
    args[1:1] = ["-m", sys.modules['__main__'].__package__]

Reasoning: There is always a __main__ module. If its __package__ is None (direct call) or empty (-m with a single-file module) then no fixing is needed. Calling the script directly is fine. If it has a value, replacing python3 /path/to/package/__main__.py with python3 -m package restores the desired behavior.

Edit: This is basically what you did, but a bit shorter. Did I miss some edge-case here?

lmmarsano commented 3 years ago

It is only an issue for packages with a __main__.py

Close: non-package modules are affected, too. That's where I originally encountered the issue. If you checkout the previous commit & run the tests, then you'll see 2 failures:

package reload (ie, script __main__.py)
module reload (script basename matches module name)

I included the package test case for thoroughness.

In the module case, I'd think you need the full name: __package__ would only provide the parent package's.

Even in the single file case, sys.path is affected, and my knowledge isn't exhaustive. If the script doesn't operate on information that depends on the -m switch, then the behavior with or without it is the same.

About the reasoning, I'd try to confirm all assumptions in the specs, and test them even then.

__main__ always exists: Complete Python programs appears to support that
__package__: Import-related module attributes covers the module cases, and PEP 366 covers the filename case ('When the main module is specified by its filename, then the __package__ attribute will be set to None.')
- filename: __package__ is None
- top-level module: __package__ == ''
- package: __package__ is the package's name
- submodule: __package__ is the parent package's name

In light of this, my approach misses the top-level case (-m toplevel). The function I wrote could be replaced with a constant computed at most once. It also feels like a kludge. I'd prefer retrieving the original call: however, Python doesn't seem to offer an obvious way, and I haven't dug too deeply.

defnull commented 3 years ago

It's been a while and I (accidentally) pushed the (broken) partial solution I did a while ago to master now :/

My current best guess for a complete solution is this monster:

def _fixed_argv():
    main = sys.modules.get('__main__')
    package = getattr(main, '__package__', None)
    if package is None: # python3 /path/to/file.py
        return sys.argv

    filepath = getattr(main, '__file__', sys.argv[0])
    module = os.path.splitext(os.path.basename(filepath))[0]
    if module == '__main__': # python3 -m package
        return ["-m", package] + sys.argv[1:]
    if package == '': # python3 -m module
        return ["-m", module] + sys.argv[1:]
    # python3 -m package.module
    return ["-m", package + "." + module] + sys.argv[1:]

Lots of code for an edge case that was never really supported. And even with that, it still breaks if the app changes the working directory (as the test server scripts do). Perhaps we should drop support for -m completely and tell users to either run scripts by path, or use bottle.py --reload package.module or python3 -m bottle --reload package.module. These both work, because bottle is always a runnable file and never a package.