astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
20.01k stars 594 forks source link

Scripts that use "from future" can result in syntax error when used with uvx #6489

Open ggydush opened 3 weeks ago

ggydush commented 3 weeks ago

When using uvx with script files that have from future import and shebang with a space, I receive the following syntax error:

    from __future__ import print_function, unicode_literals
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: from __future__ imports must occur at the beginning of the file

Example:

uvx tabview

Link to shebang in tabview, link to uv snippet that causes the syntax error.

charliermarsh commented 3 weeks ago

Thanks, we'll take a look.

yk-kd commented 3 weeks ago

This is because tabview contains a module docstring and imports from __future__. Adding a comment has turned the internal comment into a string.

'''
module docstring
'''

'''
string 
'''
from __future__ import ...

I just don't know what to do..

charliermarsh commented 3 weeks ago

It's actually a very complex problem to solve -- @carljm was looking at it.

carljm commented 3 weeks ago

I was looking at it, but at this point I have no satisfactory ideas for how to solve it :/

bluss commented 3 weeks ago

Here's an example of what the shebang rewriting looks like in practice. This kind of technique is called a polyglot script, because it's simultaneously a /bin/sh script and a valid python file.

  1. Due to space in path
#!/bin/sh
'''exec' '/home/user/tmp/tool dir/tabview/bin/python' "$0" "$@"
' '''
  1. Due to venv setting relocatable
#!/bin/sh
'''exec' "$(CDPATH= cd -- "$(dirname -- "$0")" && echo "$PWD")"/'python3' "$0" "$@"
' '''

The top of the file before rewriting is of course #!/usr/bin/env python3. Just putting this in here since it makes it more accessible to see and think about alternatives. cc @paveldikov

mitsuhiko commented 3 weeks ago

Dumping my thoughts from another conversation here: I think patching the actual script is always going to result in frustration, a better option would be to generate a trampoline script that imports the actual code. Something similar is already done on windows where you get a .exe which invokes a python script.

So for relocatable virtualenvs instead of having bin/tool be the script, you would generate bin/tool-script as an executable wrapper that then invokes bin/tool-script.py (which itself is not marked as executable).

bin/tool then looks like this:

#!/bin/sh
"exec" "`dirname $0`/python" "$0" "$@"
__file__ = __import__('os').path.join(__import__('os').path.dirname(__file__), "tool-script.py")
exec(compile((lambda f: (f.read(), f.close())[0])(open(__file__)), __file__, 'exec'))

and bin/tool-script.py is the normal file unchanged (at least for as long as it's a python script).

This results in the following:

I think this would be the best tradeoff but it comes with the cost of generating a useless second file. The only requirement is sh which is a standard tool. Because sh uses exec it will also not keep a second process running around for nothing.

Like the current solution if someone were to run file on the wrapper bin/tool script it would misclassify the file as perl, but since that wrapper is kinda useless that's probably okay. For most editors actually opening up the underlying tool-script.py file will correctly detect Python which is an improvement over today.

paveldikov commented 3 weeks ago

Probably tangential but is it reasonable to rewrite #!/usr/bin/env shebangs at all? The spechttps://packaging.python.org/en/latest/specifications/binary-distribution-format/#recommended-installer-features explicitly mentions #!python and #!pythonw as rewritable placeholders. An argument can be made that anything else amounts to a specific intent on the author's part, and should be left untouched.

(also, did not know uvx used relocatable)

carljm commented 2 weeks ago

The space in the shebang on GitHub is not actually relevant to this case; the script that appears in the tabview wheel (which is what uv installs from) has a rewritable placeholder shebang. The cause of the issue (and the reason uv encounters this problem while pip does not) is that uv uses relocatable venvs for all cached venvs. The reason for this (I think?) is that uv wants to create the venv in a temporary location and atomically move it to its "content-addressable" location in the cache.

If we were able to avoid using relocatable envs, we would achieve something closer to parity with pip, in that I think a problem would only appear when all of these conditions are met: script has a docstring, script uses __future__ imports, script is installed into a venv whose location is nested deep enough that the resulting absolute shebang is too long for the kernel (>127 bytes on Linux, the platform with the lowest limit.)

Given that we don't need to arbitrarily relocate the env, but just do a one-time move from a temp location to a permanent location (after which the venv is immutable; nothing else will be installed into it), I think we could avoid the need for a truly location-independent shebang by just doing a one-time search-and-replace rewrite of absolute paths in shebangs, before moving the venv into place (or possibly even simpler, write the shebangs in the first place using the intended final destination location of the venv.) I think this would solve the immediate problem with the least risk of causing other undesired side effects. It might still leave us vulnerable to issues with overly-long shebangs, but I think those cases would be rare and pip would have the same problem (and we could still use the extra-wrapper-script solution to help in this case).

If we do need to support truly arbitrarily-relocatable venvs, then I think the best option is to generate an additional wrapper script as proposed by @mitsuhiko, which does carry with it some small risk of further breakage in rare cases, since it changes the __file__ seen by the script. This might e.g. cause the help text displayed by some scripts to no longer show the expected executable name?

bluss commented 2 weeks ago

Both cases occur here I think, if the user sets UV_TOOL_DIR to a directory with spaces (or I guess if the default has spaces, which is unlikely on posix?) and installs tabview, then it ends up the space in the shebang path case instead.

carljm commented 2 weeks ago

Yes, I think both the "spaces" case and the "too long" case could occur relatively rarely, but the use of relocatable envs makes this impact everyone who attempts to use tabview (or any other script with a docstring that uses __future__ imports) with uvx. So the impact of relocatable is much broader.

paveldikov commented 2 weeks ago

I am not sure if uvx needs to use true relocatable. Since the activate scripts are not really expected to be used (assumption), and tools are only really supposed to be launched from the uvx command line (also assumption), maybe it is simplest to simply let the uvx executable handle the trampolining aspect.

charliermarsh commented 2 weeks ago

I think we could make it non-relocatable, though the issue will still be present for long shebangs or paths with spaces, right?

paveldikov commented 2 weeks ago

Also, FWIW, on a somewhat orthogonal note, the use case for which I implemented relocatable is rather different from uvx's usage of it. My objective was to enable developers to create redistributable versions of their own projects (that they fully control). So I could steer users towards using {console,gui}-scripts instead of arbitrary/non-entrypoint scripts (for pure Python that is; binaries are not in scope for this issue anyhow)

I think @mitsuhiko's is probably the best and most idiomatic solution since there is existing precedent for the -script pattern. It should also generalise nicely beyond uvx and help with standalone usage of relocatable venvs which is always a good thing.

paveldikov commented 2 weeks ago

I think we could make it non-relocatable, though the issue will still be present for long shebangs or paths with spaces, right?

Hmmm, actually it shouldn't be a problem as far as uvx is concerned. You are still doing the same fundamental exec trick and therefore bypassing the shebang completely. It's just that, instead of doing the exec inside of a shell script -- and therefore having to deal with polyglot scripting -- you are doing the exact same thing inside of uvx's own logic (in Rust).

But yes, this train of thought is entirely predicated on decreasing the range of scenarios affected by the problem, as opposed to actually solving its root cause. You'd fix it for uvx but standalone uv venv usage remains affected even if the range of cases where this happens is vastly smaller.