Open ggydush opened 3 weeks ago
Thanks, we'll take a look.
This is because tabview contains a module docstring and imports from __future__
.
Adding a comment has turned the internal comment into a string.
'''
module docstring
'''
'''
string
'''
from __future__ import ...
I just don't know what to do..
It's actually a very complex problem to solve -- @carljm was looking at it.
I was looking at it, but at this point I have no satisfactory ideas for how to solve it :/
Here's an example of what the shebang rewriting looks like in practice. This kind of technique is called a polyglot script, because it's simultaneously a /bin/sh script and a valid python file.
#!/bin/sh
'''exec' '/home/user/tmp/tool dir/tabview/bin/python' "$0" "$@"
' '''
#!/bin/sh
'''exec' "$(CDPATH= cd -- "$(dirname -- "$0")" && echo "$PWD")"/'python3' "$0" "$@"
' '''
The top of the file before rewriting is of course #!/usr/bin/env python3
. Just putting this in here since it makes it more accessible to see and think about alternatives. cc @paveldikov
Dumping my thoughts from another conversation here: I think patching the actual script is always going to result in frustration, a better option would be to generate a trampoline script that imports the actual code. Something similar is already done on windows where you get a .exe
which invokes a python script.
So for relocatable virtualenvs instead of having bin/tool
be the script, you would generate bin/tool-script
as an executable wrapper that then invokes bin/tool-script.py
(which itself is not marked as executable).
bin/tool
then looks like this:
#!/bin/sh
"exec" "`dirname $0`/python" "$0" "$@"
__file__ = __import__('os').path.join(__import__('os').path.dirname(__file__), "tool-script.py")
exec(compile((lambda f: (f.read(), f.close())[0])(open(__file__)), __file__, 'exec'))
and bin/tool-script.py
is the normal file unchanged (at least for as long as it's a python script).
This results in the following:
__file__
reports tool-script.py
instead of tool
co_filename
reports tool-script.py
instead of tool
__doc__
is correct (needed for libraries such as docopt
)__future__
importsI think this would be the best tradeoff but it comes with the cost of generating a useless second file. The only requirement is sh
which is a standard tool. Because sh
uses exec
it will also not keep a second process running around for nothing.
Like the current solution if someone were to run file
on the wrapper bin/tool
script it would misclassify the file as perl, but since that wrapper is kinda useless that's probably okay. For most editors actually opening up the underlying tool-script.py
file will correctly detect Python which is an improvement over today.
Probably tangential but is it reasonable to rewrite #!/usr/bin/env shebangs at all? The spechttps://packaging.python.org/en/latest/specifications/binary-distribution-format/#recommended-installer-features explicitly mentions #!python and #!pythonw as rewritable placeholders. An argument can be made that anything else amounts to a specific intent on the author's part, and should be left untouched.
(also, did not know uvx used relocatable)
The space in the shebang on GitHub is not actually relevant to this case; the script that appears in the tabview wheel (which is what uv installs from) has a rewritable placeholder shebang. The cause of the issue (and the reason uv encounters this problem while pip does not) is that uv uses relocatable venvs for all cached venvs. The reason for this (I think?) is that uv wants to create the venv in a temporary location and atomically move it to its "content-addressable" location in the cache.
If we were able to avoid using relocatable envs, we would achieve something closer to parity with pip, in that I think a problem would only appear when all of these conditions are met: script has a docstring, script uses __future__
imports, script is installed into a venv whose location is nested deep enough that the resulting absolute shebang is too long for the kernel (>127 bytes on Linux, the platform with the lowest limit.)
Given that we don't need to arbitrarily relocate the env, but just do a one-time move from a temp location to a permanent location (after which the venv is immutable; nothing else will be installed into it), I think we could avoid the need for a truly location-independent shebang by just doing a one-time search-and-replace rewrite of absolute paths in shebangs, before moving the venv into place (or possibly even simpler, write the shebangs in the first place using the intended final destination location of the venv.) I think this would solve the immediate problem with the least risk of causing other undesired side effects. It might still leave us vulnerable to issues with overly-long shebangs, but I think those cases would be rare and pip would have the same problem (and we could still use the extra-wrapper-script solution to help in this case).
If we do need to support truly arbitrarily-relocatable venvs, then I think the best option is to generate an additional wrapper script as proposed by @mitsuhiko, which does carry with it some small risk of further breakage in rare cases, since it changes the __file__
seen by the script. This might e.g. cause the help text displayed by some scripts to no longer show the expected executable name?
Both cases occur here I think, if the user sets UV_TOOL_DIR to a directory with spaces (or I guess if the default has spaces, which is unlikely on posix?) and installs tabview, then it ends up the space in the shebang path case instead.
Yes, I think both the "spaces" case and the "too long" case could occur relatively rarely, but the use of relocatable envs makes this impact everyone who attempts to use tabview (or any other script with a docstring that uses __future__
imports) with uvx. So the impact of relocatable is much broader.
I am not sure if uvx
needs to use true relocatable. Since the activate scripts are not really expected to be used (assumption), and tools are only really supposed to be launched from the uvx
command line (also assumption), maybe it is simplest to simply let the uvx
executable handle the trampolining aspect.
I think we could make it non-relocatable, though the issue will still be present for long shebangs or paths with spaces, right?
Also, FWIW, on a somewhat orthogonal note, the use case for which I implemented relocatable is rather different from uvx
's usage of it. My objective was to enable developers to create redistributable versions of their own projects (that they fully control). So I could steer users towards using {console,gui}-scripts
instead of arbitrary/non-entrypoint scripts (for pure Python that is; binaries are not in scope for this issue anyhow)
I think @mitsuhiko's is probably the best and most idiomatic solution since there is existing precedent for the -script
pattern. It should also generalise nicely beyond uvx
and help with standalone usage of relocatable venvs which is always a good thing.
I think we could make it non-relocatable, though the issue will still be present for long shebangs or paths with spaces, right?
Hmmm, actually it shouldn't be a problem as far as uvx
is concerned. You are still doing the same fundamental exec
trick and therefore bypassing the shebang completely. It's just that, instead of doing the exec
inside of a shell script -- and therefore having to deal with polyglot scripting -- you are doing the exact same thing inside of uvx
's own logic (in Rust).
But yes, this train of thought is entirely predicated on decreasing the range of scenarios affected by the problem, as opposed to actually solving its root cause. You'd fix it for uvx
but standalone uv venv
usage remains affected even if the range of cases where this happens is vastly smaller.
When using
uvx
with script files that havefrom future
import and shebang with a space, I receive the following syntax error:Example:
Link to shebang in tabview, link to uv snippet that causes the syntax error.