facebook / dotslash

Simplified executable deployment
https://dotslash-cli.com
Apache License 2.0

Writing a dotslash file for a python toolchain #22

Open jazzdan opened 3 months ago

jazzdan commented 3 months ago

Given this dotslash file:

#!/usr/bin/env dotslash

{
  "name": "python-standalone",
  "platforms": {
    "macos-aarch64": {
      "size": 26705084,
      "hash": "blake3",
      "digest": "03555c515b0b59c9a8bc15386343228767f3c452c474cddc4cd8949473c30c27",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-aarch64-apple-darwin-pgo-full.tar.zst"
        }
      ]
    },
    "macos-x86_64": {
      "size": 26292710,
      "hash": "blake3",
      "digest": "e7a824fdba50916674045b4d64dc07c1d172ec84d438f4cc6ba3c01e39992f56",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-x86_64-apple-darwin-pgo-full.tar.zst"
        }
      ]
    },
    "linux-x86_64": {
      "size": 35135207,
      "hash": "blake3",
      "digest": "1edbb8cbde2be264dda8c531c928ff3740a377d8398584dcac7cfeac3b5e190e",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-x86_64-unknown-linux-gnu-pgo-full.tar.zst"
        }
      ]
    }
  }
}
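Before debugging the runtime behavior, it can help to confirm that the file itself is shaped as intended. The following is a minimal Python sketch, not part of DotSlash: it assumes the first line is the shebang and the remainder is plain JSON, as in the file above. `summarize_dotslash_file` and `EXPECTED_KEYS` are hypothetical names, and the key set is simply the keys used above, not an authoritative list of what DotSlash requires.

```python
import json

# Keys that appear in each platform entry of the file above (an assumption,
# not an authoritative list of what DotSlash requires).
EXPECTED_KEYS = {"size", "hash", "digest", "format", "path", "providers"}


def summarize_dotslash_file(text: str) -> dict:
    """Return {platform: {"path": ..., "urls": [...]}} for a DotSlash file.

    Assumes the first line is the shebang and the rest is plain JSON.
    """
    first_line, _, body = text.partition("\n")
    if not first_line.startswith("#!"):
        raise ValueError("expected a shebang on the first line")
    config = json.loads(body)
    summary = {}
    for platform, entry in config["platforms"].items():
        missing = EXPECTED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{platform}: missing keys {sorted(missing)}")
        # Collect every provider URL so each one can be checked by hand.
        urls = [p["url"] for p in entry["providers"] if "url" in p]
        summary[platform] = {"path": entry["path"], "urls": urls}
    return summary
```

Calling `summarize_dotslash_file(open("scripts/bin/python").read())` would list, per platform, the relative path DotSlash is asked to execute and the URLs it may download from.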

I would expect executing this DotSlash file with no arguments to drop me into a Python REPL. Instead, I get this error:

$ ./scripts/bin/python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = './scripts/bin/python'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/install'
  sys.platlibdir = 'lib'
  sys.executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.prefix = '/install'
  sys.exec_prefix = '/install'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/install/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001e8f13ac0 (most recent call first):
  <no Python frame>

I think this means that Python can't find the libraries it needs at startup.
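The error output reports `stdlib dir = '/install/lib/python3.11'` and then fails to import `encodings`, so one quick check is whether that package exists under the prefix the interpreter computed. Below is a minimal sketch, assuming the POSIX `<prefix>/lib/pythonX.Y` layout shown in the output. `stdlib_dir` and `has_encodings` are hypothetical helper names. The failing interpreter cannot run a script itself, so this would be run with any working Python.

```python
import sys
from pathlib import Path


def stdlib_dir(prefix: str, version: tuple) -> Path:
    """Return <prefix>/lib/pythonX.Y, the layout seen in the error output."""
    major, minor = version
    return Path(prefix) / "lib" / f"python{major}.{minor}"


def has_encodings(prefix: str, version: tuple) -> bool:
    """True if the `encodings` package exists under the given prefix."""
    return (stdlib_dir(prefix, version) / "encodings" / "__init__.py").is_file()


if __name__ == "__main__":
    # Report what the interpreter running this script believes about itself.
    version = tuple(sys.version_info[:2])
    print("sys.executable:", sys.executable)
    print("sys.prefix:", sys.prefix)
    print("encodings under sys.prefix:", has_encodings(sys.prefix, version))
```

For the failing case, `has_encodings("/install", (3, 11))` checks the prefix from the error output, and passing the `python/install` directory inside the DotSlash cache checks the unpacked archive.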

If I run this same binary directly from the dotslash cache it works:

$ ~/Library/Caches/dotslash/f0/d51d6feaa418f63e844885ba229db6c8815c74/python/install/bin/python3
Python 3.11.8 (main, Feb 25 2024, 03:37:49) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

For toolchains like this, is it necessary to modify them to work with DotSlash? I read through #6, but as far as I can tell this Python archive should be the first, straightforward case described there. Curious to learn how to handle this. :)

Thanks for open sourcing dotslash! It has made my life a lot easier recently.

bolinfest commented 3 months ago

Out of curiosity, how are you getting/building DotSlash itself?

I just filed https://github.com/facebook/dotslash/issues/23 and I'm curious if it could be related.

jazzdan commented 3 months ago

Hey @bolinfest. I'm definitely not using Homebrew in this case. I think I downloaded it directly from the most recent release. I just confirmed that I am using the macOS binary listed there, and it still fails in the same way:

$ file ~/Downloads/dotslash
/Users/dan/Downloads/dotslash: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
/Users/dan/Downloads/dotslash (for architecture x86_64):    Mach-O 64-bit executable x86_64
/Users/dan/Downloads/dotslash (for architecture arm64): Mach-O 64-bit executable arm64
$ /Users/dan/Downloads/dotslash scripts/bin/python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'scripts/bin/python'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/install'
  sys.platlibdir = 'lib'
  sys.executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.prefix = '/install'
  sys.exec_prefix = '/install'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/install/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001f235bac0 (most recent call first):
  <no Python frame>

FWIW, it also fails with the first release of DotSlash:

$ /Users/dan/Downloads/dotslash_old scripts/bin/python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'scripts/bin/python'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/install'
  sys.platlibdir = 'lib'
  sys.executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.prefix = '/install'
  sys.exec_prefix = '/install'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/install/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001f235bac0 (most recent call first):
  <no Python frame>

bolinfest commented 3 months ago

@jazzdan Hmm, while we do rewrite arg0 on both Mac and Linux:

https://github.com/facebook/dotslash/blob/92fe07193f3b023081b62754a53cd88aa2cd45c5/src/execution.rs#L110-L111

I wonder if Python is looking for some resource relative to the executable and we're not convincing it sufficiently?

Also, do you still get this error if you try to use it to run a Python file? Or is it just the REPL case?

bolinfest commented 3 months ago

In particular, I'm curious how it is producing that /install path that it seems to be looking through.
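One way to test the arg0 theory without DotSlash in the loop is to launch the cached interpreter directly while passing a different `argv[0]`. The following is a minimal Python sketch, assuming a POSIX system. `run_with_argv0` is a hypothetical helper, and it relies on the `executable` parameter of `subprocess.run`, which runs one program while reporting the first element of `args` as its `argv[0]`. Whether this reproduces the failure depends on how the interpreter locates its prefix, which is an assumption here.

```python
import subprocess


def run_with_argv0(real_path: str, argv0: str, args: list) -> subprocess.CompletedProcess:
    """Run the program at `real_path`, but report `argv0` to it as argv[0]."""
    return subprocess.run(
        [argv0, *args],        # argv as seen by the child process
        executable=real_path,  # the file that is actually executed
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Harmless demonstration: the shell prints the argv[0] it was given.
    demo = run_with_argv0("/bin/sh", "fake-name", ["-c", 'echo "$0"'])
    print(demo.stdout.strip())
```

For this issue, the call would look like `run_with_argv0(cached_python, "./scripts/bin/python", ["helloworld.py"])`, where `cached_python` is the path to `python/install/bin/python` inside the DotSlash cache. If that fails with the same `encodings` error, the rewritten `argv[0]` is a plausible trigger.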

jazzdan commented 3 months ago

@bolinfest Good question! For example, --version works just fine:

$ ./scripts/bin/python --version
Python 3.11.8

But ./scripts/bin/python helloworld.py does not:

$ echo "print('hello world')" > helloworld.py
$ ./scripts/bin/python helloworld.py
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = './scripts/bin/python'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/install'
  sys.platlibdir = 'lib'
  sys.executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.prefix = '/install'
  sys.exec_prefix = '/install'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/install/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001f235bac0 (most recent call first):
  <no Python frame>

And again, running Python from the DotSlash cache works fine:

$ /Users/dan/Library/Caches/dotslash/70/8af721089c037927052d4af721f1618603ad29/python/install/bin/python helloworld.py
hello world

The file type looks correct for my system (darwin arm64) so I think my dotslash file is set up correctly:

$ file /Users/dan/Library/Caches/dotslash/70/8af721089c037927052d4af721f1618603ad29/python/install/bin/python
/Users/dan/Library/Caches/dotslash/70/8af721089c037927052d4af721f1618603ad29/python/install/bin/python: Mach-O 64-bit executable arm64

Thanks for taking a look!

bolinfest commented 3 months ago

If this were Linux, I would say we could use strace as a first pass to see what sorts of syscalls/file opens DotSlash/Python are trying to do. Unfortunately [IMO], macOS seems to get more and more locked down such that these sorts of observability tools are extremely difficult to get set up to run :(

jazzdan commented 3 months ago

I hear ya. :) I will try to reproduce this on Linux later and, if that doesn't work, I will disable my System Integrity Protection and see what I can find.

jazzdan commented 3 months ago

This also reproduces on Linux. Here's an strace of executing the DotSlash file, which fails:

https://gist.github.com/jazzdan/f4dd9eb4231a9fd983f85b5056e498b3

Here is an strace of executing the file in the DotSlash cache directly, which succeeds:

https://gist.github.com/jazzdan/71dbd744d680bd14f7e26a434d36fad2
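Two strace logs like these are easier to compare when reduced to the path lookups that failed in one run but not in the other. Below is a minimal Python sketch, assuming the usual strace line shape of `call("path", ...) = -1 ERRNO (...)`, optionally preceded by a PID column. `failed_paths` and `only_in_first` are hypothetical helpers, and the regular expression is deliberately loose, so it may miss calls whose first argument is not a quoted path.

```python
import re

# Illustrative line shape this pattern is meant to match (not copied from the gists):
#   openat(AT_FDCWD, "/some/path", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
# An optional leading PID column, as written by `strace -f`, is tolerated.
FAILED_PATH_CALL = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'(?P<call>\w+)\((?:AT_FDCWD, )?"(?P<path>[^"]*)".*= -1 (?P<errno>E\w+)'
)


def failed_paths(strace_text: str) -> set:
    """Collect (syscall, path, errno) for every failed call that names a path."""
    found = set()
    for line in strace_text.splitlines():
        match = FAILED_PATH_CALL.match(line)
        if match:
            found.add((match["call"], match["path"], match["errno"]))
    return found


def only_in_first(failing_log: str, working_log: str) -> list:
    """Failed lookups present in the failing run but absent from the working run."""
    return sorted(failed_paths(failing_log) - failed_paths(working_log))
```

Feeding the text of the failing log and the working log to `only_in_first` should surface lookups that only the failing run attempted, such as any under the `/install` prefix seen in the error output.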

jazzdan commented 3 months ago

It looks like I might be running into the issue described here: https://github.com/indygreg/python-build-standalone/issues/57#issuecomment-682319180

the default search path compiled into the binary reflects the build environment instead of the run-time layout.

But I'll admit to being in a bit over my head here!

jazzdan commented 3 months ago

In this case, running the DotSlash file with PYTHONHOME set to the install directory inside the cache seems to fix it:

PYTHONHOME=/home/ubuntu/.cache/dotslash/9c/14429f2885b29b37cd1df0c879aff1862b418a/python/install ./python
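The workaround above can be sketched as a small helper that derives PYTHONHOME from the interpreter's location instead of typing the cache path by hand. This is a minimal Python sketch, assuming the `<home>/bin/python` layout of this archive. `python_home_for` and `env_with_python_home` are hypothetical names, and the sketch still needs to be told where the cached binary lives. The cache paths quoted in this thread differ from one another, so a hard-coded path is likely to be brittle.

```python
import os
from pathlib import Path
from typing import Optional


def python_home_for(python_exe: str) -> Path:
    """Derive PYTHONHOME from <home>/bin/python: two levels above the binary."""
    return Path(python_exe).parent.parent


def env_with_python_home(python_exe: str, base_env: Optional[dict] = None) -> dict:
    """Return a copy of the environment with PYTHONHOME set for `python_exe`."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONHOME"] = str(python_home_for(python_exe))
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run(["./python"], env=...)`, which mirrors the shell command above.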