Homebrew / brew

🍺 The missing package manager for macOS (or Linux)
https://brew.sh
BSD 2-Clause "Simplified" License
40.76k stars 9.57k forks source link

Python virtualenv installs no longer reproducible #16012

Open Bo98 opened 12 months ago

Bo98 commented 12 months ago

brew doctor output

Your system is ready to brew.

Verification

brew config output

(from CI that reproduced the issue)

HOMEBREW_VERSION: 4.1.11-25-gc953076
ORIGIN: https://github.com/Homebrew/brew
HEAD: c9530766096e692d5031750a14b4d1d42f1dc655
Last commit: 7 hours ago
Core tap origin: https://github.com/Homebrew/homebrew-core
Core tap HEAD: e2981ddc6460ff77f8543099ac3046c836ef9828
Core tap last commit: 18 minutes ago
Core tap branch: master
Core tap JSON: 13 Sep 20:15 UTC
HOMEBREW_PREFIX: /usr/local
HOMEBREW_BOOTSNAP: set
HOMEBREW_CACHE: /Users/brew/Library/Caches/Homebrew
HOMEBREW_CASK_OPTS: []
HOMEBREW_COLOR: set
HOMEBREW_CURL_PATH: /usr/bin/curl
HOMEBREW_DEVELOPER: set
HOMEBREW_FAIL_LOG_LINES: 150
HOMEBREW_GIT_EMAIL: 1589480+BrewTestBot@users.noreply.github.com
HOMEBREW_GIT_NAME: BrewTestBot
HOMEBREW_GIT_PATH: /usr/bin/git
HOMEBREW_GITHUB_API_TOKEN: set
HOMEBREW_LOGS: /Users/brew/actions-runner/_work/homebrew-core/homebrew-core/logs
HOMEBREW_MAKE_JOBS: 6
HOMEBREW_NO_AUTO_UPDATE: set
HOMEBREW_NO_COLOR: set
HOMEBREW_NO_EMOJI: set
HOMEBREW_NO_ENV_HINTS: set
HOMEBREW_NO_INSTALL_FROM_API: set
HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK: set
HOMEBREW_SORBET_RUNTIME: set
Homebrew Ruby: 2.6.10 => /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/bin/ruby
CPU: hexa-core 64-bit penryn
Clang: 15.0.0 build 1500
Git: 2.39.3 => /usr/bin/git
Curl: 8.1.2 => /usr/bin/curl
macOS: 14.0-x86_64
CLT: 15.0.0.0.1.1694021235
Xcode: 15.0

What were you trying to do (and why)?

Produce a reproducible bottle for a package containing a Python virtualenv.

What happened (include all command output)?

The package is no longer reproducible after a recent change:

$ cat $(brew --prefix asciidoc)/libexec/lib/python3.11/site-packages/asciidoc-10.2.0.dist-info/direct_url.json
{"dir_info": {}, "url": "file:///private/tmp/asciidoc-20230913-6340-1d0mrm3/asciidoc-10.2.0"}

What did you expect to happen?

References to the temp directory are not included.

Deleting or modifying this file is not sufficient on its own as it is hashed in RECORD. There's a few alternative solutions discussed in https://github.com/pypa/pip/issues/11424.

It's possible this issue also extends to regular non-venv std_pip_args installs but I have not tested this.

Step-by-step reproduction instructions (by running brew commands)

(Example)
brew install -s asciidoc
MikeMcQuaid commented 11 months ago

Good catch, thanks @Bo98!

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

carlocab commented 11 months ago

It's possible this issue also extends to regular non-venv std_pip_args installs but I have not tested this.

I believe this does. CC @branchvincent since I think you may have had a look at this.

cho-m commented 8 months ago

Deleting or modifying this file is not sufficient on its own as it is hashed in RECORD. There's a few alternative solutions discussed in pypa/pip#11424.

From quick glance, it looks like most official options for local directory would require splitting up wheel build and wheel install, e.g.

EDIT: Though I don't know if any quirks with build isolation given most other repository usage is done without isolation using their own packages.

Alternative official option would require removing resources and letting pip fetch tarballs, e.g.


The hack would be to patching up direct_url.json and overwrite hash in RECORD perhaps via a pip shim.

MikeMcQuaid commented 8 months ago

The hack would be to patching up direct_url.json and overwrite hash in RECORD perhaps via a pip shim.

Something like this would work for me 👍🏻

cho-m commented 6 months ago

Trying on asciidoc (https://github.com/Homebrew/homebrew-core/pull/165497), it looks like per platform (e.g. Apple Silicon) the bottle is identical so probably reproducible, but still missing all bottle as it has some recorded sha difference on the bin script:

--- /Users/cho-m/Library/Caches/Homebrew/downloads/0dc87f8a307fb752c8a06de20d603793dcd2d4a2aab76dcbf6085012327a2507--asciidoc--10.2.0.arm64_sonoma.bottle.6.tar.gz
+++ /Users/cho-m/Library/Caches/Homebrew/downloads/581f6cea5fa1a51516e60c91c78ee4cc1545d2e1a8dbd6a6c607cb0bb91cb542--asciidoc--10.2.0.sonoma.bottle.6.tar.gz
│   --- 0dc87f8a307fb752c8a06de20d603793dcd2d4a2aab76dcbf6085012327a2507--asciidoc--10.2.0.arm64_sonoma.bottle.6.tar
├── +++ 581f6cea5fa1a51516e60c91c78ee4cc1545d2e1a8dbd6a6c607cb0bb91cb542--asciidoc--10.2.0.sonoma.bottle.6.tar
│ ├── asciidoc/10.2.0/libexec/lib/python3.12/site-packages/asciidoc-10.2.0.dist-info/RECORD
│ │ @@ -1,9 +1,9 @@
│ │ -../../../bin/a2x,sha256=fUHfht_0WrUAbZS6OYXl2eFJJyRIB7ifIma9OIdKKHQ,248
│ │ -../../../bin/asciidoc,sha256=jM0HujAKip0JkiUIYj8afLlQeIJCUYVUbSS6KFdnU4g,253
│ │ +../../../bin/a2x,sha256=ufhgOjlHOfDxSfLCKkoyteRH4Zz6IXUAyFKDyjSwbBw,245
│ │ +../../../bin/asciidoc,sha256=1ras2JOqlg1UEVWBFctQLiuhwbeLWSLYYdxx1yvzw8A,250

Could be that it is using absolute (i.e. HOMEBREW_PREFIX) path during this metadata generation? EDIT: Probably from shebang.


Also, not a regression, but when Python package builds shared library it stores path into binary. Our old setup_install_args behaves the same way.

Only way around this would be using a stable directory path. I'm guessing we don't do this for /tmp to avoid potential conflict. One option could be source unpack within prefix.

Bo98 commented 6 months ago

Do we do a @@HOMEBREW_PREFIX@@ in those files? That probably explains it as the sha256 will be before that happens.

Also, not a regression, but when Python package builds shared library it stores path into binary

Will check and see where this happens

cho-m commented 6 months ago

Do we do a @@HOMEBREW_PREFIX@@ in those files? That probably explains it as the sha256 will be before that happens.

Yes. Easiest way is probably purging RECORD entries if we want to get an all bottle. Maybe something like:

s.gsub! %r{^(#{Regexp.escape("../../../bin/")}[^/,\n]*),sha256=.*,.*$}, "\\1,,", false

Also, not a regression, but when Python package builds shared library it stores path into binary

Will check and see where this happens

At least I was getting difference locally on back-to-back builds of cffi and running brew bottle --verbose --json --only-json-tab cffi.

I wanted to check if CI behaves the same but our bottle cache skips the 2nd build.

│ ├── cffi/1.16.0_1/lib/python3.12/site-packages/_cffi_backend.cpython-312-darwin.so
│ │ ├── arm64
│ │ │┄ Format-specific differences are supported for this file format but no file-specific differences were detected; falling back to a binary diff. file(1) reports: Mach-O 64-bit arm64 bundle, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|HAS_TLV_DESCRIPTORS>
│ │ │ @@ -13162,21 +13162,21 @@
│ │ │  00033690: 6765 745f 696e 7465 7270 7374 6174 655f  get_interpstate_
│ │ │  000336a0: 6469 6374 2e61 7474 725f 6e61 6d65 005f  dict.attr_name._
│ │ │  000336b0: 6666 695f 696e 7465 726e 616c 5f6e 6577  ffi_internal_new
│ │ │  000336c0: 2e69 6e74 6572 6e61 6c5f 6f75 7470 7574  .internal_output
│ │ │  000336d0: 005f 5f74 6573 7466 756e 6336 2e79 005f  .__testfunc6.y._
│ │ │  000336e0: 5f4d 6572 6765 6447 6c6f 6261 6c73 2e36  _MergedGlobals.6
│ │ │  000336f0: 3737 002f 7072 6976 6174 652f 746d 702f  77./private/tmp/
│ │ │ -00033700: 6366 6669 2d32 3032 3430 3330 382d 3331  cffi-20240308-31
│ │ │ -00033710: 3933 342d 3937 7668 3475 2f63 6666 692d  934-97vh4u/cffi-
│ │ │ +00033700: 6366 6669 2d32 3032 3430 3330 382d 3333  cffi-20240308-33
│ │ │ +00033710: 3133 392d 776c 3473 3665 2f63 6666 692d  139-wl4s6e/cffi-
...
│ ├── cffi/1.16.0_1/lib/python3.12/site-packages/cffi-1.16.0.dist-info/RECORD
│ │ @@ -1,8 +1,8 @@
│ │ -_cffi_backend.cpython-312-darwin.so,sha256=cY4QovyRC4Z0fRE8WkxBI9Ldf8iWUxXCP4GdsJxZ2Sc,212688
│ │ +_cffi_backend.cpython-312-darwin.so,sha256=GfMRY0xdUvDfgzqy_kNQvayCwPdXXlN3pXGhSyuJjts,212688

I get the same bottle when using libexec, e.g.

@@ -31,9 +31,11 @@ class Cffi < Formula
   end

   def install
+    libexec.install buildpath.children
     pythons.each do |python|
-      system python, "-m", "pip", "install", *std_pip_args, "."
+      system python, "-m", "pip", "install", *std_pip_args, libexec
     end
+    libexec.rmtree
   end

   test do
Bo98 commented 6 months ago

Ok that .c reference will be a compiler thing. I know how to fix that: it's -ffile-prefix-map=#{buildpath}=/tmp/homebrew-build/cffi. I'll look into integrating this into brew somehow.

It only works on GCC >= 8, LLVM Clang >= 10 and Apple Clang >= 12, though -fdebug-prefix-map will partially cover older versions.

Bo98 commented 6 months ago

Oh right didn't realise there was two paths. What I said covers the .c one.

cho-m commented 6 months ago

Oh right didn't realise there was two paths. What I said covers the .c one.

The other path is just for checksum of first path. Diff should go away once .c path is fixed.

EDIT: Or are you referring to another path embedded in .so itself? Didn't really check full diff there.

cho-m commented 6 months ago

I see what you mean. The /private/tmp/cffi-*/cffi-1.16.0/build/temp.macosx-14.0-arm64-cpython-312/src/c/_cffi_backend.o

Bo98 commented 6 months ago

Please try with https://github.com/Homebrew/brew/pull/16860.

Bo98 commented 6 months ago

I don't necessarily expect same checksums across different compiler/linker versions so we won't get an all bottle there, but I would at least expect a repeated build with the same environment to produce the same result.

cho-m commented 6 months ago

Please try with #16860.

Works for cffi example.

Do we know if there are any potential scenarios this may need to be disabled? At least Debian has enabled -ffile-prefix-map as default so there is good chance of stability for -fdebug-prefix-map

Ref: https://git.dpkg.org/git/dpkg/dpkg.git/commit/?id=b60c243ba99b8483202a6f6a814476275204fdff


I don't necessarily expect same checksums across different compiler/linker versions so we won't get an all bottle there, but I would at least expect a repeated build with the same environment to produce the same result.

That's what I expect. The all bottle discussion was for pure python code, which should be possible like asciidoc. Though larger virtualenvs probably will have some C/C++ code making all bottles unlikely.

Bo98 commented 6 months ago

Do we know if there are any potential scenarios this may need to be disabled? At least Debian has enabled -ffile-prefix-map as default so there is good chance of stability for -fdebug-prefix-map

It should be at least safe enough to not worry about it until there's evidence that something breaks.

I don't envision any particular issue

cho-m commented 2 weeks ago

Is there anything left here?

Only thing left I can think of is doing version detection to enable -ffile-prefix-map. There are probably some formulae this will help (e.g. I think csound is impacted, though it is also impacted by a CMake JAR reproducibility issue).