Homebrew / homebrew-core

🍻 Default formulae for the missing package manager for macOS (or Linux)
https://brew.sh
BSD 2-Clause "Simplified" License

AVX instructions in Big Sur bottles crash when run in Rosetta 2 #67713

Closed Tyrubias closed 3 years ago

Tyrubias commented 3 years ago

Bug report

Please note we will close your issue without comment if you delete, do not read or do not fill out the issue checklist below and provide ALL the requested information. If you repeatedly fail to use the issue template, we will block you from ever submitting issues to Homebrew again.

What you were trying to do (and why)

I was trying to run python3 from the formula python@3.9 on my M1 MacBook Air under Rosetta 2.

What happened (include command output)

When trying to run the interactive REPL or any Python program, the interpreter crashes with the error illegal hardware instruction. Logs can be found here.

Command output

  [1]    2892 illegal hardware instruction  python3

  

What you expected to happen

Python 3 should run under Rosetta 2

Step-by-step reproduction instructions (by running brew install commands)

$ brew install python@3.9
$ python3

Of note is the fact that postinstall fails as well.

mitchblank commented 3 years ago

Seems M1-specific? I just rebuilt 3.9.1_2 on an Intel Big Sur machine and it doesn't crash.

Tyrubias commented 3 years ago

@mitchblank Yeah, I think it's M1 specific. I should've made that more clear in the issue.

mologie commented 3 years ago

Confirming this happens on my M1 too. It dies when it hits an AVX instruction:

$ lldb --arch x86_64 -- /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3 -m ensurepip 
(lldb) target create --arch=x86_64 "/usr/local/Cellar/python@3.9/3.9.1_2/bin/python3"
Current executable set to '/usr/local/Cellar/python@3.9/3.9.1_2/bin/python3' (x86_64).
(lldb) settings set -- target.run-args  "-m" "ensurepip"
(lldb) run
Process 42289 launched: '/usr/local/Cellar/python@3.9/3.9.1_2/bin/python3' (x86_64)
Process 42289 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000100003b4b python3`___lldb_unnamed_symbol1$$python3 + 126
python3`___lldb_unnamed_symbol1$$python3:
->  0x100003b4b <+126>: vmovups 0x3c4(%rip), %xmm0        ; ".app/Contents/MacOS/Python"
    0x100003b53 <+134>: vmovups %xmm0, 0x10(%rcx)
    0x100003b58 <+139>: vmovups 0x3a7(%rip), %xmm0        ; "Resources/Python.app/Contents/MacOS/Python"
    0x100003b60 <+147>: vmovups %xmm0, (%rcx)
Target 0: (python3) stopped.

Rosetta does not support AVX (and does not advertise it either). So this Python build seems to lack proper CPU feature checking and a fallback to non-AVX instructions. This might also be a bug in Rosetta, but I do not see any feature checks in the whole frame so it's probably just a broken build.
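
For illustration, the kind of feature check described above could look like the following. This is a minimal Python sketch, assuming the macOS sysctl key `hw.optional.avx1_0` reports `1` when AVX is usable; the function names and the two routine names are made up for the example. Instructions that a compiler emits on its own when AVX is enabled at build time never pass through a check like this.

```python
import subprocess


def parse_sysctl_flag(output: str) -> bool:
    """Interpret the text printed by `sysctl -n <key>` for a 0/1 flag."""
    return output.strip() == "1"


def cpu_has_avx() -> bool:
    """Ask macOS whether AVX is usable; treat any failure as 'no AVX'."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "hw.optional.avx1_0"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return False  # key missing or sysctl unavailable: take the safe path
    return parse_sysctl_flag(result.stdout)


def pick_copy_routine(has_avx: bool) -> str:
    """Hypothetical dispatcher: choose an implementation by CPU feature."""
    return "copy_avx" if has_avx else "copy_sse2"
```

A caller would use `pick_copy_routine(cpu_has_avx())` once at startup and keep the non-AVX path as the default.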

carlocab commented 3 years ago

Is this python@3.9 poured from the bottle? Does it still happen when you rebuild from source (still in Rosetta, of course)?

lutzroeder commented 3 years ago

Repro and logs on M1 machine, started failing after recent update. /usr/local/opt/python/libexec/bin/python no longer exists and /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9 gives zsh: illegal hardware instruction.

~: arch -x86_64 brew install python
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
Updated 10 formulae.

==> Downloading https://homebrew.bintray.com/bottles/python%403.9-3.9.1_2.big_sur.bottle.tar.gz
Already downloaded: ~/Library/Caches/Homebrew/downloads/36358b83a61f27928eeece29b009118a210747ad65d96c15eaa8c8d78d39dcd6--python@3.9-3.9.1_2.big_sur.bottle.tar.gz
==> Pouring python@3.9-3.9.1_2.big_sur.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/easy_install-3.9
Target /usr/local/bin/easy_install-3.9
already exists. You may want to remove it:
  rm '/usr/local/bin/easy_install-3.9'

To force the link and overwrite all conflicting files:
  brew link --overwrite python@3.9

To list all files that would be deleted:
  brew link --overwrite --dry-run python@3.9

Possible conflicting files are:
/usr/local/bin/easy_install-3.9
/usr/local/bin/pip3
/usr/local/bin/pip3.9
==> /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3 -m ensurepip
Last 15 lines from ~/Library/Logs/Homebrew/python@3.9/post_install.01.python3:
2020-12-25 17:49:56 -0800

/usr/local/Cellar/python@3.9/3.9.1_2/bin/python3
-m
ensurepip

Warning: The post-install step did not complete successfully
You can try again using `brew postinstall python@3.9`
==> Caveats
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
  /usr/local/opt/python@3.9/libexec/bin

You can install Python packages with
  pip3 install <package>
They will install into the site-package directory
  /usr/local/lib/python3.9/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> Summary
🍺  /usr/local/Cellar/python@3.9/3.9.1_2: 3,670 files, 60.7MB
~: /usr/local/opt/python/libexec/bin/python
zsh: no such file or directory: /usr/local/opt/python/libexec/bin/python
~: /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9
zsh: illegal hardware instruction  /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9

@fxcoudert

carlocab commented 3 years ago

What happens when you do arch -x86_64 brew install -s python?

Tyrubias commented 3 years ago

@carlocab The build fails because of IPv6 errors

checking for dup2... yes
checking for strdup... yes
checking for getpgrp... yes
checking for setpgrp... (cached) yes
checking for library containing crypt... none required
checking for library containing crypt_r... no
checking for crypt_r... no
checking for clock_gettime... yes
checking for clock_getres... yes
checking for clock_settime... yes
checking for major... yes
checking for getaddrinfo... yes
checking getaddrinfo bug... yes
Fatal: You must get working getaddrinfo() function.
       or you can specify "--disable-ipv6".

READ THIS: https://docs.brew.sh/Troubleshooting

lutzroeder commented 3 years ago

What happens when you do arch -x86_64 brew install -s python?

/usr/local/opt/python/libexec/bin/python is still missing. /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9 is running.

~: arch -x86_64 brew install -s python
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
Updated 10 formulae.

==> Downloading https://files.pythonhosted.org/packages/12/e1/b9a2926a3c5a3fb055b8f85052f5baa890106a0e21b64a977c10affea751/setu
######################################################################## 100.0%
==> Downloading https://files.pythonhosted.org/packages/cb/5f/ae1eb8bda1cde4952bd12e468ab8a254c345a0189402bf1421457577f4f3/pip-
######################################################################## 100.0%
==> Downloading https://files.pythonhosted.org/packages/d4/cf/732e05dce1e37b63d54d1836160b6e24fb36eeff2313e93315ad047c7d90/whee
######################################################################## 100.0%
==> Downloading https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tar.xz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/python@3.9/3.9.1_2 --enable-ipv6 --datarootdir=/usr/local/Cellar/python@3.9/3.9.1_2/
==> make
==> make install PYTHONAPPSDIR=/usr/local/Cellar/python@3.9/3.9.1_2
==> make frameworkinstallextras PYTHONAPPSDIR=/usr/local/Cellar/python@3.9/3.9.1_2/share/python@3.9
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/easy_install-3.9
Target /usr/local/bin/easy_install-3.9
already exists. You may want to remove it:
  rm '/usr/local/bin/easy_install-3.9'

To force the link and overwrite all conflicting files:
  brew link --overwrite python@3.9

To list all files that would be deleted:
  brew link --overwrite --dry-run python@3.9

Possible conflicting files are:
/usr/local/bin/easy_install-3.9
/usr/local/bin/pip3
/usr/local/bin/pip3.9
==> /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3 -m ensurepip
==> /usr/local/Cellar/python@3.9/3.9.1_2/bin/pip3 install -v --global-option=--no-user-cfg --install-option=--force --install-o
==> Caveats
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
  /usr/local/opt/python@3.9/libexec/bin

You can install Python packages with
  pip3 install <package>
They will install into the site-package directory
  /usr/local/lib/python3.9/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> Summary
🍺  /usr/local/Cellar/python@3.9/3.9.1_2: 8,628 files, 127.5MB, built in 5 minutes 28 seconds
~: /usr/local/opt/python/libexec/bin/python
zsh: no such file or directory: /usr/local/opt/python/libexec/bin/python
~: /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9
Python 3.9.1 (default, Dec 25 2020, 18:02:20) 
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
~: file /usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9
/usr/local/Cellar/python@3.9/3.9.1_2/bin/python3.9: Mach-O 64-bit executable x86_64

adamwinn commented 3 years ago

I'm seeing the same thing on an M1 Mac mini using Rosetta 2

awinn@Adams-Mac-mini ~/Downloads> /usr/local/bin/python3 --version
fish: '/usr/local/bin/python3 --version' terminated by signal SIGILL (Illegal instruction)
macOS crash report

```
Process:               python3.9 [43149]
Path:                  /usr/local/Cellar/python@3.9/3.9.1_2/Frameworks/Python.framework/Versions/3.9/bin/python3.9
Identifier:            python3.9
Version:               ???
Code Type:             X86-64 (Translated)
Parent Process:        fish [42926]
Responsible:           iTerm2 [1498]
User ID:               501

Date/Time:             2020-12-26 03:32:12.884 -0700
OS Version:            macOS 11.1 (20C69)
Report Version:        12
Anonymous UUID:        AC78B336-C8E2-B28B-513C-995032A92486

Time Awake Since Boot: 130000 seconds

System Integrity Protection: disabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Illegal instruction: 4
Termination Reason:    Namespace SIGNAL, Code 0x4
Terminating Process:   exc handler [43149]

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   python3          0x0000000100647b4b 0x100644000 + 15179
1   libdyld.dylib    0x00007fff20344621 start + 1

Thread 1:: com.apple.rosetta.exceptionserver
0   runtime          0x00007ffdffc90a98 0x7ffdffc8e000 + 10904
1   runtime          0x00007ffdffc9a2b8 0x7ffdffc8e000 + 49848
2   runtime          0x00007ffdffc9b8e8 0x7ffdffc8e000 + 55528

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x00007ffdf7406bde  rbx: 0x000000000000004d  rcx: 0x00007ffdf7406bde  rdx: 0x0000000000000004
  rdi: 0x00007ffdf7406b90  rsi: 0x0000000200a04730  rbp: 0x00000003050b38f0  rsp: 0x00000003050b38a0
   r8: 0x00000000000002b2   r9: 0x00000000000002ba  r10: 0x00000003050b3a90  r11: 0x00007fff2036b7a0
  r12: 0x0000000200a04730  r13: 0x0000000000000000  r14: 0x00000003050b3910  r15: 0x00007ffdf7406b90
  rip: 0x0000000100647b4b  rfl: 0x0000000000000202

Binary Images:
0x100644000 - 0x100647fff +python3 (0) <3912CEDE-7114-392E-A3E6-47B3DFFCAF53> /usr/local/bin/python3
0x108901000 - 0x108b00fff +org.python.python (3.9.1, [c] 2001-2019 Python Software Foundation. - 3.9.1) <4CD4D4F2-4C6D-3642-B082-9C3AE6EA8165> /usr/local/Cellar/python@3.9/3.9.1_2/Frameworks/Python.framework/Versions/3.9/Python
0x200958000 - 0x2009f3fff dyld (832.7.1) /usr/lib/dyld
0x7ffdffc8e000 - 0x7ffdffd01fff +runtime (203.13.2) <3B9E4ADB-AB4E-30AD-A642-B74313FB48A8> /Library/Apple/*/runtime
0x7fff2005e000 - 0x7fff2005ffff libsystem_blocks.dylib (78) <9CF131C6-16FB-3DD0-B046-9E0B6AB99935> /usr/lib/system/libsystem_blocks.dylib
0x7fff20060000 - 0x7fff20095fff libxpc.dylib (2038.40.38) <003A027D-9CE3-3794-A319-88495844662D> /usr/lib/system/libxpc.dylib
0x7fff20096000 - 0x7fff200adfff libsystem_trace.dylib (1277.50.1) <48C14376-626E-3C81-B0F5-7416E64580C7> /usr/lib/system/libsystem_trace.dylib
0x7fff200ae000 - 0x7fff2014cfff libcorecrypto.dylib (1000.60.19) <92F0211E-506E-3760-A3C2-808BF3905C07> /usr/lib/system/libcorecrypto.dylib
0x7fff2014d000 - 0x7fff20179fff libsystem_malloc.dylib (317.40.8) <2EF43B96-90FB-3C50-B73E-035238504E33> /usr/lib/system/libsystem_malloc.dylib
0x7fff2017a000 - 0x7fff201befff libdispatch.dylib (1271.40.12) /usr/lib/system/libdispatch.dylib
0x7fff201bf000 - 0x7fff201f8fff libobjc.A.dylib (818.2) <339EDCD0-5ABF-362A-B9E5-8B9236C8D36B> /usr/lib/libobjc.A.dylib
0x7fff201f9000 - 0x7fff201fbfff libsystem_featureflags.dylib (28.60.1) <7B4EBDDB-244E-3F78-8895-566FE22288F3> /usr/lib/system/libsystem_featureflags.dylib
0x7fff201fc000 - 0x7fff20284fff libsystem_c.dylib (1439.40.11) <06D9F593-C815-385D-957F-2B5BCC223A8A> /usr/lib/system/libsystem_c.dylib
0x7fff20285000 - 0x7fff202dafff libc++.1.dylib (904.4) /usr/lib/libc++.1.dylib
0x7fff202db000 - 0x7fff202f3fff libc++abi.dylib (904.4) /usr/lib/libc++abi.dylib
0x7fff202f4000 - 0x7fff20322fff libsystem_kernel.dylib (7195.60.75) <4BD61365-29AF-3234-8002-D989D295FDBB> /usr/lib/system/libsystem_kernel.dylib
0x7fff20323000 - 0x7fff2032efff libsystem_pthread.dylib (454.60.1) <8DD3A0BC-2C92-31E3-BBAB-CE923A4342E4> /usr/lib/system/libsystem_pthread.dylib
0x7fff2032f000 - 0x7fff20369fff libdyld.dylib (832.7.1) <2F8A14F5-7CB8-3EDD-85EA-7FA960BBC04E> /usr/lib/system/libdyld.dylib
0x7fff2036a000 - 0x7fff20373fff libsystem_platform.dylib (254.60.1) <3F7F6461-7B5C-3197-ACD7-C8A0CFCC6F55> /usr/lib/system/libsystem_platform.dylib
0x7fff20374000 - 0x7fff2039ffff libsystem_info.dylib (542.40.3) <0979757C-5F0D-3F5A-9E0E-EBF234B310AF> /usr/lib/system/libsystem_info.dylib
0x7fff203a0000 - 0x7fff2083bfff com.apple.CoreFoundation (6.9 - 1770.300) <7AADB19E-8EA2-3C9B-8699-F206DB47C6BE> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x7fff2245f000 - 0x7fff226c0fff libicucore.A.dylib (66109) <6C0A0196-2778-3035-81CE-7CA48D6C0628> /usr/lib/libicucore.A.dylib
0x7fff226c1000 - 0x7fff226cafff libsystem_darwin.dylib (1439.40.11) /usr/lib/system/libsystem_darwin.dylib
0x7fff22adb000 - 0x7fff22ae6fff libsystem_notify.dylib (279.40.4) <98D74EEF-60D9-3665-B877-7BE1558BA83E> /usr/lib/system/libsystem_notify.dylib
0x7fff24a36000 - 0x7fff24a44fff libsystem_networkextension.dylib (1295.60.5) /usr/lib/system/libsystem_networkextension.dylib
0x7fff24aa2000 - 0x7fff24ab8fff libsystem_asl.dylib (385) <940C5BB9-4928-3A63-97F2-132797C8B7E5> /usr/lib/system/libsystem_asl.dylib
0x7fff261cf000 - 0x7fff261d6fff libsystem_symptoms.dylib (1431.60.1) <88F35AAC-746F-3176-81DF-49CE3D285636> /usr/lib/system/libsystem_symptoms.dylib
0x7fff28503000 - 0x7fff28513fff libsystem_containermanager.dylib (318.60.1) <4ED09A19-04CC-3464-9EFB-F674932020B5> /usr/lib/system/libsystem_containermanager.dylib
0x7fff29213000 - 0x7fff29216fff libsystem_configuration.dylib (1109.60.2) /usr/lib/system/libsystem_configuration.dylib
0x7fff29217000 - 0x7fff2921bfff libsystem_sandbox.dylib (1441.60.4) <8CE27199-D633-31D2-AB08-56380A1DA9FB> /usr/lib/system/libsystem_sandbox.dylib
0x7fff29e26000 - 0x7fff29e28fff libquarantine.dylib (119.40.2) <19D42B9D-3336-3543-AF75-6E605EA31599> /usr/lib/system/libquarantine.dylib
0x7fff2a3a8000 - 0x7fff2a3acfff libsystem_coreservices.dylib (127) /usr/lib/system/libsystem_coreservices.dylib
0x7fff2a5c3000 - 0x7fff2a60efff libsystem_m.dylib (3186.40.2) <0F98499E-662F-36EC-AB58-91A8D5A0FB74> /usr/lib/system/libsystem_m.dylib
0x7fff2a610000 - 0x7fff2a615fff libmacho.dylib (973.4) <28AE1649-22ED-3C4D-A232-29D37F821C39> /usr/lib/system/libmacho.dylib
0x7fff2a632000 - 0x7fff2a63dfff libcommonCrypto.dylib (60178.40.2) <1D0A75A5-DEC5-39C6-AB3D-E789B8866712> /usr/lib/system/libcommonCrypto.dylib
0x7fff2a63e000 - 0x7fff2a648fff libunwind.dylib (200.10) /usr/lib/system/libunwind.dylib
0x7fff2a649000 - 0x7fff2a650fff liboah.dylib (203.13.2) /usr/lib/liboah.dylib
0x7fff2a651000 - 0x7fff2a65bfff libcopyfile.dylib (173.40.2) <89483CD4-DA46-3AF2-AE78-FC37CED05ACC> /usr/lib/system/libcopyfile.dylib
0x7fff2a65c000 - 0x7fff2a663fff libcompiler_rt.dylib (102.2) <0DB26EC8-B4CD-3268-B865-C2FC07E4D2AA> /usr/lib/system/libcompiler_rt.dylib
0x7fff2a664000 - 0x7fff2a666fff libsystem_collections.dylib (1439.40.11) /usr/lib/system/libsystem_collections.dylib
0x7fff2a667000 - 0x7fff2a669fff libsystem_secinit.dylib (87.60.1) <99B5FD99-1A8B-37C1-BD70-04990FA33B1C> /usr/lib/system/libsystem_secinit.dylib
0x7fff2a66a000 - 0x7fff2a66cfff libremovefile.dylib (49.40.3) <750012C2-7097-33C3-B796-2766E6CDE8C1> /usr/lib/system/libremovefile.dylib
0x7fff2a66d000 - 0x7fff2a66dfff libkeymgr.dylib (31) <2C7B58B0-BE54-3A50-B399-AA49C19083A9> /usr/lib/system/libkeymgr.dylib
0x7fff2a66e000 - 0x7fff2a675fff libsystem_dnssd.dylib (1310.60.4) <81EFC44D-450E-3AA3-AC8F-D7EF68F464B4> /usr/lib/system/libsystem_dnssd.dylib
0x7fff2a676000 - 0x7fff2a67bfff libcache.dylib (83) <2F7F7303-DB23-359E-85CD-8B2F93223E2A> /usr/lib/system/libcache.dylib
0x7fff2a67c000 - 0x7fff2a67dfff libSystem.B.dylib (1292.60.1) /usr/lib/libSystem.B.dylib
0x7fff2a67e000 - 0x7fff2a681fff libfakelink.dylib (3) <34B6DC95-E19A-37C0-B9D0-558F692D85F5> /usr/lib/libfakelink.dylib
0x7fff2a682000 - 0x7fff2a682fff com.apple.SoftLinking (1.0 - 1) <90D679B3-DFFD-3604-B89F-1BCF70B3EBA4> /System/Library/PrivateFrameworks/SoftLinking.framework/Versions/A/SoftLinking
0x7fff2dc0b000 - 0x7fff2dc0bfff liblaunch.dylib (2038.40.38) <05A7EFDD-4111-3E4D-B668-239B69DE3D0F> /usr/lib/system/liblaunch.dylib
0x7fff300b8000 - 0x7fff300b8fff libsystem_product_info_filter.dylib (8.40.1) <7CCAF1A8-F570-341E-B275-0C80B092F8E0> /usr/lib/system/libsystem_product_info_filter.dylib

Translated Code Information:
  tmp0: 0x0000000100647b4b tmp1: 0x000003c40510f8c5 tmp2: 0x00f8c5104111f8c5

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 110222
    thread_create: 0
    thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=521.1M resident=0K(0%) swapped_out_or_unallocated=521.1M(100%)
Writable regions: Total=159.4M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=159.4M(100%)

                                VIRTUAL   REGION
REGION TYPE                        SIZE    COUNT (non-coalesced)
===========                     =======  =======
Kernel Alloc Once                    8K        1
MALLOC                            10.1M        9
MALLOC guard page                   96K        4
Rosetta Arena                     2048K        1
Rosetta Generic                    588K      144
Rosetta IndirectBranch              32K        1
Rosetta JIT                      128.0M        1
Rosetta Return Stack                20K        2
Rosetta Thread Context              20K        2
Stack                             8176K        1
Stack Guard                       56.0M        1
VM_ALLOCATE                       7156K        2
VM_ALLOCATE (reserved)              12K        1         reserved VM address space (unallocated)
__DATA                             851K       51
__DATA_CONST                      3034K       40
__DATA_DIRTY                        95K       23
__LINKEDIT                       506.8M        7
__OBJC_RO                         60.5M        1
__OBJC_RW                         2451K        2
__TEXT                            14.4M       50
__UNICODE                          588K        1
mapped file                        4.1G       84
shared memory                       16K        1
unshared pmap                     3440K        2
===========                     =======  =======
TOTAL                              4.9G      432
TOTAL, minus reserved VM space     4.9G      432
```

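
The `Translated Code Information` values in the report above look like the raw instruction bytes at the faulting address. The following Python sketch decodes `tmp1` under that assumption (treating it as little-endian instruction bytes is an interpretation, not documented behaviour); the helper names are made up for the example.

```python
def le_bytes(value: int, size: int = 8) -> bytes:
    """Return `value` as `size` little-endian bytes (x86 memory order)."""
    return value.to_bytes(size, "little")


def describe_vex2(code: bytes) -> dict:
    """Split an instruction that starts with a 2-byte VEX prefix.

    Layout: 0xC5, one byte holding ~R, ~vvvv, L and pp, then the opcode
    and the ModRM byte.
    """
    if code[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    return {
        "vector_bits": 256 if code[1] & 0x04 else 128,  # L bit
        "opcode": code[2],
        "modrm": code[3],
    }


tmp1 = le_bytes(0x000003C40510F8C5)  # value copied from the crash report
fields = describe_vex2(tmp1)
disp32 = int.from_bytes(tmp1[4:8], "little")  # RIP-relative displacement
print(tmp1.hex(" "), fields, hex(disp32))
```

The leading `0xC5` is the two-byte VEX prefix used by AVX encodings. Opcode `0x10` with a 128-bit vector length, ModRM `0x05` (RIP-relative) and displacement `0x3c4` is consistent with the `vmovups 0x3c4(%rip), %xmm0` shown in the lldb output earlier in the thread.
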
carlocab commented 3 years ago

It's not clear this is a packaging issue. I suspect it should be possible to reproduce the illegal hardware instruction error by running in Rosetta 2 a non-Homebrew Python3 executable compiled for Intel.

@lutzroeder can you still reproduce the illegal hardware instruction error after doing arch -x86_64 brew install -s python? The fact that /usr/local/opt/python/libexec/bin/python is missing is unrelated to the original error you report; you just need to uninstall python first before doing arch -x86_64 brew install. (You may need to pass --ignore-dependencies when you uninstall.)

I think we may just need to skip pouring Intel Big Sur bottles when brew is running in Rosetta. @fxcoudert

Clarification: I mean the above for only the python@3.9 formula, btw. Apologies for any misunderstanding caused.
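
A rough sketch of that idea in Python follows. It is not Homebrew's actual code (brew itself is written in Ruby). It assumes the macOS sysctl key `sysctl.proc_translated`, which reads `1` for a process running under Rosetta 2, and it refuses the Intel bottle only for an explicit list of formulae. The function names and the deny-list are hypothetical.

```python
import subprocess

# Hypothetical deny-list: formulae whose Intel Big Sur bottles are known to
# crash under Rosetta 2.
BROKEN_UNDER_ROSETTA = {"python@3.9"}


def parse_proc_translated(output: str) -> bool:
    """`sysctl -n sysctl.proc_translated` prints 1 for a translated process."""
    return output.strip() == "1"


def running_under_rosetta() -> bool:
    """True when this process is being translated by Rosetta 2."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return False  # key absent (Intel Mac, non-macOS): not translated
    return parse_proc_translated(result.stdout)


def should_pour_bottle(formula: str, translated: bool) -> bool:
    """Pour the bottle unless it is known to be broken in this environment."""
    return not (translated and formula in BROKEN_UNDER_ROSETTA)
```

With this shape, `should_pour_bottle("python@3.9", running_under_rosetta())` falls back to a source build only for the listed formula, and every other formula keeps using bottles.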

kubicek commented 3 years ago

I've just used arch -x86_64 brew install -s python and python works again on my M1.

Adham2023 commented 3 years ago

The working solution for me was: brew uninstall python@3.9, then download the .pkg from python.org and install it.

carlocab commented 3 years ago

@Adham2023 which .pkg did you download? Does the error not emerge with their Intel-only pkg?

Adham2023 commented 3 years ago

@Adham2023 which .pkg did you download? Does the error not emerge with their Intel-only pkg?

There was no error after installation at all.

carlocab commented 3 years ago

@Adham2023 That doesn't answer my question at all, unfortunately. Are you talking about their Intel-only package, or their universal binary?

mologie commented 3 years ago

@carlocab

I think we may just need to skip pouring Intel Big Sur bottles when brew is running in Rosetta. @fxcoudert

I don't think this is the right conclusion. A quick analysis of the problem is available in the 3rd comment of this thread. The issue is that the AVX CPU feature was available during the build and was thus probably included in the bottle. However, not all target systems have AVX. Granted, all Macs running Big Sur these days would have AVX support, but older Macs do not and might be similarly affected if the CI CPU is recent enough.

A solution that wouldn't entirely exclude M1 machines from bottles is to fix the configure call to python so that it does not depend on AVX instructions (or uses them only after feature checking). I am unfamiliar with how Python is built unfortunately, so this is just a general direction I'd go with.

Maybe it's even possible to generalize this in brew's build/CI environment. If all bottles build against a known-good CPU baseline compatible with Rosetta, then these compatibility problems are avoided completely for all bottles.

SMillerDev commented 3 years ago

If all bottles build against a known-good CPU baseline compatible with Rosetta

All bottles are compiled against a CPU baseline for the minimal supported CPU on Mojave (the oldest os we support). I don't think pouring resources into the stopgap that is Rosetta is worth the effort, but you're always welcome to make a pull request if you disagree.

carlocab commented 3 years ago

@mologie Yes, I did read the 3rd comment in this thread, and that is what led me to the conclusions I posted up this thread. It may not be the ideal solution, but it's the solution we have the time and resources to implement.

A solution that wouldn't entirely exclude M1 machines

I'm afraid you may have misunderstood my comment. I don't intend on excluding all M1 machines from all bottles. I just want to stop the python@3.9 formula from pouring an Intel Big Sur bottle when brew is being run in Rosetta, because, as we've seen, that installs something that's broken.

Note that M1 machines can still pour ARM bottles whenever one is available, as long as brew is not run in Rosetta. An ARM bottle is available for python@3.9.

Bear in mind as well that I didn't say this should be a permanent feature of the formula going forward. Of course, it may well become permanent because alternative fixes appear to require substantially more effort and resources than they're worth. But, in the meantime, the least we could do is not install broken software on users' machines.

mologie commented 3 years ago

Thanks for the clarification and sorry for raising a stink ;-). If /some/ software needs to be built from source after it is found to be broken under Rosetta that's probably a good compromise until we arrive in arm-land. Cool to hear that progress is being made on arm bottles, though in my case I unfortunately have to pull in python as a dependency of a whole bunch of packages.

I will look into whether Rosetta compatibility can be restored for the Python bottle. Its formula however is rather long already and like you mentioned it may not be worth the effort.

carlocab commented 3 years ago

If /some/ software needs to be built from source after it is found to be broken under Rosetta that's probably a good compromise until we arrive in arm-land.

This is exactly what I had in mind when I wrote down that comment. Apologies for the misunderstanding!

Fixing the python formula to also work in Rosetta might also be the wrong place to devote attention to. It may be better to just investigate which (Intel) Big Sur bottles use AVX instructions, as those are likely to be broken when used under Rosetta.

mitchblank commented 3 years ago

All bottles are compiled against a CPU baseline for the minimal supported CPU on Mojave (the oldest os we support).

Are you 100% sure of that? Per https://arstechnica.com/gadgets/2020/11/macos-11-0-big-sur-the-ars-technica-review/10/

For apps that already know to check for AVX support, since those instruction sets aren’t available on all Macs that run Big Sur, the apps should see that the instructions aren’t supported and run without issue.

If that's true, that implies that even Big Sur can run on Intel Macs without AVX, but I don't know which ones they're talking about. The oldest Mojave machines would be a 2010 Mac Pro (assuming it has a GPU upgrade) which wouldn't have AVX, though.

So probably there aren't many non-AVX machines running Homebrew, but I'm not sure there aren't any. If programs are assuming that they can use it just because the build machine had it (maybe -march=native snuck in somewhere?) then some small slice of Intel Macs might be affected as well.

karliss commented 3 years ago

Vim is also affected. That and meson (Python) were working fine at the beginning of the week, so it seems like a recent update caused a lot more software to be compiled with AVX support enabled.

fxcoudert commented 3 years ago

For python, the information we need right now is: where is that AVX instruction generated? We need a full backtrace of the crash before we discuss further what is needed or necessary. (And the same for vim)

karliss commented 3 years ago

vim backtrace and disassembly

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00000001001e78f1 vim`main + 45
    frame #1: 0x00007fff203d0621 libdyld.dylib`start + 1
```

```asm
    0x1001e78ec <+40>:  callq  0x1000e9d67               ; mch_early_init
->  0x1001e78f1 <+45>:  vxorps %xmm0, %xmm0, %xmm0
    0x1001e78f5 <+49>:  vmovups %xmm0, 0x72e0b(%rip)     ; params
    0x1001e78fd <+57>:  vmovups %xmm0, 0x72ef3(%rip)     ; params + 240
    0x1001e7905 <+65>:  vmovups %xmm0, 0x72f0b(%rip)     ; params + 272
    0x1001e790d <+73>:  vmovups %xmm0, 0x72e03(%rip)     ; params + 16
    0x1001e7915 <+81>:  vmovups %xmm0, 0x72e1b(%rip)     ; params + 48
    0x1001e791d <+89>:  vmovups %xmm0, 0x72e03(%rip)     ; params + 32
    0x1001e7925 <+97>:  vmovups %xmm0, 0x72e2b(%rip)     ; params + 80
    0x1001e792d <+105>: vmovups %xmm0, 0x72e13(%rip)     ; params + 64
    0x1001e7935 <+113>: vmovups %xmm0, 0x72e3b(%rip)     ; params + 112
    0x1001e793d <+121>: vmovups %xmm0, 0x72e23(%rip)     ; params + 96
    0x1001e7945 <+129>: vmovups %xmm0, 0x72e4b(%rip)     ; params + 144
    0x1001e794d <+137>: vmovups %xmm0, 0x72e33(%rip)     ; params + 128
    0x1001e7955 <+145>: vmovups %xmm0, 0x72e5b(%rip)     ; params + 176
    0x1001e795d <+153>: vmovups %xmm0, 0x72e43(%rip)     ; params + 160
    0x1001e7965 <+161>: vmovups %xmm0, 0x72e6b(%rip)     ; params + 208
    0x1001e796d <+169>: vmovups %xmm0, 0x72e53(%rip)     ; params + 192
    0x1001e7975 <+177>: vmovups %xmm0, 0x72e6b(%rip)     ; params + 224
    0x1001e797d <+185>: vmovups %xmm0, 0x72e83(%rip)     ; params + 256
    0x1001e7985 <+193>: movq   $0x0, 0x72e98(%rip)       ; params + 284
    0x1001e7990 <+204>: movl   %r15d, 0x72d71(%rip)      ; params
    0x1001e7997 <+211>: movq   %r12, 0x72d72(%rip)       ; params + 8
    0x1001e799e <+218>: movl   $0x1, 0x72e58(%rip)       ; params + 244
    0x1001e79a8 <+228>: movq   $-0x1, 0x72e6d(%rip)      ; params + 276
    0x1001e79b3 <+239>: leaq   -0x440(%rbp), %rdi
    0x1001e79ba <+246>: callq  0x1001d0180               ; vim_ruby_init
```

So the crash is in the lines between mch_early_init and vim_ruby_init: https://github.com/vim/vim/blob/cdc40c43f1008bda2f173d3a13606236679e8067/src/main.c#L106 . It's probably the CLEAR_FIELD(params), which is just a define for a zeroing memset that got inlined and unrolled.
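
That reading can be checked against the listing. The Python sketch below resolves each RIP-relative store (target = address of the following instruction + displacement, both copied from the disassembly above) and tests whether the zeroing stores tile one contiguous block, which is what an unrolled zeroing memset would produce. The helper name is made up for the example.

```python
# (address of the following instruction, RIP-relative displacement, width)
# for each zeroing store in the listing: 18 vmovups, then one movq $0.
STORES = [
    (0x1001E78FD, 0x72E0B, 16), (0x1001E7905, 0x72EF3, 16),
    (0x1001E790D, 0x72F0B, 16), (0x1001E7915, 0x72E03, 16),
    (0x1001E791D, 0x72E1B, 16), (0x1001E7925, 0x72E03, 16),
    (0x1001E792D, 0x72E2B, 16), (0x1001E7935, 0x72E13, 16),
    (0x1001E793D, 0x72E3B, 16), (0x1001E7945, 0x72E23, 16),
    (0x1001E794D, 0x72E4B, 16), (0x1001E7955, 0x72E33, 16),
    (0x1001E795D, 0x72E5B, 16), (0x1001E7965, 0x72E43, 16),
    (0x1001E796D, 0x72E6B, 16), (0x1001E7975, 0x72E53, 16),
    (0x1001E797D, 0x72E6B, 16), (0x1001E7985, 0x72E83, 16),
    (0x1001E7990, 0x72E98, 8),
]


def covered_offsets(stores):
    """Return the lowest target address and the set of byte offsets written."""
    targets = [(next_ip + disp, width) for next_ip, disp, width in stores]
    base = min(addr for addr, _ in targets)
    covered = set()
    for addr, width in targets:
        covered.update(range(addr - base, addr - base + width))
    return base, covered


base, covered = covered_offsets(STORES)
size = max(covered) + 1
print(hex(base), size, covered == set(range(size)))
```

The stores form a single gap-free run of 296 bytes starting at `params`. By this arithmetic the final `movq` lands 288 bytes past `params` rather than the 284 shown in the lldb annotation; the sketch trusts the address calculation.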

karliss commented 3 years ago

For Python, it's a similar situation to Vim: just a boring strcpy which the compiler inlined and implemented using AVX: https://github.com/python/cpython/blob/ed48e9e2862971c2e9dcbd9a253477ec3def5e2e/Mac/Tools/pythonw.c#L85

python backtrace and disassembly

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000100003b4b python3.9`___lldb_unnamed_symbol1$$python3.9 + 126
    frame #1: 0x00007fff203d0621 libdyld.dylib`start + 1
    frame #2: 0x00007fff203d0621 libdyld.dylib`start + 1
```

```asm
    0x100003adf <+18>:  movq   0x51a(%rip), %rdi         ; (void *)0x00000001083da53e: Py_Initialize
    0x100003ae6 <+25>:  leaq   -0x48(%rbp), %rsi
    0x100003aea <+29>:  callq  0x100003dc4               ; symbol stub for: dladdr
    0x100003aef <+34>:  testl  %eax, %eax
    0x100003af1 <+36>:  je     0x100003b10               ; <+67>
    0x100003af3 <+38>:  movq   -0x48(%rbp), %r12
    0x100003af7 <+42>:  movq   %r12, %rdi
    0x100003afa <+45>:  callq  0x100003e0c               ; symbol stub for: strlen
    0x100003aff <+50>:  movq   %rax, %rbx
    0x100003b02 <+53>:  leaq   0x3c(%rax), %rdi
    0x100003b06 <+57>:  callq  0x100003ddc               ; symbol stub for: malloc
    0x100003b0b <+62>:  testq  %rax, %rax
    0x100003b0e <+65>:  jne    0x100003b15               ; <+72>
    0x100003b10 <+67>:  xorl   %r15d, %r15d
    0x100003b13 <+70>:  jmp    0x100003b79               ; <+172>
    0x100003b15 <+72>:  movq   %rax, %r15
    0x100003b18 <+75>:  movq   %rax, %rdi
    0x100003b1b <+78>:  movq   %r12, %rsi
    0x100003b1e <+81>:  callq  0x100003e06               ; symbol stub for: strcpy
    0x100003b23 <+86>:  leaq   0x1(%r15), %rax
    0x100003b27 <+90>:  cmpq   $0x1, %rbx
    0x100003b2b <+94>:  je     0x100003b40               ; <+115>
    0x100003b2d <+96>:  cmpb   $0x2f, -0x1(%r15,%rbx)
    0x100003b33 <+102>: leaq   -0x1(%rbx), %rbx
    0x100003b37 <+106>: jne    0x100003b27               ; <+90>
    0x100003b39 <+108>: leaq   (%r15,%rbx), %rax
    0x100003b3d <+112>: incq   %rax
    0x100003b40 <+115>: leaq   0x1(%rax), %rcx
    0x100003b44 <+119>: cmpb   $0x2e, (%rax)
    0x100003b47 <+122>: cmovneq %rax, %rcx
->  0x100003b4b <+126>: vmovups 0x3c4(%rip), %xmm0       ; ".app/Contents/MacOS/Python"
    0x100003b53 <+134>: vmovups %xmm0, 0x10(%rcx)
    0x100003b58 <+139>: vmovups 0x3a7(%rip), %xmm0       ; "Resources/Python.app/Contents/MacOS/Python"
    0x100003b60 <+147>: vmovups %xmm0, (%rcx)
    0x100003b64 <+151>: movabsq $0x687479502f534f63, %rax ; imm = 0x687479502F534F63
    0x100003b6e <+161>: movq   %rax, 0x20(%rcx)
    0x100003b72 <+165>: movl   $0x6e6f68, 0x27(%rcx)     ; imm = 0x6E6F68
    0x100003b79 <+172>: leaq   -0x24(%rbp), %rsi
    0x100003b7d <+176>: movl   $0x800, (%rsi)            ; imm = 0x800
    0x100003b83 <+182>: leaq   0x4506(%rip), %rdi
    0x100003b8a <+189>: callq  0x100003db2               ; symbol stub for: _NSGetExecutablePath
```
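
To see that the four stores following the crash point really are an inlined string copy, they can be replayed in Python. This is only a sketch: the two 16-byte vector loads are modelled by slicing the literal that lldb annotates, while the two immediates are taken verbatim from the listing above. The `store` helper is made up for the example.

```python
TARGET = b"Resources/Python.app/Contents/MacOS/Python\x00"

buf = bytearray(len(TARGET))  # stands in for the memory that %rcx points to


def store(offset: int, data: bytes) -> None:
    """Emulate one store instruction writing `data` at buf[offset:]."""
    buf[offset:offset + len(data)] = data


# Stores in the order the listing executes them.
store(0x10, TARGET[16:32])                               # vmovups %xmm0, 0x10(%rcx)
store(0x00, TARGET[0:16])                                # vmovups %xmm0, (%rcx)
store(0x20, (0x687479502F534F63).to_bytes(8, "little"))  # movq %rax, 0x20(%rcx)
store(0x27, (0x6E6F68).to_bytes(4, "little"))            # movl $0x6e6f68, 0x27(%rcx)

print(bytes(buf))
```

The immediates decode to `cOS/Pyth` and `hon` plus a terminating NUL, so the buffer ends up holding the complete literal. The only AVX involved is the pair of 16-byte moves the compiler chose for the first 32 bytes.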

fxcoudert commented 3 years ago

We compile all Big Sur bottles with AVX support (i.e. the compiler is free to use AVX instructions if it thinks it will be faster). The reasoning is explained in https://github.com/Homebrew/brew/pull/10092. Basically, all Apple-supported hardware for Big Sur supports AVX instructions (the oldest supported Big Sur Intel hardware is Ivy Bridge EP).

It seems like a poor choice for Apple not to support in Rosetta 2 the exact set of instructions that they require in their supported hardware. But alas, we'll probably have to live with it.

mitchblank commented 3 years ago

It seems like a poor choice for Apple not to support in Rosetta 2 the exact set of instructions that they require in their supported hardware

I think there are patent licensing issues with emulating AVX unfortunately :-(

fxcoudert commented 3 years ago

Our options, as far as I can see:

  1. Build Big Sur Intel bottles without AVX: would penalise current Intel users. Also, how many bottles do we have to rebuild? 😭
  2. Install Catalina bottles on Rosetta: likely to break a couple of formulas that depend on a specific OS or SDK version (gcc). Although, given that they have the same Xcode and SDK version, that would actually not be so bad.
  3. I actually can't think of a third option. Maybe “blame the problem on someone else” 🍏

gonna ping @jonchang and @MikeMcQuaid on this

fxcoudert commented 3 years ago

While we think about this, I'll be reverting that change in brew: https://github.com/Homebrew/brew/pull/10153

mitchblank commented 3 years ago

I think the only option that is safe is to revert https://github.com/Homebrew/brew/pull/10092 so you don't create Big Sur bottles that can't run under Rosetta.

Maybe in some future where most Apple Silicon users are getting :arm64_big_sur bottles you could start bottling some Intel things with AVX (:really_intel_big_sur?) but even that runs the risk of problems if someone upgrades their Intel Mac to an Apple Silicon one via a Time Machine backup or something.

I think you just have to treat "M1+Rosetta" as another Intel CPU variant that is "supported" by Big Sur, which means the original statement that all Big Sur Intel Macs have AVX is a false one.

fxcoudert commented 3 years ago

How do we audit the bottles produced in the meantime to figure out which ones are using AVX instructions?

carlocab commented 3 years ago

Two other options, though I don't think you like either of them:

  1. Don't pour bottles for some formulae on systems running brew under Rosetta. (How else are you going to notice that you've got a fast new processor if you're not compiling software?)
  2. Build universal binaries for formulae that support it (sorry)

...actually, now that I think about it, that second option probably won't cut it. Never mind; not sure what I was thinking.
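
Option 1 above hinges on brew knowing that it is running translated. Here is a minimal sketch of that check. It assumes macOS exposes the `sysctl.proc_translated` key, with a value of 1 when the calling process runs under Rosetta 2. `should_pour_bottle` and its `bottle_uses_avx` parameter are hypothetical names for illustration, not part of Homebrew.

```python
import subprocess


def parse_proc_translated(output: str) -> bool:
    """Interpret the output of `sysctl -n sysctl.proc_translated`.

    "1" means the calling process is translated; anything else
    (including "0" or empty output) is treated as native.
    """
    return output.strip() == "1"


def running_under_rosetta() -> bool:
    """Best-effort check that falls back to False (native) on any failure."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # sysctl is missing or does not know the key: assume native.
        return False
    return parse_proc_translated(result.stdout)


def should_pour_bottle(bottle_uses_avx: bool, translated: bool) -> bool:
    """Hypothetical policy: skip AVX-built bottles when running translated."""
    return not (bottle_uses_avx and translated)
```

A caller would combine the two, for example `should_pour_bottle(True, running_under_rosetta())`, and fall back to building from source when the result is false.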

mitchblank commented 3 years ago

> How do we audit the bottles produced in the meantime to figure out which ones are using AVX instructions?

Probably easier to just redo the last 4 days worth of Big Sur bottles. At least with git it should be easy to see which ones got a new bottle in that time.

carlocab commented 3 years ago

> Probably easier to just redo the last 4 days worth of Big Sur bottles.

That's still a lot of work and I'm not convinced all of it will be necessary. Even software compiled to use AVX instructions should, in theory, check that the processor supports it before using them. (Not sure why Python doesn't seem to be doing that here.) Of course, if the facility for checking AVX support is somehow broken or non-standard on Rosetta, then all bets are off.

mitchblank commented 3 years ago

> Even software compiled to use AVX instructions should, in theory, check that the processor supports it before using them.

The problem is that due to homebrew/brew#10092 homebrew has not been doing that for the last 4 days. AVX support is hardwired on.

And basically any program that uses, say, memset() has probably got AVX pollution in its bottle as a result.

carlocab commented 3 years ago

I'm confused. My understanding is that https://github.com/Homebrew/brew/pull/10092 enables AVX support at compile-time, so that software is compiled with the ability to use AVX instructions. My earlier comment was about checking for AVX support at run-time.

Surely, that has to be done, or else anyone else distributing Intel binaries either:

  1. distributes different binaries for different Intel processors depending on whether they have AVX support or not; or,
  2. distributes binaries with no AVX support enabled.

The former appears to be extremely rare, while the latter is possible but difficult to verify. (And also would be surprising.)

mitchblank commented 3 years ago

> My understanding is that Homebrew/brew#10092 enables AVX support at compile-time, so that software is compiled with the ability to use AVX instructions.

No, it is setting compiler flags that promise the binary will only be run on AVX-capable machines, and therefore the compiler can use AVX any time it feels like it. That is why things like small memset() operations are getting inlined as AVX instructions.
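
The distinction can be illustrated with a small run-time dispatch sketch. This is not how Homebrew or Python does it: `fill_wide` is only a stand-in for an AVX-accelerated routine, and the probe assumes the macOS `hw.optional.avx1_0` sysctl key reflects what the calling process may use. Code that checks at run time picks a routine per machine; a binary built with flags that promise AVX behaves as if `select_fill(True)` had been hardwired in.

```python
import subprocess
from typing import Callable


def avx_reported_by_sysctl() -> bool:
    """Ask macOS whether AVX is available; False if the query fails."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.optional.avx1_0"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return False
    return out.strip() == "1"


def fill_generic(buf: bytearray, value: int) -> None:
    """Portable path: works on every CPU."""
    for i in range(len(buf)):
        buf[i] = value


def fill_wide(buf: bytearray, value: int) -> None:
    """Stand-in for a routine that would use AVX instructions."""
    buf[:] = bytes([value]) * len(buf)


def select_fill(avx_available: bool) -> Callable[[bytearray, int], None]:
    """Run-time dispatch: choose the routine after probing the machine."""
    return fill_wide if avx_available else fill_generic


fill = select_fill(avx_reported_by_sysctl())
buffer = bytearray(16)
fill(buffer, 0)
```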

carlocab commented 3 years ago

Ah, I see. Thanks for the explanation.

dnjo commented 3 years ago

I could get around this issue with Python by reinstalling and building from source: brew reinstall -s python@3.9.

Of course it's not a good long term solution but I needed this fixed quickly since Neovim was complaining every time I opened a file. 🙂

fleitz commented 3 years ago

> How do we audit the bottles produced in the meantime to figure out which ones are using AVX instructions?

otool -d /path/to/binary | grep -e 'VBROADCASTSS|<AVX instruction 2>|<AVX instruction 3...>|'

https://en.wikipedia.org/wiki/X86_instruction_listings#AVX

mitchblank commented 3 years ago

> otool -d

I assume you mean otool -tv?

However that does risk false positives since it's possible for programs to use AVX optimizations conditionally and still be correct. Therefore it probably isn't something that would make a good automated audit.

My advice remains that every Big Sur bottle that was created between Homebrew/brew#10092 and Homebrew/brew#10153 landing should be presumed guilty and rebuilt. Since even simple things like memset()/memcpy()/strlen() or indeed just structure assignment can invite the compiler to use AVX I doubt that there are any C/C++ bottles that aren't damaged. And this damage might not be as obvious as it has been for python and vim where they fail immediately on startup -- it could crash only in certain seemingly-random places.
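
For anyone who still wants a rough scan, the `otool -tv` idea could be sketched as below. It assumes AT&T-syntax disassembly text (as in the lldb listing earlier in this issue) and flags any mnemonic that starts with `v` and touches a vector register, plus `vzeroupper`/`vzeroall`. That is a heuristic for VEX-encoded instructions in general, and, as noted above, a hit does not prove the code path is unconditional.

```python
import re
from typing import List

# A mnemonic starting with "v" whose operands mention a vector register.
_VEX_LINE = re.compile(r"\b(v[a-z0-9]+)\b[^;\n]*%[xyz]mm\d+")
# VEX instructions that take no operands.
_NO_OPERAND = re.compile(r"\b(vzeroupper|vzeroall)\b")


def find_vex_mnemonics(disassembly: str) -> List[str]:
    """Return the sorted, de-duplicated VEX-style mnemonics found in the text."""
    found = set()
    for line in disassembly.splitlines():
        code = line.split(";", 1)[0]  # ignore trailing disassembler comments
        match = _VEX_LINE.search(code) or _NO_OPERAND.search(code)
        if match:
            found.add(match.group(1))
    return sorted(found)


sample = (
    "0x100003b47 <+122>: cmovneq %rax, %rcx\n"
    "0x100003b4b <+126>: vmovups 0x3c4(%rip), %xmm0\n"
    "0x100003b53 <+134>: vmovups %xmm0, 0x10(%rcx)\n"
)
print(find_vex_mnemonics(sample))
```

The text to scan would come from running `otool -tv` on each binary in a bottle and passing its output to `find_vex_mnemonics`.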

fleitz commented 3 years ago

Probably those are the right flags, I'm used to objdump.

The binaries run fine on Big Sur supported hardware, and that which emulates supported hardware. Rosetta does not emulate supported hardware. I'd rather have the performance improvements from AVX.

The binaries are not damaged, Rosetta 2 simply emulates hardware unsupported by the intel version of Big Sur

mitchblank commented 3 years ago

> Rosetta does not emulate supported hardware.

That's the wrong way to look at the issue. "M1+Rosetta" is one of the supported environments for running Intel Mac binaries, just as much as a physical Intel CPU is.

And, in Apple's defence, as far as I'm aware they've never recommended non-conditional AVX use. If you use Xcode with the normal flags, it remains off even when targeting MacOS 11. The entire issue here is that homebrew got more aggressive than the recommendations and have been burned by it. That's the whole bug.

The reality is that Apple Silicon is already in the hands of millions of people right now, so this isn't an obscure platform that can just be ignored. Indeed, a large percentage of homebrew users will probably make the transition over the next couple years. If their first experience after restoring their Time Machine backup on their shiny new macbook is that nothing in /usr/local/bin works, homebrew will be getting a bug filed every hour about it.

The binaries in the bottles just have to be Rosetta-compatible even if being installed onto a GenuineIntel machine; anything else will turn into a mess.

fleitz commented 3 years ago

I don't think it is, I would assume if Rosetta 2 was essential to the functioning of Big Sur it would be included, which it isn't. I liken it to installing QEMU for some ancient CPU and software not working. The list of supported Big Sur CPUs all have AVX.

If you restore a time machine backup to a non-intel mac it will break lots of things.

mitchblank commented 3 years ago

> I would assume if Rosetta 2 was essential to the functioning of Big Sur it would be included, which it isn't.

From my understanding, the first time you try to run an Intel binary you're led through its installation... so it's effectively part of the operating system. Sure, Apple sells to plenty of users that just use Safari.app and Music.app and who will be Intel-free as soon as they're on their new M1 Mac. That doesn't mean that Rosetta isn't an Apple-supported runtime environment, and one that many homebrew users will come face-to-face with.

> If you restore a time machine backup to a non-intel mac it will break lots of things.

Can you elaborate? I haven't tried it myself, but everything I've read suggests that using Migration Assistant is supposed to make going from Intel to ARM machines seamless (at least in that direction) and that Rosetta 2 is a big part of that. Are you saying that Apple doesn't support Migration Assistant to M1?

My understanding is that it's not only supported, but very common.

> I liken it to installing QEMU

It's not. Apple is not advertising it as an exact emulation of a particular CPU, but as something that can run OS/X-Intel binaries. At no time has Apple said that you can assume AVX's availability -- you always have to runtime test for it. The bug is that homebrew distributed some binaries that break this rule.

carlocab commented 3 years ago

> it's possible for programs to use AVX optimizations conditionally

Yes, but, as we've seen, it seems like the compiler will optimise the compatibility check away the moment you give it the chance to. Using otool might be a decent starting point for an audit procedure of formulae that were bottled in the last week or so.

mologie commented 3 years ago

> Yes, but, as we've seen, it seems like the compiler will optimise the compatibility check away the moment you give it the chance to. Using otool might be a decent starting point for an audit procedure of formulae that were bottled in the last week or so.

There has been no evidence of this as far as I can tell reading through this issue. The compiler simply inserts AVX instructions for trivial things such as memset(). The only reasonable recourse is to assume that /every/ package built after the compiler flag change is broken and must be rebuilt, as suggested by @mitchblank.

If this is too hard on the CI infra then it may be an option to just rebuild those formulae whose bottles contain AVX instructions (even those that use it conditionally -- rebuilding them won't hurt). However, I have a gut feeling that a majority of C/C++ ones are affected either way.

Is the effort of auditing worth it vs. just rebuilding everything from the past week?

carlocab commented 3 years ago

> just rebuild those formulae whose bottles contain AVX instructions

This is exactly what I was suggesting.

Anyway, I tried to have a look at how many bottles might have been affected by this.

My naive attempt was to look at homebrew/core's log between 22 December and 26 December and identify all the commits that contain the word "bottle" in them. I then extracted the associated formula names and removed duplicates. This produced 2810 potential bottles with AVX instructions inside: https://gist.github.com/carlocab/3366581b9d0aed2bd95688097fb06352

Of course, I suspect that most of these bottles are actually ARM bottle jobs, but one will need to pick a smarter procedure than the one I tried above to find them.

One place to start with such a procedure is to find all the bottle commits that are preceded (but not necessarily immediately) by non-bottle commits. Otherwise, maybe there is a way to look at the modification date of the sha256 .* => :big_sur line, but I'll leave that idea to someone who knows git better than I do.
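
For reference, the naive pass described above could look roughly like this. It assumes commit subjects of the form `name: update <version> bottle.` and works on text produced by something like `git log --format=%s` over the date range. The sample subjects are made up for illustration (only `python@3.9` 3.9.1_2 appears in this issue), and the function name is hypothetical.

```python
from typing import List


def formulae_with_bottle_commits(subjects: str) -> List[str]:
    """Collect formula names from commit subjects that mention 'bottle'."""
    names = set()
    for line in subjects.splitlines():
        name, sep, rest = line.partition(":")
        # Keep only "name: ... bottle ..." subjects; a set drops duplicates.
        if sep and "bottle" in rest:
            names.add(name.strip())
    return sorted(names)


sample_subjects = """\
vim: update 8.2.2200 bottle.
vim 8.2.2210
python@3.9: update 3.9.1_2 bottle.
vim: update 8.2.2210 bottle.
"""
print(formulae_with_bottle_commits(sample_subjects))
```

As noted, this cannot tell Intel bottle commits from ARM ones, which is why it over-counts.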

mitchblank commented 3 years ago

How about git log -G '=> :big_sur' ...?
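
A sketch of how that pickaxe query could be turned into a formula list. It assumes formulae live at `Formula/<name>.rb` in the tap and that the log is run with `--name-only --format=` so only file paths are printed. The date bounds are left as parameters, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def big_sur_bottle_log_command(since: str, until: str) -> List[str]:
    """Build a git log invocation listing files whose diff touches a :big_sur line."""
    return [
        "git", "log",
        "-G=> :big_sur",        # pickaxe: added/removed lines matching the regex
        f"--since={since}",
        f"--until={until}",
        "--name-only",
        "--format=",            # suppress commit headers, keep file names only
    ]


def formulae_from_name_only(output: str) -> List[str]:
    """Extract unique formula names from `git log --name-only` output."""
    names = set()
    for line in output.splitlines():
        path = PurePosixPath(line.strip())
        if path.suffix == ".rb" and path.parent.name == "Formula":
            names.add(path.stem)
    return sorted(names)


sample_output = "Formula/vim.rb\n\nFormula/python@3.9.rb\nFormula/vim.rb\nREADME.md\n"
print(formulae_from_name_only(sample_output))
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` inside a homebrew-core checkout, and its stdout fed to `formulae_from_name_only`.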

carlocab commented 3 years ago

Cool, TIL. That leaves ~244~ 108 bottles:

AVX bottles

```
abseil
aliyun-cli
apcupsd
asio
aws-google-auth
bartycrouch
benthos
borgbackup
broot
bumpversion
chkrootkit
chuck
cocoapods
coturn
cucumber-cpp
cucumber-ruby
dfc
diceware
dmalloc
dnscontrol
doctest
dvc
eksctl
fastlane
flow-cli
folderify
gcalcli
get-flash-videos
git-remote-codecommit
git-revise
gitless
glow
grpc
h2spec
hamlib
hiredis
hqx
htmldoc
imagejs
imagemagick
imagemagick@6
jql
lemon
libarchive
libcddb
libhandy
libjson-rpc-cpp
libstfl
libstrophe
libxslt
macosvpn
macvim
mdbtools
micronaut
mikutter
mkvtomp4
moarvm
mpop
mps-youtube
nexus
nfpm
ngs
nicotine-plus
notmuch
nqp
okteto
openvdb
orbit
orientdb
pakchois
passenger
pdfcpu
pdftoipe
pg_top
plowshare
promtail
pugixml
pyinvoke
python@3.9
rakudo
raylib
restview
rinetd
robot-framework
rst-lint
ruby
scour
shallow-backup
sphinx-doc
sshuttle
statik
supervisor
swiftformat
terraform_landscape
tin
tokei
trailscraper
trash-cli
travis
vim
vit
vitetris
weechat
whistle
ydcv
you-get
youtube-dl
youtube-dlc
```