Recreatable with a conda-installed numpy plus a pip-installed torch; it does not happen when both come from pip.

Cannot recreate with:
conda create --name pt python=3.10
conda activate pt
# numpy 2.0 and pytorch 2.3.1 get installed
pip install torch numpy
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
# no segfault
Can recreate with:
conda create --name pt python=3.10 numpy
conda activate pt
pip install torch
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
# Segfault...
In all these cases, importing numpy second (i.e. importing torch first) does not recreate the issue.
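For illustration only (same conda numpy + pip torch environment as above), this is the reversed import order that stays alive; it just demonstrates the observation, it is not a recommended workaround:

```bash
# Importing torch before numpy in the otherwise-failing environment:
python -c "import torch; import numpy; torch.zeros((1024, 1024), dtype=torch.uint8)"
# no segfault
```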
And now with lldb:
% lldb -- python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
(lldb) target create "python"
Current executable set to '/Users/mark/miniforge3/envs/pt/bin/python' (arm64).
(lldb) settings set -- target.run-args "-c" "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
(lldb) run
Process 58336 launched: '/Users/mark/miniforge3/envs/pt/bin/python' (arm64)
Process 58336 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #3, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #4, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #5, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #8, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
Target 0: (python) stopped.
(lldb)
OMP_NUM_THREADS=1 python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
seems to be fine,
but
OMP_NUM_THREADS=2 lldb -- python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
recreates the segfault
backtrace:
* thread #4, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
* frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
frame #1: 0x000000010086faf8 libomp.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 72
frame #2: 0x0000000108339520 libomp.dylib`kmp_flag_64<false, true>::wait(kmp_info*, int, void*) + 1880
frame #3: 0x0000000108334560 libomp.dylib`__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) + 184
frame #4: 0x00000001083380e8 libomp.dylib`__kmp_fork_barrier(int, int) + 628
frame #5: 0x0000000108314e14 libomp.dylib`__kmp_launch_thread + 340
frame #6: 0x000000010835300c libomp.dylib`__kmp_launch_worker(void*) + 280
frame #7: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #2
* frame #0: 0x000000019ff319ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019ff6f55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001001ac700 python`PyThread_acquire_lock_timed + 596
frame #3: 0x000000010020f8ac python`acquire_timed + 312
frame #4: 0x000000010020fb20 python`lock_PyThread_acquire_lock + 72
frame #5: 0x0000000100065448 python`method_vectorcall_VARARGS_KEYWORDS + 488
frame #6: 0x0000000100149540 python`call_function + 524
frame #7: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #8: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #9: 0x0000000100149540 python`call_function + 524
frame #10: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #11: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #12: 0x0000000100149540 python`call_function + 524
frame #13: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #14: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #15: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #16: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #17: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #18: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #19: 0x0000000100149540 python`call_function + 524
frame #20: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #21: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #22: 0x0000000100149540 python`call_function + 524
frame #23: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #24: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #25: 0x000000010005ad10 python`method_vectorcall + 344
frame #26: 0x0000000100210830 python`thread_run + 180
frame #27: 0x00000001001ac230 python`pythread_wrapper + 48
frame #28: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #3
* frame #0: 0x000000019ff319ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019ff6f55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001001ac700 python`PyThread_acquire_lock_timed + 596
frame #3: 0x000000010abdf3f0 _queue.cpython-310-darwin.so`_queue_SimpleQueue_get_impl + 496
frame #4: 0x000000010abdef5c _queue.cpython-310-darwin.so`_queue_SimpleQueue_get + 236
frame #5: 0x00000001000ab37c python`cfunction_vectorcall_FASTCALL_KEYWORDS_METHOD + 140
frame #6: 0x0000000100149540 python`call_function + 524
frame #7: 0x0000000100145430 python`_PyEval_EvalFrameDefault + 26428
frame #8: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #9: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #10: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #11: 0x0000000100149540 python`call_function + 524
frame #12: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #13: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #14: 0x0000000100149540 python`call_function + 524
frame #15: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #16: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #17: 0x000000010005ad10 python`method_vectorcall + 344
frame #18: 0x0000000100210830 python`thread_run + 180
frame #19: 0x00000001001ac230 python`pythread_wrapper + 48
frame #20: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #1, queue = 'com.apple.main-thread'
* frame #0: 0x000000014493ff34 libtorch_cpu.dylib`void c10::function_ref<void (char**, long long const*, long long, long long)>::callback_fn<at::native::DEFAULT::VectorizedLoop2d<at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&)::$_2::operator()() const::'lambda'()::operator()() const::'lambda'(), at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&)::$_2::operator()() const::'lambda'()::operator()() const::'lambda0'()>>(long, char**, long long const*, long long, long long) + 632
frame #1: 0x0000000142351e7c libtorch_cpu.dylib`at::TensorIteratorBase::serial_for_each(c10::function_ref<void (char**, long long const*, long long, long long)>, at::Range) const + 364
frame #2: 0x0000000142351fe8 libtorch_cpu.dylib`.omp_outlined. + 216
frame #3: 0x0000000108371c4c libomp.dylib`__kmp_invoke_microtask + 156
frame #4: 0x0000000108315e40 libomp.dylib`__kmp_invoke_task_func + 348
frame #5: 0x0000000108311ac0 libomp.dylib`__kmp_fork_call + 7552
frame #6: 0x0000000108304088 libomp.dylib`__kmpc_fork_call + 196
frame #7: 0x0000000142351c28 libtorch_cpu.dylib`at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long long const*, long long, long long)>, long long) + 432
frame #8: 0x000000014493f190 libtorch_cpu.dylib`at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&) + 252
frame #9: 0x0000000142746fc0 libtorch_cpu.dylib`at::native::fill_out(at::Tensor&, c10::Scalar const&) + 764
frame #10: 0x0000000142e56bc8 libtorch_cpu.dylib`at::_ops::fill__Scalar::call(at::Tensor&, c10::Scalar const&) + 272
frame #11: 0x00000001427485c0 libtorch_cpu.dylib`at::native::zero_(at::Tensor&) + 676
frame #12: 0x000000014332d958 libtorch_cpu.dylib`at::_ops::zero_::call(at::Tensor&) + 260
frame #13: 0x0000000142a139bc libtorch_cpu.dylib`at::native::zeros_symint(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 676
frame #14: 0x0000000142eb80d0 libtorch_cpu.dylib`at::_ops::zeros::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 152
frame #15: 0x0000000142eb7cb8 libtorch_cpu.dylib`at::_ops::zeros::call(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 296
frame #16: 0x00000001098f0898 libtorch_python.dylib`torch::zeros_symint(c10::ArrayRef<c10::SymInt>, c10::TensorOptions) + 204
frame #17: 0x00000001098a0ad0 libtorch_python.dylib`torch::autograd::THPVariable_zeros(_object*, _object*, _object*) + 2820
frame #18: 0x00000001000aab3c python`cfunction_call + 80
frame #19: 0x000000010005759c python`_PyObject_MakeTpCall + 612
frame #20: 0x00000001001495d8 python`call_function + 676
frame #21: 0x0000000100145430 python`_PyEval_EvalFrameDefault + 26428
frame #22: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #23: 0x0000000100199398 python`run_mod + 216
frame #24: 0x0000000100198e38 python`_PyRun_SimpleFileObject + 1260
frame #25: 0x0000000100197e1c python`_PyRun_AnyFileObject + 240
frame #26: 0x00000001001bc8f8 python`Py_RunMain + 2340
frame #27: 0x00000001001bda54 python`pymain_main + 1180
frame #28: 0x000000010000131c python`main + 56
frame #29: 0x000000019fbe60e0 dyld`start + 2360
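Until the packaging is sorted out, a possible stopgap is to pin OMP_NUM_THREADS=1 for the affected environment. This is only a sketch based on the single-thread observation above; it assumes a conda recent enough to have `conda env config vars` (>= 4.8) and it gives up CPU parallelism in torch:

```bash
# Stopgap sketch: persist OMP_NUM_THREADS=1 in the "pt" env so every activation
# avoids the multi-threaded OpenMP code path that crashes.
conda env config vars set OMP_NUM_THREADS=1 -n pt
conda deactivate && conda activate pt   # re-activate so the variable is applied
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
```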
I'm somewhat afraid of setting the package type to conda, since their commit says it is needed for torch.compile...
But I would rather avoid this disastrous failure mode for others.
I expect that many simply haven't been able to update to 2.3.0 because of ecosystem incompatibilities from the multiple ongoing migrations, but I was able to by carefully picking and choosing packages, and some on my team reported the failure last week.
@conda-forge-admin please rerender
Hi! This is the friendly automated conda-forge-webservice.
I just wanted to let you know that I started rerendering the recipe in conda-forge/pytorch-cpu-feedstock#244.
They seem to bundle libomp.dylib.
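A quick way to see the duplicate runtime on disk (a sketch; the python3.10 site-packages path, and the expectation that the env itself already provides llvm-openmp via the conda-forge BLAS stack, are my assumptions):

```bash
# The pip wheel carries its own OpenMP runtime inside torch/lib ...
ls "$CONDA_PREFIX/lib/python3.10/site-packages/torch/lib" | grep -i omp
# ... while the conda env may already provide one of its own:
ls "$CONDA_PREFIX/lib" | grep -i omp
# otool shows which libomp libtorch_cpu.dylib actually links against:
otool -L "$CONDA_PREFIX/lib/python3.10/site-packages/torch/lib/libtorch_cpu.dylib" | grep -i omp
```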
The following should fix this issue:
diff --git a/recipe/build_pytorch.sh b/recipe/build_pytorch.sh
index cd27be0..c6cb567 100644
--- a/recipe/build_pytorch.sh
+++ b/recipe/build_pytorch.sh
@@ -1,13 +1,9 @@
 set -x
-if [[ "$megabuild" == true ]]; then
-  source $RECIPE_DIR/build.sh
-  pushd $SP_DIR/torch
-  for f in bin/* lib/* share/* include/*; do
-    if [[ -e "$PREFIX/$f" ]]; then
-      rm -rf $f
-      ln -sf $PREFIX/$f $PWD/$f
-    fi
-  done
-else
-  $PREFIX/bin/python -m pip install torch-*.whl
-fi
+source $RECIPE_DIR/build.sh
+pushd $SP_DIR/torch
+for f in bin/* lib/* share/* include/*; do
+  if [[ -e "$PREFIX/$f" ]]; then
+    rm -rf $f
+    ln -sf $PREFIX/$f $PWD/$f
+  fi
+done
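As a sanity check after a rebuild with that change (a sketch only; `$SP_DIR` is the recipe's site-packages variable and the grep patterns are mine), one could confirm that a single OpenMP runtime remains in play:

```bash
# Any libomp under torch/lib should now be a symlink into $PREFIX rather than
# a private copy bundled from the wheel.
ls -l "$SP_DIR/torch/lib" | grep -i omp
# libtorch_cpu.dylib should resolve its OpenMP dependency to that single runtime.
otool -L "$SP_DIR/torch/lib/libtorch_cpu.dylib" | grep -i omp
```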
@conda-forge-admin please rerender
Thank you Isuru!