Recreatable with a conda-installed numpy plus a pip-installed torch; it does not happen when both come from pip.

Cannot recreate with:
conda create --name pt python=3.10
conda activate pt
# numpy 2.0 and pytorch 2.3.1 get installed
pip install torch numpy
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
# no segfault
Can recreate with:
conda create --name pt python=3.10 numpy
conda activate pt
pip install torch
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
# Segfault...
In all these cases, importing numpy second (i.e. importing torch first) does not recreate the issue.
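For illustration only (same conda numpy + pip torch environment as above), this is the reversed import order that stays alive; it just demonstrates the observation, it is not a recommended workaround:

```bash
# Importing torch before numpy in the otherwise-failing environment:
python -c "import torch; import numpy; torch.zeros((1024, 1024), dtype=torch.uint8)"
# no segfault
```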
And now with lldb:
% lldb -- python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
(lldb) target create "python"
Current executable set to '/Users/mark/miniforge3/envs/pt/bin/python' (arm64).
(lldb) settings set -- target.run-args "-c" "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
(lldb) run
Process 58336 launched: '/Users/mark/miniforge3/envs/pt/bin/python' (arm64)
Process 58336 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #3, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #4, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #5, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
thread #8, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
-> 0x10086ef94 <+32>: ldr w8, [x0, #0x540]
0x10086ef98 <+36>: nop
0x10086ef9c <+40>: ldr w9, 0x1008a1308 ; _MergedGlobals + 8
0x10086efa0 <+44>: add w20, w9, #0x1
Target 0: (python) stopped.
(lldb)
OMP_NUM_THREADS=1 python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
seems to be fine,
but
OMP_NUM_THREADS=2 lldb -- python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
recreates the segfault
backtrace:
* thread #4, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
* frame #0: 0x000000010086ef94 libomp.dylib`__kmp_suspend_initialize_thread + 32
frame #1: 0x000000010086faf8 libomp.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 72
frame #2: 0x0000000108339520 libomp.dylib`kmp_flag_64<false, true>::wait(kmp_info*, int, void*) + 1880
frame #3: 0x0000000108334560 libomp.dylib`__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) + 184
frame #4: 0x00000001083380e8 libomp.dylib`__kmp_fork_barrier(int, int) + 628
frame #5: 0x0000000108314e14 libomp.dylib`__kmp_launch_thread + 340
frame #6: 0x000000010835300c libomp.dylib`__kmp_launch_worker(void*) + 280
frame #7: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #2
* frame #0: 0x000000019ff319ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019ff6f55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001001ac700 python`PyThread_acquire_lock_timed + 596
frame #3: 0x000000010020f8ac python`acquire_timed + 312
frame #4: 0x000000010020fb20 python`lock_PyThread_acquire_lock + 72
frame #5: 0x0000000100065448 python`method_vectorcall_VARARGS_KEYWORDS + 488
frame #6: 0x0000000100149540 python`call_function + 524
frame #7: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #8: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #9: 0x0000000100149540 python`call_function + 524
frame #10: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #11: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #12: 0x0000000100149540 python`call_function + 524
frame #13: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #14: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #15: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #16: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #17: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #18: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #19: 0x0000000100149540 python`call_function + 524
frame #20: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #21: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #22: 0x0000000100149540 python`call_function + 524
frame #23: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #24: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #25: 0x000000010005ad10 python`method_vectorcall + 344
frame #26: 0x0000000100210830 python`thread_run + 180
frame #27: 0x00000001001ac230 python`pythread_wrapper + 48
frame #28: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #3
* frame #0: 0x000000019ff319ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000019ff6f55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x00000001001ac700 python`PyThread_acquire_lock_timed + 596
frame #3: 0x000000010abdf3f0 _queue.cpython-310-darwin.so`_queue_SimpleQueue_get_impl + 496
frame #4: 0x000000010abdef5c _queue.cpython-310-darwin.so`_queue_SimpleQueue_get + 236
frame #5: 0x00000001000ab37c python`cfunction_vectorcall_FASTCALL_KEYWORDS_METHOD + 140
frame #6: 0x0000000100149540 python`call_function + 524
frame #7: 0x0000000100145430 python`_PyEval_EvalFrameDefault + 26428
frame #8: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #9: 0x0000000100145658 python`_PyEval_EvalFrameDefault + 26980
frame #10: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #11: 0x0000000100149540 python`call_function + 524
frame #12: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #13: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #14: 0x0000000100149540 python`call_function + 524
frame #15: 0x0000000100144e38 python`_PyEval_EvalFrameDefault + 24900
frame #16: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #17: 0x000000010005ad10 python`method_vectorcall + 344
frame #18: 0x0000000100210830 python`thread_run + 180
frame #19: 0x00000001001ac230 python`pythread_wrapper + 48
frame #20: 0x000000019ff6ef94 libsystem_pthread.dylib`_pthread_start + 136
* thread #1, queue = 'com.apple.main-thread'
* frame #0: 0x000000014493ff34 libtorch_cpu.dylib`void c10::function_ref<void (char**, long long const*, long long, long long)>::callback_fn<at::native::DEFAULT::VectorizedLoop2d<at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&)::$_2::operator()() const::'lambda'()::operator()() const::'lambda'(), at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&)::$_2::operator()() const::'lambda'()::operator()() const::'lambda0'()>>(long, char**, long long const*, long long, long long) + 632
frame #1: 0x0000000142351e7c libtorch_cpu.dylib`at::TensorIteratorBase::serial_for_each(c10::function_ref<void (char**, long long const*, long long, long long)>, at::Range) const + 364
frame #2: 0x0000000142351fe8 libtorch_cpu.dylib`.omp_outlined. + 216
frame #3: 0x0000000108371c4c libomp.dylib`__kmp_invoke_microtask + 156
frame #4: 0x0000000108315e40 libomp.dylib`__kmp_invoke_task_func + 348
frame #5: 0x0000000108311ac0 libomp.dylib`__kmp_fork_call + 7552
frame #6: 0x0000000108304088 libomp.dylib`__kmpc_fork_call + 196
frame #7: 0x0000000142351c28 libtorch_cpu.dylib`at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long long const*, long long, long long)>, long long) + 432
frame #8: 0x000000014493f190 libtorch_cpu.dylib`at::native::(anonymous namespace)::fill_kernel(at::TensorIterator&, c10::Scalar const&) + 252
frame #9: 0x0000000142746fc0 libtorch_cpu.dylib`at::native::fill_out(at::Tensor&, c10::Scalar const&) + 764
frame #10: 0x0000000142e56bc8 libtorch_cpu.dylib`at::_ops::fill__Scalar::call(at::Tensor&, c10::Scalar const&) + 272
frame #11: 0x00000001427485c0 libtorch_cpu.dylib`at::native::zero_(at::Tensor&) + 676
frame #12: 0x000000014332d958 libtorch_cpu.dylib`at::_ops::zero_::call(at::Tensor&) + 260
frame #13: 0x0000000142a139bc libtorch_cpu.dylib`at::native::zeros_symint(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 676
frame #14: 0x0000000142eb80d0 libtorch_cpu.dylib`at::_ops::zeros::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 152
frame #15: 0x0000000142eb7cb8 libtorch_cpu.dylib`at::_ops::zeros::call(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 296
frame #16: 0x00000001098f0898 libtorch_python.dylib`torch::zeros_symint(c10::ArrayRef<c10::SymInt>, c10::TensorOptions) + 204
frame #17: 0x00000001098a0ad0 libtorch_python.dylib`torch::autograd::THPVariable_zeros(_object*, _object*, _object*) + 2820
frame #18: 0x00000001000aab3c python`cfunction_call + 80
frame #19: 0x000000010005759c python`_PyObject_MakeTpCall + 612
frame #20: 0x00000001001495d8 python`call_function + 676
frame #21: 0x0000000100145430 python`_PyEval_EvalFrameDefault + 26428
frame #22: 0x000000010013e364 python`_PyEval_Vector + 2036
frame #23: 0x0000000100199398 python`run_mod + 216
frame #24: 0x0000000100198e38 python`_PyRun_SimpleFileObject + 1260
frame #25: 0x0000000100197e1c python`_PyRun_AnyFileObject + 240
frame #26: 0x00000001001bc8f8 python`Py_RunMain + 2340
frame #27: 0x00000001001bda54 python`pymain_main + 1180
frame #28: 0x000000010000131c python`main + 56
frame #29: 0x000000019fbe60e0 dyld`start + 2360
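Until the packaging is sorted out, a possible stopgap is to pin OMP_NUM_THREADS=1 for the affected environment. This is only a sketch based on the single-thread observation above; it assumes a conda recent enough to have `conda env config vars` (>= 4.8) and it gives up CPU parallelism in torch:

```bash
# Stopgap sketch: persist OMP_NUM_THREADS=1 in the "pt" env so every activation
# avoids the multi-threaded OpenMP code path that crashes.
conda env config vars set OMP_NUM_THREADS=1 -n pt
conda deactivate && conda activate pt   # re-activate so the variable is applied
python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"
```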
I'm somewhat afraid of setting the package type to conda, since their commit says it is needed for torch.compile...
But I would rather avoid this disastrous failure mode for others.
I expect that many simply haven't been able to update to 2.3.0 because of ecosystem incompatibilities from the multiple ongoing migrations, but I was able to by carefully picking and choosing packages, and some on my team reported the failure last week.
@conda-forge-admin please rerender
Hi! This is the friendly automated conda-forge-webservice.
I just wanted to let you know that I started rerendering the recipe in conda-forge/pytorch-cpu-feedstock#244.
They seem to bundle libomp.dylib.
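A quick way to see the duplicate runtime on disk (a sketch; the python3.10 site-packages path, and the expectation that the env itself already provides llvm-openmp via the conda-forge BLAS stack, are my assumptions):

```bash
# The pip wheel carries its own OpenMP runtime inside torch/lib ...
ls "$CONDA_PREFIX/lib/python3.10/site-packages/torch/lib" | grep -i omp
# ... while the conda env may already provide one of its own:
ls "$CONDA_PREFIX/lib" | grep -i omp
# otool shows which libomp libtorch_cpu.dylib actually links against:
otool -L "$CONDA_PREFIX/lib/python3.10/site-packages/torch/lib/libtorch_cpu.dylib" | grep -i omp
```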
The following should fix this issue:
diff --git a/recipe/build_pytorch.sh b/recipe/build_pytorch.sh
index cd27be0..c6cb567 100644
--- a/recipe/build_pytorch.sh
+++ b/recipe/build_pytorch.sh
@@ -1,13 +1,9 @@
 set -x
-if [[ "$megabuild" == true ]]; then
-  source $RECIPE_DIR/build.sh
-  pushd $SP_DIR/torch
-  for f in bin/* lib/* share/* include/*; do
-    if [[ -e "$PREFIX/$f" ]]; then
-      rm -rf $f
-      ln -sf $PREFIX/$f $PWD/$f
-    fi
-  done
-else
-  $PREFIX/bin/python -m pip install torch-*.whl
-fi
+source $RECIPE_DIR/build.sh
+pushd $SP_DIR/torch
+for f in bin/* lib/* share/* include/*; do
+  if [[ -e "$PREFIX/$f" ]]; then
+    rm -rf $f
+    ln -sf $PREFIX/$f $PWD/$f
+  fi
+done
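As a sanity check after a rebuild with that change (a sketch only; `$SP_DIR` is the recipe's site-packages variable and the grep patterns are mine), one could confirm that a single OpenMP runtime remains in play:

```bash
# Any libomp under torch/lib should now be a symlink into $PREFIX rather than
# a private copy bundled from the wheel.
ls -l "$SP_DIR/torch/lib" | grep -i omp
# libtorch_cpu.dylib should resolve its OpenMP dependency to that single runtime.
otool -L "$SP_DIR/torch/lib/libtorch_cpu.dylib" | grep -i omp
```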
@conda-forge-admin please rerender
Thank you Isuru!