Elena Henderson / @ElenaHenderson:
cpp-micro benchmarks started failing on test-mac-arm
since this change: https://github.com/apache/arrow/commit/ec579df631deaa8f6186208ed2a4ebec00581dfa
First failed build: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1628
Steps to reproduce:

1. Run benchmarks for commit 89c0214fa43f8d1bf2e19e3bae0fc3009df51e15, where benchmarks pass:

```console
cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout 89c0214fa43f8d1bf2e19e3bae0fc3009df51e15
conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run
```

Note: benchmarks are NOT failing.
2. Run benchmarks for https://github.com/apache/arrow/commit/ec579df631deaa8f6186208ed2a4ebec00581dfa, the commit since which the cpp-micro benchmarks started failing:
```console
cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout ec579df631deaa8f6186208ed2a4ebec00581dfa
conda create -y -n arrow-commit -c conda-forge \
--file ci/conda_env_unix.txt \
--file ci/conda_env_cpp.txt \
--file ci/conda_env_python.txt \
compilers \
python=3.8 \
pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run
# Note: benchmarks stop running partway through:
...
CoalesceScalarStringBench/6 6174977 ns 6174982 ns 111 bytes_per_second=664.007M/s items_per_second=169.81M/s length=1048.58k null%=99 num_args=2
ChooseBench64/1048576/0 6787276 ns 6787272 ns 103 bytes_per_second=1.15105G/s items_per_second=154.492M/s
ChooseBench64/1048576/99 6722603 ns 6722606 ns 104 bytes_per_second=1.16201G/s items_per_second=155.963M/s
/Users/voltrondata/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-10-27T18:58:59-04:00
Running /var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark
Run on (8 X 24.2407 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 15.79, 15.29, 10.64
---------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------
MinimalEndToEndBench/num_batches:1000/batch_size:10/scan_alg:0/iterations:10/real_time 23472042 ns 5977400 ns 10 bytes_per_second=2.03151M/s items_per_second=42.6039k/s
Traceback (most recent call last):
File "/Users/voltrondata/miniconda3/envs/arrow-commit/bin/archery", line 33, in <module>
sys.exit(load_entry_point('archery', 'console_scripts', 'archery')())
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/voltrondata/arrow/dev/archery/archery/cli.py", line 515, in benchmark_run
json_out = json.dumps(runner_base, cls=JsonEncoder)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/__init__.py", line 234, in dumps
return cls(
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 35, in default
return BenchmarkRunnerCodec.encode(o)
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in encode
return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in <listcomp>
return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 176, in suites
suite = self.suite(suite_name, suite_bin)
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 152, in suite
results = suite_cmd.results(repetitions=self.repetitions)
File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/google.py", line 63, in results
self.run(*argv, check=True)
File "/Users/voltrondata/arrow/dev/archery/archery/utils/command.py", line 78, in run
return subprocess.run(invocation, **kwargs)
File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark', '--benchmark_repetitions=1', '--benchmark_out=/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/tmpzwwjwb94', '--benchmark_out_format=json']' died with <Signals.SIGBUS: 10>.
```
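For context on the final traceback line: archery launches each benchmark binary through `subprocess.run(..., check=True)`, so a child process killed by a signal surfaces as a `CalledProcessError`. A minimal sketch (plain Python, not archery code) of how a signal death becomes that exception:

```python
import signal
import subprocess
import sys

# A child killed by a signal reports returncode == -signum (here SIGSEGV
# for portability; the crash above was SIGBUS, i.e. -10 on macOS).
crash = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]

result = subprocess.run(crash)             # without check: negative returncode
print(result.returncode)                   # -signal.SIGSEGV

try:
    subprocess.run(crash, check=True)      # with check=True: raises instead,
except subprocess.CalledProcessError as e:  # which is what archery hit above
    print(type(e).__name__, e.returncode)
```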
Yibo Cai / @cyb70289: Tested on M1: all "arrow-dataset-scanner-benchmark" tests with scan_alg=1 fail with SIGBUS; the scan_alg=0 tests are okay. The backtrace shows stack depth approaching 4000 frames, and there looks to be a call loop among {future,async_util}.{h,cc}. ASAN identified a stack overflow; logs attached. cc @westonpace
# all scan_alg:0 tests are okay, all scan_alg:1 tests cause sigbus
% debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*"
/Users/linux/cyb/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'jemalloc', 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-11-01T17:02:15+08:00
Running debug/arrow-dataset-scanner-benchmark
Run on (8 X 24.2408 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 2.06, 2.81, 2.72
AddressSanitizer:DEADLYSIGNAL
=================================================================
==75674==ERROR: AddressSanitizer: stack-overflow on address 0x00016b9b3fc0 (pc 0x000106b4b3b4 bp 0x000106b4b3a0 sp 0x00016b9b3fa0 T1)
#0 0x106b4b3b4 in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4)
SUMMARY: AddressSanitizer: stack-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4) in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4
Thread T1 created by T0 here:
#0 0x106b2680c in wrap_pthread_create+0x50 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x3a80c)
#1 0x113b7a408 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:375
#2 0x113b7a128 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:309
#3 0x113b67e94 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:301
#4 0x113b66794 in arrow::internal::ThreadPool::LaunchWorkersUnlocked(int) thread_pool.cc:412
#5 0x113b68444 in arrow::internal::ThreadPool::SpawnReal(arrow::internal::TaskHints, arrow::internal::FnOnce<void ()>, arrow::StopToken, arrow::internal::FnOnce<void (arrow::Status const&)>&&) thread_pool.cc:448
#6 0x10488dfd8 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::internal::TaskHints, arrow::StopToken, arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:167
#7 0x10488be74 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:193
#8 0x10488ac0c in arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression) scan_node.cc:64
#9 0x10488a010 in arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() scan_node.cc:318
#10 0x113fc43e0 in arrow::compute::(anonymous namespace)::ExecPlanImpl::StartProducing() exec_plan.cc:183
#11 0x113fc362c in arrow::compute::ExecPlan::StartProducing() exec_plan.cc:400
#12 0x104462260 in arrow::dataset::MinimalEndToEndScan(unsigned long, unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::compute::ExecNodeOptions> > (unsigned long, unsigned long)>) scanner_benchmark.cc:159
#13 0x104468ebc in arrow::dataset::MinimalEndToEndBench(benchmark::State&) scanner_benchmark.cc:272
#14 0x1055dbc8c in benchmark::internal::BenchmarkInstance::Run(long long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const+0x44 (libbenchmark.1.7.0.dylib:arm64+0xbc8c)
#15 0x1055ed708 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, long long, int, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*)+0x58 (libbenchmark.1.7.0.dylib:arm64+0x1d708)
#16 0x1055ed2c8 in benchmark::internal::BenchmarkRunner::DoNIterations()+0x2c0 (libbenchmark.1.7.0.dylib:arm64+0x1d2c8)
#17 0x1055edfec in benchmark::internal::BenchmarkRunner::DoOneRepetition()+0xb0 (libbenchmark.1.7.0.dylib:arm64+0x1dfec)
#18 0x1055d4fb8 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)+0x9f0 (libbenchmark.1.7.0.dylib:arm64+0x4fb8)
#19 0x1055d4564 in benchmark::RunSpecifiedBenchmarks()+0x3c (libbenchmark.1.7.0.dylib:arm64+0x4564)
#20 0x10556becc in main+0x50 (libbenchmark_main.1.7.0.dylib:arm64+0x3ecc)
#21 0x10562d0f0 in start+0x204 (dyld:arm64+0x50f0)
#22 0xa25d7ffffffffffc (<unknown module>)
==75674==ABORTING
zsh: abort debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*"
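The call-loop hypothesis above can be illustrated with a toy sketch (hypothetical, not Arrow's actual future code): if a future that is already finished runs a newly attached continuation inline on the caller's stack, a chain of N continuations nests roughly N levels deep, so a long enough chain overflows the stack instead of consuming constant space.

```python
import sys

def frame_depth():
    """Count live Python stack frames."""
    depth, frame = 0, sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

class Future:
    """Toy future: a callback attached after completion runs inline."""
    def __init__(self):
        self._finished = False
        self._callbacks = []

    def finish(self):
        self._finished = True
        for cb in self._callbacks:
            cb()

    def then(self, cb):
        if self._finished:
            cb()                      # runs on the *current* stack: no unwinding
        else:
            self._callbacks.append(cb)

def run_chain(n):
    """Chain n continuations on an already-finished future; return the
    deepest stack reached. Depth grows linearly with chain length."""
    fut = Future()
    fut.finish()
    deepest = [0]

    def step(i):
        deepest[0] = max(deepest[0], frame_depth())
        if i < n:
            fut.then(lambda: step(i + 1))

    step(0)
    return deepest[0]

# Each chained continuation costs three extra frames (then -> lambda -> step),
# so depth scales with n; a real C++ chain behaves the same way until SIGBUS.
print(run_chain(200) - run_chain(50))
```

The usual fix for this pattern is to bound inline execution (a trampoline, or handing long chains back to an executor) so completion unwinds the stack between continuations.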
Yibo Cai / @cyb70289: I tried setting ARROW_DEFAULT_MEMORY_POOL to both "jemalloc" and "system"; the same error happens.
Weston Pace / @westonpace: Thank you. I will look at this today.
Weston Pace / @westonpace: Issue resolved by pull request 14562 https://github.com/apache/arrow/pull/14562
Reporter: Elena Henderson / @ElenaHenderson Assignee: Weston Pace / @westonpace
PRs and other links:
Note: This issue was originally created as ARROW-18183. Please see the migration documentation for further details.