apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] cpp-micro benchmarks are failing on mac arm machine #20477

Closed. asfimport closed this issue 1 year ago.

asfimport commented 1 year ago

Reporter: Elena Henderson / @ElenaHenderson
Assignee: Weston Pace / @westonpace

PRs and other links:

Note: This issue was originally created as ARROW-18183. Please see the migration documentation for further details.

asfimport commented 1 year ago

Elena Henderson / @ElenaHenderson: cpp-micro benchmarks started failing on test-mac-arm since this change: https://github.com/apache/arrow/commit/ec579df631deaa8f6186208ed2a4ebec00581dfa

First failed build: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1628


Steps to reproduce:

  1. Run benchmarks for commit https://github.com/apache/arrow/commit/89c0214fa43f8d1bf2e19e3bae0fc3009df51e15, which was made right before  https://github.com/apache/arrow/commit/ec579df631deaa8f6186208ed2a4ebec00581dfa

```shell
cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout 89c0214fa43f8d1bf2e19e3bae0fc3009df51e15
conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run
```

Note that the benchmarks are NOT failing!


2. Run benchmarks for <https://github.com/apache/arrow/commit/ec579df631deaa8f6186208ed2a4ebec00581dfa>, the commit at which the cpp-micro benchmarks started failing:

```shell

cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow
git checkout ec579df631deaa8f6186208ed2a4ebec00581dfa
conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas
conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
pip install -e dev/archery
archery benchmark run
```

Note benchmarks stop running:

...
CoalesceScalarStringBench/6                 6174977 ns      6174982 ns          111 bytes_per_second=664.007M/s items_per_second=169.81M/s length=1048.58k null%=99 num_args=2
ChooseBench64/1048576/0                     6787276 ns      6787272 ns          103 bytes_per_second=1.15105G/s items_per_second=154.492M/s
ChooseBench64/1048576/99                    6722603 ns      6722606 ns          104 bytes_per_second=1.16201G/s items_per_second=155.963M/s
/Users/voltrondata/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-10-27T18:58:59-04:00
Running /var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark
Run on (8 X 24.2407 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 15.79, 15.29, 10.64
---------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------
MinimalEndToEndBench/num_batches:1000/batch_size:10/scan_alg:0/iterations:10/real_time     23472042 ns      5977400 ns           10 bytes_per_second=2.03151M/s items_per_second=42.6039k/s
Traceback (most recent call last):
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/bin/archery", line 33, in <module>
    sys.exit(load_entry_point('archery', 'console_scripts', 'archery')())
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/voltrondata/arrow/dev/archery/archery/cli.py", line 515, in benchmark_run
    json_out = json.dumps(runner_base, cls=JsonEncoder)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 35, in default
    return BenchmarkRunnerCodec.encode(o)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in encode
    return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/codec.py", line 79, in <listcomp>
    return {"suites": [BenchmarkSuiteCodec.encode(s) for s in br.suites]}
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 176, in suites
    suite = self.suite(suite_name, suite_bin)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/runner.py", line 152, in suite
    results = suite_cmd.results(repetitions=self.repetitions)
  File "/Users/voltrondata/arrow/dev/archery/archery/benchmark/google.py", line 63, in results
    self.run(*argv, check=True)
  File "/Users/voltrondata/arrow/dev/archery/archery/utils/command.py", line 78, in run
    return subprocess.run(invocation, **kwargs)
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/arrow-archery-18pv4wou/WORKSPACE/build/release/arrow-dataset-scanner-benchmark', '--benchmark_repetitions=1', '--benchmark_out=/var/folders/d9/n0lbzbf16nvgt7b83xhcssm80000gn/T/tmpzwwjwb94', '--benchmark_out_format=json']' died with <Signals.SIGBUS: 10>.
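The `CalledProcessError` above is how archery surfaces the benchmark binary dying from `SIGBUS` (signal 10): `subprocess.run(..., check=True)` raises when the child exits abnormally, encoding the signal as a negative return code. As an illustrative sketch (not actual archery or Arrow code), using `os.abort()`/`SIGABRT` in a throwaway child process as a stand-in for the SIGBUS crash:

```python
import subprocess
import sys

# Illustrative sketch: archery runs each benchmark binary via
# subprocess.run(..., check=True). When the child dies from a signal
# (SIGBUS in the log above), check=True raises CalledProcessError whose
# returncode is the negative signal number. Here the child aborts itself
# with SIGABRT as a portable stand-in for the SIGBUS crash.
child = "import os; os.abort()"
try:
    subprocess.run([sys.executable, "-c", child], check=True)
    rc = 0  # not reached: the child always dies from the signal
except subprocess.CalledProcessError as exc:
    rc = exc.returncode  # negative signal number

print(rc)
```

This is why a single crashing benchmark suite aborts the whole `archery benchmark run` invocation rather than being reported as a failed result.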

 

asfimport commented 1 year ago

Yibo Cai / @cyb70289: Tested on M1: all "arrow-dataset-scanner-benchmark/scan_alg=1" tests fail with SIGBUS, while the "scan_alg=0" tests are okay. The backtrace shows the stack depth approaching 4,000 frames; it looks like there is a call loop among {future,async_util}.{h,cc}. ASAN identified a stack overflow; logs attached. cc @westonpace


# all scan_alg:0 tests are okay, all scan_alg:1 tests cause sigbus
% debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*" 

/Users/linux/cyb/arrow/cpp/src/arrow/memory_pool.cc:113: Unsupported backend 'mimalloc' specified in ARROW_DEFAULT_MEMORY_POOL (supported backends are 'jemalloc', 'system')
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
2022-11-01T17:02:15+08:00
Running debug/arrow-dataset-scanner-benchmark
Run on (8 X 24.2408 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 2.06, 2.81, 2.72
AddressSanitizer:DEADLYSIGNAL
=================================================================
==75674==ERROR: AddressSanitizer: stack-overflow on address 0x00016b9b3fc0 (pc 0x000106b4b3b4 bp 0x000106b4b3a0 sp 0x00016b9b3fa0 T1)
    #0 0x106b4b3b4 in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4)

SUMMARY: AddressSanitizer: stack-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64+0x5f3b4) in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put(__sanitizer::StackTrace, bool*)+0x4
Thread T1 created by T0 here:
    #0 0x106b2680c in wrap_pthread_create+0x50 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x3a80c)
    #1 0x113b7a408 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:375
    #2 0x113b7a128 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:309
    #3 0x113b67e94 in std::__1::thread::thread<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3, void>(arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3&&) thread:301
    #4 0x113b66794 in arrow::internal::ThreadPool::LaunchWorkersUnlocked(int) thread_pool.cc:412
    #5 0x113b68444 in arrow::internal::ThreadPool::SpawnReal(arrow::internal::TaskHints, arrow::internal::FnOnce<void ()>, arrow::StopToken, arrow::internal::FnOnce<void (arrow::Status const&)>&&) thread_pool.cc:448
    #6 0x10488dfd8 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::internal::TaskHints, arrow::StopToken, arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:167
    #7 0x10488be74 in arrow::Result<arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > > arrow::internal::Executor::Submit<arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0, arrow::Future<std::__1::shared_ptr<arrow::Iterator<std::__1::shared_ptr<arrow::dataset::Fragment> > > > >(arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression)::$_0&&) thread_pool.h:193
    #8 0x10488ac0c in arrow::dataset::(anonymous namespace)::GetFragments(arrow::dataset::Dataset*, arrow::compute::Expression) scan_node.cc:64
    #9 0x10488a010 in arrow::dataset::(anonymous namespace)::ScanNode::StartProducing() scan_node.cc:318
    #10 0x113fc43e0 in arrow::compute::(anonymous namespace)::ExecPlanImpl::StartProducing() exec_plan.cc:183
    #11 0x113fc362c in arrow::compute::ExecPlan::StartProducing() exec_plan.cc:400
    #12 0x104462260 in arrow::dataset::MinimalEndToEndScan(unsigned long, unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<arrow::Result<std::__1::shared_ptr<arrow::compute::ExecNodeOptions> > (unsigned long, unsigned long)>) scanner_benchmark.cc:159
    #13 0x104468ebc in arrow::dataset::MinimalEndToEndBench(benchmark::State&) scanner_benchmark.cc:272
    #14 0x1055dbc8c in benchmark::internal::BenchmarkInstance::Run(long long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const+0x44 (libbenchmark.1.7.0.dylib:arm64+0xbc8c)
    #15 0x1055ed708 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, long long, int, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*)+0x58 (libbenchmark.1.7.0.dylib:arm64+0x1d708)
    #16 0x1055ed2c8 in benchmark::internal::BenchmarkRunner::DoNIterations()+0x2c0 (libbenchmark.1.7.0.dylib:arm64+0x1d2c8)
    #17 0x1055edfec in benchmark::internal::BenchmarkRunner::DoOneRepetition()+0xb0 (libbenchmark.1.7.0.dylib:arm64+0x1dfec)
    #18 0x1055d4fb8 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)+0x9f0 (libbenchmark.1.7.0.dylib:arm64+0x4fb8)
    #19 0x1055d4564 in benchmark::RunSpecifiedBenchmarks()+0x3c (libbenchmark.1.7.0.dylib:arm64+0x4564)
    #20 0x10556becc in main+0x50 (libbenchmark_main.1.7.0.dylib:arm64+0x3ecc)
    #21 0x10562d0f0 in start+0x204 (dyld:arm64+0x50f0)
    #22 0xa25d7ffffffffffc  (<unknown module>)

==75674==ABORTING
zsh: abort      debug/arrow-dataset-scanner-benchmark --benchmark_filter=".*scan_alg:1.*"
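The failure mode described above, where each step's completion synchronously invokes the continuation that starts the next step on the same stack, can be sketched in miniature. This is illustrative Python with hypothetical names (`complete`/`on_complete`), not the actual C++ `future`/`async_util` code; the overflow surfaces here as `RecursionError` instead of SIGBUS:

```python
import sys

# Sketch of a synchronous continuation loop: completing one step calls the
# continuation, which immediately starts and completes the next step without
# ever returning to a scheduler. Each step adds stack frames, so a long
# enough chain exhausts the stack.
def complete(step, limit):
    if step == limit:
        return "done"
    return on_complete(step, limit)  # continuation runs on the same stack

def on_complete(step, limit):
    return complete(step + 1, limit)  # ...and starts the next step in place

sys.setrecursionlimit(10_000)

short = complete(0, 100)  # a short chain finishes normally
try:
    complete(0, 100_000)  # a deep chain blows through the stack budget
    overflowed = False
except RecursionError:
    overflowed = True
```

The usual fix for this pattern is to break the loop by bouncing long continuation chains through a scheduler or trampoline instead of invoking them recursively, so the stack depth stays bounded regardless of chain length.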

asfimport commented 1 year ago

Yibo Cai / @cyb70289: I tried setting ARROW_DEFAULT_MEMORY_POOL to "jemalloc" and "system"; the same error happens.

asfimport commented 1 year ago

Weston Pace / @westonpace: Thank you. I will look at this today.

asfimport commented 1 year ago

Weston Pace / @westonpace: Issue resolved by pull request 14562 https://github.com/apache/arrow/pull/14562