apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

Test failure in arrow-compute-aggregate-test #12681

Open ArchangeGabriel opened 2 years ago

ArchangeGabriel commented 2 years ago

While building for Arch Linux, I’m observing 4 test failures in the aforementioned suite:

[ RUN      ] TestRandomInt64QuantileKernel.Overlapped
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
[  FAILED  ] TestRandomInt64QuantileKernel.Overlapped (6 ms)
[ RUN      ] TestRandomInt64QuantileKernel.Sliced
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
[  FAILED  ] TestRandomInt64QuantileKernel.Sliced (25 ms)
[ RUN      ] TestRandomFloatQuantileKernel.Exact
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
[  FAILED  ] TestRandomFloatQuantileKernel.Exact (0 ms)
[ RUN      ] TestRandomFloatQuantileKernel.Sliced
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
/build/arrow/src/apache-arrow-7.0.0/cpp/src/arrow/compute/kernels/aggregate_test.cc:3234: Failure
Value of: (quantiles[j] == numeric_scalar->value) || (std::isnan(quantiles[j]) && std::isnan(numeric_scalar->value))
  Actual: false
Expected: true
[  FAILED  ] TestRandomFloatQuantileKernel.Sliced (0 ms)

They also happen with 6.0.1, but did not some time ago, so I suspect that an update to one of Arrow's dependencies is responsible. I’m happy to provide any information that could be useful, but I don’t want to create an account on JIRA; I try to limit the number of accounts I have everywhere.

loqs commented 2 years ago

After swapping from gcc to clang, the tests pass. I verified that the same four tests fail on my system using gcc 11.2.0.

wjones127 commented 2 years ago

We've recently had some issues with SIMD instructions seeming incorrect in certain compilers, and this sounds like it might be related (see https://github.com/apache/arrow/pull/12422#issuecomment-1039523955). It has stumped us so far, but I think we are still looking into it. cc @rok @jonkeane

loqs commented 2 years ago

Adding any of -mno-sse, -mno-sse2, or -mno-sse3 to CXXFLAGS makes the tests pass. Replacing -DARROW_SIMD_LEVEL=AVX2 with -DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=AVX512 also makes the tests pass. This is all on an AMD system that supports up to AVX2.

erydit commented 2 years ago

I confirm the problem on my machine. With -DARROW_SIMD_LEVEL=AVX2 all tests failed. With -DARROW_SIMD_LEVEL=SSE4_2 the 54th test, 'arrow-flight-test', failed and all other tests passed.

As I understand it, my CPU does not support AVX2 instructions, and that was the main problem. The 'arrow-flight-test' failure is not related to that, but I am attaching the log of that test anyway.

arrow-flight-test'
Running arrow-flight-test, redirecting output into /home/rstanislav/Desktop/arrow/src/build/build/test-logs/arrow-flight-test.txt (attempt 1/1)
Traceback (most recent call last):
  File "/home/rstanislav/Desktop/arrow/src/apache-arrow-8.0.1/cpp/build-support/asan_symbolize.py", line 368, in <module>
    loop.process_stdin()
  File "/home/rstanislav/Desktop/arrow/src/apache-arrow-8.0.1/cpp/build-support/asan_symbolize.py", line 340, in process_stdin
    line = sys.stdin.readline()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 3112: invalid continuation byte
~/Desktop/arrow/src/build/src/arrow/flight
Test time = 1.36 sec

My system:
OS: Manjaro 21.3.6 Ruah
Kernel: x86_64 Linux 5.15.57-2-MANJARO
Uptime: 26m
Packages: 1469
Shell: bash
Resolution: 1920x1080
DE: Xfce4
WM: Xfwm4
WM Theme: Matcha-sea
GTK Theme: Matcha-sea [GTK2]
Icon Theme: Papirus-Maia
Font: Noto Sans 10
Disk: 1,5T / 6,8T (23%)
CPU: AMD FX-8350 Eight-Core @ 8x 4GHz
GPU: AMD RS780 (DRM 2.50.0 / 5.15.57-2-MANJARO, LLVM 14.0.6)
RAM: 1829MiB / 19485MiB

lscpu:
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 48 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 8
  On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
  Model name: AMD FX(tm)-8350 Eight-Core Processor
    CPU family: 21
    Model: 2
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 1
    Stepping: 0
    Frequency boost: enabled
    CPU(s) scaling MHz: 38%
    CPU max MHz: 4000,0000
    CPU min MHz: 1400,0000
    BogoMIPS: 8003.18
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Virtualization features:
  Virtualization: AMD-V
Caches (sum of all):
  L1d: 128 KiB (8 instances)
  L1i: 256 KiB (4 instances)
  L2: 8 MiB (4 instances)
  L3: 8 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-7
Vulnerabilities:
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Mitigation; untrained return thunk; SMT vulnerable
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling
  Srbds: Not affected
  Tsx async abort: Not affected

pitrou commented 2 years ago

Hmm, it's a pity that we're not displaying the actual values being compared on failure. It's possible that the test failures are simply due to floating-point accuracy. But ideally the tests should be improved to take that into account. @cyb70289 You might want to take a look. (but @wjones127 perhaps you also want to take a deeper look and suggest improvements/fixes, especially if you manage to reproduce?)
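
To illustrate the kind of improvement suggested above, here is a minimal sketch of a comparison that tolerates small floating-point differences and reports both values on mismatch. The helper name CompareQuantile and the default tolerance are assumptions made for illustration; this is not code from Arrow's test suite, and a suitable tolerance for the quantile kernels would have to be chosen by the maintainers.

```cpp
#include <cmath>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Arrow). Returns an empty string when the
// two values are considered equal: exactly equal, both NaN, or both finite
// and within a relative tolerance. Otherwise returns a message that shows
// both values with full double precision.
std::string CompareQuantile(
    double expected, double actual,
    double rel_tol = 4 * std::numeric_limits<double>::epsilon()) {
  bool ok = (expected == actual) ||
            (std::isnan(expected) && std::isnan(actual));
  if (!ok && std::isfinite(expected) && std::isfinite(actual)) {
    // Scale the tolerance by the larger magnitude of the two operands.
    const double scale = std::fmax(std::fabs(expected), std::fabs(actual));
    ok = std::fabs(expected - actual) <= rel_tol * scale;
  }
  if (ok) return "";
  std::ostringstream msg;
  msg.precision(17);  // enough digits to round-trip a double
  msg << "expected " << expected << ", got " << actual;
  return msg.str();
}
```

In a GoogleTest-based test, the returned message could be streamed into the failing assertion so that the log shows the two numbers instead of only "Actual: false".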

pitrou commented 2 years ago

@erydit The GDAL issue is unrelated; please let's not conflate these.

Also, the fact that Arrow doesn't work on a non-AVX2 CPU if compiled with AVX2 enabled is entirely expected. It is not a bug.
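
As a rough way to check for this kind of mismatch, the sketch below compares the instruction set a binary was compiled for with what the CPU reports at run time. It assumes GCC or Clang on an x86 target (it relies on the __builtin_cpu_supports builtin and the __AVX2__ predefined macro). The function names are hypothetical, and this is not how Arrow itself performs SIMD dispatch.

```cpp
// Returns true when the CPU running this process reports AVX2 support.
// Uses a GCC/Clang builtin that exists only on x86 targets; on any other
// compiler or architecture this sketch conservatively returns false.
bool CpuSupportsAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// True when this translation unit was compiled with AVX2 enabled
// (for example with -mavx2), in which case the compiler may emit
// AVX2 instructions anywhere in the generated code.
constexpr bool CompiledWithAvx2() {
#if defined(__AVX2__)
  return true;
#else
  return false;
#endif
}

// A binary built with AVX2 enabled is only expected to run correctly
// on a CPU that actually provides AVX2.
bool BuildMatchesCpu() { return !CompiledWithAvx2() || CpuSupportsAvx2(); }
```

On the FX-8350 described earlier, whose flag list contains avx but not avx2, CpuSupportsAvx2() would be expected to return false.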