apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] parquet.read_table causes crashes on Windows Server 2016 w/ Xeon Processor #25432

Open asfimport opened 4 years ago

asfimport commented 4 years ago

Calling read_all() crashes immediately with a Windows fatal error on versions 0.16.0, 0.17.0, and 0.17.1. Downgrading to 0.15.1 fixes the problem.

The Python scripts work fine on all other PCs, but on a server running Windows Server 2016 with a Xeon processor they crash immediately.

ERROR:

Windows fatal exception: code 0xc000001d

Current thread 0x00001950 (most recent call first):
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 253 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 605 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1137 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1281 in read_table
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 888 in upload_file
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 938 in upload_df_chunk
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in __call__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\_parallel_backends.py", line 572 in __init__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\_parallel_backends.py", line 208 in apply_async
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 765 in _dispatch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 847 in dispatch_one_batch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 1029 in __call__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\utils\api_utils.py", line 32 in threaded
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in push_table
  File "pythonscript.py", line 152 in main
  File "pythonscript.py", line 171 in
Fatal Python error: Illegal instruction

Current thread 0x00001950 (most recent call first):
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 253 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 605 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1137 in read
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pyarrow\parquet.py", line 1281 in read_table
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 888 in upload_file
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 938 in upload_df_chunk
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 253 in __call__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\_parallel_backends.py", line 572 in __init__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\_parallel_backends.py", line 208 in apply_async
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 765 in _dispatch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 847 in dispatch_one_batch
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\joblib\parallel.py", line 1029 in __call__
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\utils\api_utils.py", line 32 in threaded
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in
  File "D:\XXXXXXXXXXXXXXXXXXXXXXXXX\lib\site-packages\pycelonis\objects_ibc.py", line 944 in push_table
  File "pythonscript.py", line 152 in main
  File "pythonscript.py", line 171 in
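
For reference, 0xc000001d is the NTSTATUS value STATUS_ILLEGAL_INSTRUCTION, which matches the "Illegal instruction" message from Python. Because a crash like this kills the interpreter without raising a Python exception, one way to observe it from the calling side is to run the read in a child process and inspect the exit code. The following is a minimal sketch under that assumption; the helper names describe_exit and run_isolated are hypothetical and not part of pyarrow.

```python
import signal
import subprocess
import sys

# NTSTATUS value that Windows reports for an illegal instruction.
STATUS_ILLEGAL_INSTRUCTION = 0xC000001D


def describe_exit(returncode):
    """Turn a child's exit code into a short human-readable description."""
    if returncode == 0:
        return "ok"
    # Windows: the exit code is the NTSTATUS value. Mask to 32 bits in case
    # it is reported as a negative signed integer.
    if returncode & 0xFFFFFFFF == STATUS_ILLEGAL_INSTRUCTION:
        return "illegal instruction (0xc000001d)"
    # POSIX: subprocess reports death by signal N as -N.
    if returncode == -signal.SIGILL:
        return "illegal instruction (SIGILL)"
    return "exit code %d" % returncode


def run_isolated(code):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", code])
    return describe_exit(result.returncode)
```

A call such as run_isolated("import pyarrow.parquet as pq; pq.read_table('example.parquet')"), where example.parquet is a placeholder path, would then report the illegal instruction instead of taking the parent script down with it.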

Environment:
OS Name: Microsoft Windows Server 2016 Standard
Version: 10.0.14393 Build 14393
Other OS Description: Not Available
OS Manufacturer: Microsoft Corporation
System Name: XXXXXXXXXXXXXXXXXXXXXXXXXXX
System Manufacturer: VMware, Inc.
System Model: VMware7,1
System Type: x64-based PC
System SKU: Unsupported
Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 2295 Mhz, 1 Core(s), 1 Logical Processor(s)
BIOS Version/Date: VMware, Inc. VMW71.00V.0.B64.1704120155, 4/12/2017
SMBIOS Version: 2.7
BIOS Mode: UEFI
BaseBoard Manufacturer: Intel Corporation
BaseBoard Model: Not Available
BaseBoard Name: Base Board
Platform Role: Desktop
Secure Boot State: Unsupported
PCR7 Configuration: Not Available
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume2
Locale: United States
Hardware Abstraction Layer: Version = "10.0.14393.3297"
User Name: Not Available
Time Zone: Eastern Daylight Time
Installed Physical Memory (RAM): 32.0 GB
Total Physical Memory: 32.0 GB
Available Physical Memory: 29.7 GB
Total Virtual Memory: 40.0 GB
Available Virtual Memory: 37.9 GB
Page File Space: 8.00 GB
Page File: C:\pagefile.sys
Device Guard Virtualization based security: Not enabled
A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Reporter: Kristopher Jong

Note: This issue was originally created as ARROW-9349. Please see the migration documentation for further details.

asfimport commented 4 years ago

Wes McKinney / @wesm: Can you let us know what the processor ID is? This is almost certainly ARROW-7939.

asfimport commented 4 years ago

Wes McKinney / @wesm: Sorry, I missed that it's an E5-2699. This processor has AVX2, so I don't think it's ARROW-7939. Any information that identifies the illegal instruction crashing the application would help us figure out what's wrong.

asfimport commented 4 years ago

Wes McKinney / @wesm: Could you try a nightly build, installed with

pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre pyarrow

That would help us rule out the BMI2 issue.

asfimport commented 4 years ago

Kristopher Jong: I installed the pyarrow-0.18.0.dev551 build and it has the same behavior.
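
When comparing releases like this, it can help to confirm which build is actually installed in the environment that crashes, without importing the package itself. A small sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, installed_version("pyarrow") reads the version from the installed distribution's metadata, so it does not load the compiled extension modules.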

asfimport commented 4 years ago

Kristopher Jong: I can't figure out what the illegal instruction is. The crash appears to happen at the point where execution switches into the compiled DLL code, which is why my Python script doesn't capture an exception stack. The original behavior was no errors at all: the Python script simply failed without raising an exception. Once I turned on the faulthandler, I found that a Windows fatal error was causing the failure.
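
For anyone reproducing this, the standard-library faulthandler module is what produces the "Windows fatal exception" output and the thread traceback shown in the report. A minimal sketch of enabling it with output sent to a log file; the helper name enable_crash_traces and the log path are assumptions for illustration.

```python
import faulthandler


def enable_crash_traces(log_path):
    """Write Python tracebacks for fatal errors to log_path.

    Returns the open file object. The caller must keep a reference to it,
    because faulthandler writes to the underlying file descriptor and the
    file has to stay open for as long as the handler is enabled.
    """
    log_file = open(log_path, "a")
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be switched on without code changes by running the script with python -X faulthandler or by setting the PYTHONFAULTHANDLER environment variable, in which case the traceback goes to stderr.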

asfimport commented 4 years ago

Wes McKinney / @wesm: [~mparry] can you offer any guidance to [~kmj1104213] about how to determine which illegal instruction is causing the problem, like you did in ARROW-9114?

asfimport commented 4 years ago

Morgan Parry: We just attached the Visual Studio debugger to the Python process. It trapped the illegal instruction and it was then apparent from the disassembly what the offender was.

I see from the reported environment that this is running under VMware. Note that VMware can mask CPU features, depending on the software version, the configuration, the specs of other machines in the cluster (it may mask to the lowest common denominator), and so on. This is exactly what happened in our case, and it took a while to figure out: the physical CPU had AVX2 but the virtual one didn't.
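
One way to check for this kind of masking is to compare the feature flags the guest CPU reports against the extensions discussed in this thread (AVX2 and BMI2). The sketch below is an illustration only: the helper names are hypothetical, guest_cpu_flags relies on the third-party py-cpuinfo package, and which extensions a given pyarrow build actually requires is not stated here.

```python
# Instruction-set extensions mentioned in this thread, using the lowercase
# flag names that py-cpuinfo reports.
REQUIRED = ("avx2", "bmi2")


def missing_features(reported_flags, required=REQUIRED):
    """Return the required extensions that are absent from reported_flags."""
    reported = {flag.lower() for flag in reported_flags}
    return [name for name in required if name not in reported]


def guest_cpu_flags():
    """Flags exposed by the (possibly virtual) CPU this process runs on.

    Assumes the third-party py-cpuinfo package is installed
    (pip install py-cpuinfo).
    """
    import cpuinfo

    return cpuinfo.get_cpu_info().get("flags", [])
```

Running missing_features(guest_cpu_flags()) inside the virtual machine lists any of the two extensions the guest does not expose, even when the physical CPU supports them.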

asfimport commented 4 years ago

Charles Surett: Can someone make a debug build? I've been attempting to get a build with debug symbols but haven't had any luck.

asfimport commented 2 years ago

Joris Van den Bossche / @jorisvandenbossche: [~kmj1104213] do you know if you still run into this issue with the latest pyarrow release?