cknd / stackprinter

Debugging-friendly exceptions for Python
MIT License

(Address boundary error) #48

Closed: tsoernes closed this issue 3 years ago

tsoernes commented 3 years ago

The following code

import faulthandler
import pandas as pd
from pathlib import Path
import stackprinter
faulthandler.enable()
stackprinter.set_excepthook(style='darkbg2')
pd.read_feather(Path.home() / 'Downloads/job_desc_address_boundary_error.feather')

yields

Fatal Python error: Segmentation fault

Current thread 0x00007f21755d6740 (most recent call first):
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/extraction.py", line 167 in lookup
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/extraction.py", line 146 in get_vars
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/extraction.py", line 104 in get_info
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/formatting.py", line 164 in <listcomp>
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/formatting.py", line 164 in format_exc_info
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/__init__.py", line 146 in format
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/__init__.py", line 23 in show_or_format
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/__init__.py", line 171 in show
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/__init__.py", line 23 in show_or_format
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/stackprinter/__init__.py", line 257 in hook
fish: Job 2, 'python rf.py' terminated by signal SIGSEGV (Address boundary error)

while the following code:

import faulthandler
import pandas as pd
from pathlib import Path
faulthandler.enable()
pd.read_feather(Path.home() / 'Downloads/job_desc_address_boundary_error.feather')

yields

Traceback (most recent call last):
  File "rf.py", line 6, in <module>
    pd.read_feather(Path.home() / 'Downloads/job_desc_address_boundary_error.feather')
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/pandas/io/feather_format.py", line 127, in read_feather
    return feather.read_feather(
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 216, in read_feather
    return (read_table(source, columns=columns, memory_map=memory_map)
  File "/home/torstein/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 238, in read_table
    reader.open(source, use_memory_map=memory_map)
  File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherReader.open
  File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: File is too small to be a well-formed file

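I.e. with the stackprinter excepthook installed, the same call ends in an address boundary error (segfault) instead of the ArrowInvalid traceback. Observed with stackprinter 0.2.5.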

cknd commented 3 years ago

thanks! how weird, stackprinter doesn't touch the C API or the like. perhaps this happens when it tries to access some object attribute, which shouldn't segfault, but stackprinter might be the first thing to try and access it. if you'd like to help, could you sprinkle some logs in this function https://github.com/cknd/stackprinter/blob/32c19bf5f972472324848540dd8d425528c28c09/stackprinter/extraction.py#L155 and try again? some clues about which values it's trying to access when this happens would be interesting
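
For anyone wanting to try this, here is a rough sketch of what "sprinkling logs" around an attribute lookup could look like. `logged_lookup` is a hypothetical stand-in written for illustration, not stackprinter's actual function. The idea is to print and flush before every `getattr`, so the last line on stderr names the access that was in progress if the process dies.

```python
import sys


def logged_lookup(dotted_name, scope, log=sys.stderr):
    """Resolve a dotted name like 'a.b.c' in `scope`, logging each attribute access.

    Hypothetical helper for illustration; it only mimics the general shape
    of looking up a name and then walking its attributes.
    """
    root, *attrs = dotted_name.split(".")
    value = scope[root]
    for attr in attrs:
        # Log *before* the access and flush immediately, so the message
        # is already written out if getattr() crashes the interpreter.
        print(f"getattr({type(value).__name__}, {attr!r})", file=log, flush=True)
        value = getattr(value, attr)
    return value
```

With the log written ahead of each access, the final line before the faulthandler output should point at the object type and attribute name that triggered the crash.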

pitrou commented 3 years ago

This is really an issue in PyArrow. stackprinter is inspecting the properties of a Cython-generated extension type, and it crashes because the object is not fully initialized.

Reference: https://issues.apache.org/jira/browse/ARROW-12993
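
To illustrate the explanation above, here is a pure-Python analogue, offered as a sketch only. `Reader`, `read` and `inspect_failed_frame` are hypothetical names and do not reflect PyArrow's real classes. An object whose setup fails halfway stays reachable through the traceback's frame locals, and a traceback formatter that inspects it touches state that was never set. In pure Python that surfaces as an `AttributeError`; in a compiled extension type the equivalent access may hit uninitialized C-level state instead.

```python
import sys


class Reader:
    """Hypothetical stand-in for an extension type; not PyArrow's class."""

    def open(self, source):
        if len(source) < 8:
            # Fails before any internal state has been set up.
            raise ValueError("File is too small to be a well-formed file")
        self._handle = source

    @property
    def version(self):
        # Relies on state that only exists after a successful open().
        return len(self._handle)


def read(source):
    reader = Reader()
    reader.open(source)
    return reader


def inspect_failed_frame():
    """Do what a traceback formatter does: look at locals of the failing frame."""
    try:
        read(b"")
    except ValueError:
        tb = sys.exc_info()[2]
        while tb.tb_next is not None:
            tb = tb.tb_next  # walk to the innermost frame (Reader.open)
        half_built = tb.tb_frame.f_locals["self"]
        try:
            half_built.version  # inspecting a property of the half-built object
        except AttributeError as exc:
            return f"inspection failed: {type(exc).__name__}"
    return "ok"
```

Calling `inspect_failed_frame()` returns `"inspection failed: AttributeError"`, which is the benign Python-level version of the failure mode described in this comment.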

cknd commented 3 years ago

thanks for the followup!