Open bjfar opened 4 months ago
@bjfar thanks for the report! This is a bit bizarre. When pandas' to_parquet gets called for the first time, pandas calls pyarrow.register_extension_type(..) to register its extension types. That call lives in a Python submodule of pandas, so I would expect normal Python execution to ensure that the code triggered by importing the submodule only runs once.
But maybe that assumption does not hold in all cases, or the pytest fixture meddles with the import mechanism? In any case, if we should protect against this, that needs to be done on the pandas side. Would you want to report it there?
Describe the bug, including details regarding any error messages, version, and platform.
Python version: 3.10.14
pyarrow version: 16.1.0
pandas version: 2.2.2
pytest version: 8.2.1
I have some apparently niche circumstances that trigger the following error:
It seems to have something to do with how pytest orchestrates its tests. Here is my minimal example:
test_minimal.py
Running pytest test_minimal.py then triggers the error.
Notably, the error does not occur if either test is run independently, and it does not occur if the testdir fixture is removed or replaced with some other fixture. So I guess it has something to do with whatever testdir is doing under the hood. Presumably to do with how pandas/pyarrow get imported.
In my real case I would really quite like to keep using the testdir fixture, though I can probably find a different way to do things. But nonetheless this behaviour seemed worth reporting. Not sure if it is a pyarrow issue though, or whether it is more of a pytest issue, or maybe even pandas.
Component(s)
Parquet, Python
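On the guess in the report that the import machinery is involved: the sketch below shows one way a "run once on import" assumption can break. It is an illustration, not a confirmed diagnosis. It assumes that something (here a manual snapshot and restore of sys.modules, which a test-isolation fixture could plausibly do) evicts a module between two tests while a registry outside that module keeps its state. The names `fake_registry`, `fake_ext` and `AlreadyRegistered` are hypothetical and no pandas or pyarrow code is involved.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Stand-in for state that lives outside the evicted module (for example
# in a compiled library) and therefore survives a sys.modules restore.
REGISTERED = set()


class AlreadyRegistered(Exception):
    """Raised when a name is registered a second time."""


def register(name):
    if name in REGISTERED:
        raise AlreadyRegistered(f"type {name!r} already defined")
    REGISTERED.add(name)


# Expose the registry as an importable module for the generated file below.
fake_registry = types.ModuleType("fake_registry")
fake_registry.register = register
sys.modules["fake_registry"] = fake_registry


def run_demo():
    """Return the error message from the re-import, or None if none occurred."""
    REGISTERED.clear()
    with tempfile.TemporaryDirectory() as tmp:
        # A module whose top-level code registers a type on import.
        Path(tmp, "fake_ext.py").write_text(
            "import fake_registry\n"
            "fake_registry.register('demo.period')\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            snapshot = set(sys.modules)          # taken before the "first test"
            importlib.import_module("fake_ext")  # runs the registration
            importlib.import_module("fake_ext")  # cached, body does not re-run
            # Restore the snapshot: drop every module imported since.
            for name in set(sys.modules) - snapshot:
                del sys.modules[name]
            try:
                importlib.import_module("fake_ext")  # body runs a second time
            except AlreadyRegistered as exc:
                return str(exc)
            return None
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_ext", None)


if __name__ == "__main__":
    print(run_demo())
```

If this is what happens in the real case, it would also be consistent with the observation that each test passes when run on its own, since the registration then only happens once per process.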