CAVEconnectome / CAVEclient

This is the Python client for accessing REST APIs within the Connectome Annotation Versioning Engine.
https://caveconnectome.github.io/CAVEclient/
MIT License

Some `live_live_query` calls fail, caused by some pandas v1 vs pandas v2 garbage #110

Closed: jasper-tms closed this 10 months ago

jasper-tms commented 11 months ago

In an environment with pandas v2.0.3 installed:

from datetime import datetime
from caveclient import CAVEclient
client = CAVEclient('fanc_production_mar2021')

# These ones work
client.materialize.live_live_query('cell_ids', timestamp=datetime.utcnow())
client.materialize.live_query('cell_ids', timestamp=datetime.utcnow(), filter_in_dict={'pt_root_id': [648518346486614449]})

# This one errors out
client.materialize.live_live_query('cell_ids', timestamp=datetime.utcnow(), filter_in_dict={'cell_ids': {'pt_root_id': [648518346486614449]}})

with:

----> 1 client.materialize.live_live_query('cell_ids', timestamp=datetime.utcnow(), filter_in_dict={'cell_ids': {'pt_root_id': [648518346486614449]}})

File ~/.virtualenvs/test/lib/python3.9/site-packages/caveclient/materializationengine.py:1864, in MaterializatonClientV3.live_live_query(self, table, timestamp, joins, filter_in_dict, filter_out_dict, filter_equal_dict, filter_spatial_dict, select_columns, offset, limit, datastack_name, split_positions, metadata, suffixes, desired_resolution, allow_missing_lookups, allow_invalid_root_ids)
   1862 warnings.simplefilter(action="ignore", category=FutureWarning)
   1863 warnings.simplefilter(action="ignore", category=DeprecationWarning)
-> 1864 df = pa.deserialize(response.content)
   1865 df = df.copy()
   1866 if desired_resolution is not None:

File ~/.virtualenvs/test/lib/python3.9/site-packages/pyarrow/serialization.pxi:550, in pyarrow.lib.deserialize()

File ~/.virtualenvs/test/lib/python3.9/site-packages/pyarrow/serialization.pxi:556, in pyarrow.lib._deserialize()

File ~/.virtualenvs/test/lib/python3.9/site-packages/pyarrow/serialization.pxi:285, in pyarrow.lib.SerializedPyObject.deserialize()

File ~/.virtualenvs/test/lib/python3.9/site-packages/pyarrow/serialization.pxi:192, in pyarrow.lib.SerializationContext._deserialize_callback()

File ~/.virtualenvs/test/lib/python3.9/site-packages/pyarrow/serialization.py:183, in _load_pickle_from_buffer(data)
    181 def _load_pickle_from_buffer(data):
    182     as_memoryview = memoryview(data)
--> 183     return builtin_pickle.loads(as_memoryview)

ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

Some googling suggests that this error is due to changes from pandas 1.x.x to pandas 2.x.x. That checks out, because if I downgrade to pandas v1.5.3, this live_live_query succeeds.

I can't tell whether pyarrow or CAVEclient is at fault here, but you might consider pinning pandas<2.0.0 in your requirements, or investigating further if you think this is a bug you could fix.
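For anyone wanting to check which side of this their environment is on, here is a minimal sketch that tests whether the module named in the traceback can be located. The helper name `module_available` is made up for illustration and is not part of CAVEclient or pandas. Going by the traceback above, it is expected to report that the module is missing under pandas 2.x and present under pandas 1.5.3.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located.

    Note: find_spec imports the parent packages (e.g. pandas.core.indexes)
    but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pandas itself) is not installed.
        return False


# The module that the pickled server response refers to in the traceback.
print(module_available("pandas.core.indexes.numeric"))
```

This only tells you whether the import path exists locally; it says nothing about what the server used when it serialized the response.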

ceesem commented 11 months ago

Given that it shows up while loading a pickle, I'm not 100% sure this is a pandas 1 vs. 2 issue as such. It may instead be a mismatch between the pandas version on the server and the version on the client. That said, we need to overhaul the pyarrow serialization anyway, and this is a good reminder.
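To illustrate why a version mismatch can surface as a `ModuleNotFoundError` at unpickling time: a pickle records the import path of each class rather than the class itself, so the loader has to be able to import that same path. The sketch below reproduces the failure mode with the standard library only. The module name `fake_server_only_module` and the `Payload` class are hypothetical stand-ins, not anything from pandas, pyarrow, or CAVEclient.

```python
import pickle
import sys
import types

# A throwaway module standing in for one that exists where the data was
# pickled (the "server") but not where it is loaded (the "client").
mod = types.ModuleType("fake_server_only_module")


class Payload:
    def __init__(self, value):
        self.value = value


# Make the class look as if it lives in the stand-in module.
Payload.__module__ = "fake_server_only_module"
Payload.__qualname__ = "Payload"
mod.Payload = Payload
sys.modules["fake_server_only_module"] = mod

# "Server side": the pickle stores the module path by name.
blob = pickle.dumps(Payload(42))
assert b"fake_server_only_module" in blob

# "Client side": the module is not available, so loading fails on import.
del sys.modules["fake_server_only_module"]

error_message = None
try:
    pickle.loads(blob)
except ModuleNotFoundError as exc:
    error_message = str(exc)

print(error_message)
```

The same mechanism would apply if the server pickled an object whose class lives at a path that the client's pandas no longer provides, which is consistent with the traceback above.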

ceesem commented 11 months ago

We've made considerable progress towards upgrading our serialization approach to no longer have this issue. The server-side version is going to be pushed out in the next day or two and we're going to test the client changes a bit more before calling it good, since we need to have the server updates deployed first.

ceesem commented 10 months ago

This issue should now be fixed with #114 / release 5.10.1.
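A quick way to see whether an environment already has that release is to compare the installed version against 5.10.1. This is a rough sketch: `parse_version` and `has_fix` are illustrative helpers rather than CAVEclient functions, the distribution name `caveclient` is assumed to match the pip package, and the parsing is deliberately naive (a pre-release such as `5.10.1rc1` would be treated the same as `5.10.1`).

```python
import re
from importlib import metadata

FIXED_IN = (5, 10, 1)  # release named in this thread


def parse_version(text):
    """Turn '5.10.1' (or '5.10.1.post1') into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed_version):
    """True if the version string is at or above the release with the fix."""
    return parse_version(installed_version) >= FIXED_IN


try:
    installed = metadata.version("caveclient")
    print(installed, "includes the fix" if has_fix(installed) else "predates the fix")
except metadata.PackageNotFoundError:
    print("caveclient is not installed in this environment")
```

Per the comments above, the client release also relies on the server-side update having been deployed, so a new enough client version is necessary but may not be sufficient on its own.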