CAVEconnectome / CAVEclient

This is the python client for accessing REST APIs within the Connectome Annotation Versioning Engine.
https://caveconnectome.github.io/CAVEclient/
MIT License

Pandas 2.2 breaks client.materialize.query_table #155

Closed: ilexaquifolium closed this issue 3 months ago

ilexaquifolium commented 3 months ago

With pandas 2.1.4, the query works and the result displays as expected:

>>> import caveclient as cv
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
>>> client = cv.CAVEclient('fanc_production_mar2021')
>>> client.materialize.query_table("synapse_regions_v1", limit=5)
201 - "Limited query to 5 rows
     id_ref                      created_ref valid_ref  score  pre_pt_supervoxel_id      pre_pt_root_id  post_pt_supervoxel_id  ...  id                          created valid target_id  neuropil        pre_pt_position       post_pt_position
0  45055925 2022-12-09 21:59:43.908990+00:00         t     34     73187411961916060  648518346499676883      73187411961918718  ...   0 2023-10-03 14:57:04.486532+00:00     t  45055925    LTct_L  [32787, 116961, 1970]  [32769, 116987, 1969]
1         1 2022-11-17 14:28:59.843347+00:00         t     62     73187411961904090  648518346514439367      73187411961913830  ...   1 2023-10-03 14:57:04.486532+00:00     t         1  IntTct_L  [32793, 117195, 1964]  [32769, 117165, 1964]
2         2 2022-11-17 14:28:59.843347+00:00         t     29     73117043217791261  648518346506155934      73187411962044945  ...   2 2023-10-03 14:57:04.486532+00:00     t         2    LTct_L  [32763, 116821, 2015]  [32771, 116853, 2016]
3         3 2022-11-17 14:28:59.843347+00:00         t     47     73117043217599399  648518346524002309      73187411961848850  ...   3 2023-10-03 14:57:04.486532+00:00     t         3  IntTct_L  [32761, 117117, 1941]  [32773, 117089, 1941]
4         4 2022-11-17 14:28:59.843347+00:00         t     35     73187411961892835  648518346514439367      73187411961890557  ...   4 2023-10-03 14:57:04.486532+00:00     t         4  IntTct_L  [32793, 117201, 1959]  [32771, 117191, 1958]

[5 rows x 15 columns]
>>> 

With pandas 2.2.1, the same command raises an error:

>>> import caveclient as cv
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).
  from pandas.core import (
>>> client = cv.CAVEclient('fanc_production_mar2021')
>>> client.materialize.query_table("synapse_regions_v1", limit=5)
201 - "Limited query to 5 rows
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 1203, in __repr__
    return self.to_string(**repr_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 1383, in to_string
    return fmt.DataFrameRenderer(formatter).to_string(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 962, in to_string
    string = string_formatter.to_string()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 29, in to_string
    text = self._get_string_representation()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 53, in _get_string_representation
    return self._fit_strcols_to_terminal_width(strcols)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 184, in _fit_strcols_to_terminal_width
    self.fmt.truncate()
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 655, in truncate
    self._truncate_horizontally()
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 673, in _truncate_horizontally
    self.tr_frame = concat((left, right), axis=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 395, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 691, in get_result
    return out.__finalize__(self, method="concat")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 6270, in __finalize__
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 6270, in <genexpr>
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
                          ^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> 
bdpedigo commented 3 months ago

the following works for me with CAVEclient 5.15.2 and pandas 2.2.1:

import pandas as pd

import caveclient as cv

print(pd.__version__)
print(cv.__version__)

client = cv.CAVEclient("minnie65_public_v117")
client.materialize.query_table("synapses_pni_2", limit=5)

So I wonder if this is somehow specific to your table (though I don't have access to it).

Also, what happens when you assign the output of query_table to a variable? It looks like the error comes from the repr call that the interactive prompt makes implicitly when you don't assign the output.
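
For illustration, here is a minimal sketch of why assigning the result can hide this kind of failure: the interactive prompt calls repr() on any bare expression it echoes, while an assignment does not. The class `Fragile` is hypothetical and only stands in for an object whose display code fails; it is not part of pandas or CAVEclient.

```python
class Fragile:
    """Hypothetical stand-in for an object whose display code raises."""

    def __repr__(self):
        raise ValueError("display failed")


# Construction and assignment never call __repr__, so this line succeeds.
obj = Fragile()

try:
    # This is what the prompt does when it echoes a bare expression.
    repr(obj)
    display_ok = True
except ValueError:
    display_ok = False

print(display_ok)  # prints False
```

If the same pattern applies here, the query itself succeeded and only the echo of the result failed.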

ceesem commented 3 months ago

Could you provide some more information about your environment?

import caveclient as cv
client = cv.CAVEclient('fanc_production_mar2021')
client.materialize.query_table("synapse_regions_v1", limit=5)

works as expected on caveclient 5.15.2 (current pypi release) and pandas 2.2.1 on OS X.

ilexaquifolium commented 3 months ago

I'm using macOS Ventura 13.4.1, with python 3.11.5 and caveclient 5.15.2

>>> print(pd.__version__)
2.2.1
>>> print(cv.__version__)
5.15.2
>>> client = cv.CAVEclient("minnie65_public_v117")
>>> table = client.materialize.query_table("synapses_pni_2", limit=5)
201 - "Limited query to 5 rows
>>> table.head()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 1203, in __repr__
    return self.to_string(**repr_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 1383, in to_string
    return fmt.DataFrameRenderer(formatter).to_string(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 962, in to_string
    string = string_formatter.to_string()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 29, in to_string
    text = self._get_string_representation()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 53, in _get_string_representation
    return self._fit_strcols_to_terminal_width(strcols)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/string.py", line 184, in _fit_strcols_to_terminal_width
    self.fmt.truncate()
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 655, in truncate
    self._truncate_horizontally()
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/io/formats/format.py", line 673, in _truncate_horizontally
    self.tr_frame = concat((left, right), axis=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 395, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 691, in get_result
    return out.__finalize__(self, method="concat")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 6270, in __finalize__
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 6270, in <genexpr>
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
                          ^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
bdpedigo commented 3 months ago

so it sounds like the data was downloaded successfully? there is just some issue with pandas trying to display it when you call repr implicitly?

is this from a jupyter notebook or ipython shell? does print(df.head()) work?

bdpedigo commented 3 months ago

this is the closest issue in pandas i could find https://github.com/pandas-dev/pandas/issues/47103

ceesem commented 3 months ago

Agreed, query_table does not have an error; it's the repr function having an issue displaying metadata with lists (which should be allowed). I can replicate the issue now in a freshly installed ipython but not in jupyter, so it must be happening in __repr__ but not _repr_html_. This looks to me like a pandas bug.
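
A rough sketch of the mechanism the traceback points at, under the assumption (not confirmed in this thread) that one of the metadata values is a NumPy-style array rather than a plain list. The failing line compares two attrs dicts with `==`, and a dict comparison has to turn each value comparison into a single bool. The classes `ArrayLike` and `ElementwiseResult` and the `"resolution"` key are made up for this example; with real NumPy, comparing `{"a": np.array([1, 2])}` against an equal dict holding a distinct array fails the same way.

```python
class ElementwiseResult:
    """Result of comparing two ArrayLike objects: one flag per element."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Like NumPy, refuse to collapse several flags into one truth value.
        raise ValueError("truth value of a multi-element result is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for a multi-element NumPy array."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Elementwise comparison instead of a single True/False.
        return ElementwiseResult(a == b for a, b in zip(self.values, other.values))


# Made-up metadata: equal contents, but held in two distinct objects.
left_attrs = {"resolution": ArrayLike([1, 2, 3])}
right_attrs = {"resolution": ArrayLike([1, 2, 3])}

error = None
try:
    # Same shape as the failing line in the traceback.
    all(attrs == left_attrs for attrs in [right_attrs])
except ValueError as exc:
    error = str(exc)

print(error)
```

Plain Python lists compare to a single bool, so a dict holding only lists would not trigger this on its own.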

In terms of workarounds, I see two possibilities: 1) Use Jupyter notebooks, which don't seem to have this problem. 2) Omit metadata (i.e. don't populate the .attrs object) by adding an argument to the query_table call:

import caveclient as cv
client = cv.CAVEclient('fanc_production_mar2021')
df = client.materialize.query_table("synapse_regions_v1", limit=5, metadata=False)

Displaying the resulting dataframe in ipython does not throw the same error for me, and most likely you aren't using the metadata anyway.
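
If the metadata is wanted, or the dataframe has already been downloaded with it, one possible alternative is to replace array-like values in the attrs dict with plain lists before displaying. This is only a sketch under the assumption that the offending values expose a `tolist()` method, as NumPy arrays do; `plain_attrs` and `FakeArray` are hypothetical names and not part of CAVEclient or pandas.

```python
def plain_attrs(attrs):
    """Return a copy of attrs with array-like values converted to lists."""
    cleaned = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            # Metadata may be nested, so clean inner dicts as well.
            cleaned[key] = plain_attrs(value)
        elif hasattr(value, "tolist"):
            cleaned[key] = value.tolist()
        else:
            cleaned[key] = value
    return cleaned


class FakeArray:
    """Hypothetical stand-in for a NumPy array, used only for this demo."""

    def __init__(self, values):
        self.values = list(values)

    def tolist(self):
        return list(self.values)


# Made-up metadata for the demo.
meta = {"table": "example", "resolution": FakeArray([1, 2, 3]),
        "nested": {"x": FakeArray([0])}}

first = plain_attrs(meta)
second = plain_attrs(meta)
print(first == second)  # prints True: plain lists compare to a single bool
```

On a real dataframe this would be applied as `df.attrs = plain_attrs(df.attrs)`; setting `df.attrs = {}` instead drops the metadata entirely, which has much the same effect as metadata=False.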

ilexaquifolium commented 3 months ago

Setting metadata=False fixes it, thank you!

fcollman commented 3 months ago

Can we file a pandas bug report and close this one?

fcollman commented 3 months ago

@ilexaquifolium what do you think?

ilexaquifolium commented 3 months ago

I'm happy with that, yes