Error in dataframes of categorical variables

mireklzicar commented 1 week ago

I am getting this error when running Exploring Fashion MNIST and Tooltip for the Fashion MNIST Embedding notebook. It seems to be caused by the "class" column in the dataframe in combination with the sum() function used while checking for NaNs. The required numpy and pandas versions are not specified, so I cannot tell whether I have the right or wrong ones (pandas==2.1.4, numpy==1.26.3, jupyter-scatter==0.16.0).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/gj/2n3kf82s035cdqv1q1m6mshw0000gp/T/ipykernel_4728/3526999429.py in <module>
      1 from jscatter import Scatter
      2 
----> 3 scatter = Scatter(
      4     data=df,
      5     x='tsneX',

~/opt/anaconda3/lib/python3.9/site-packages/jscatter/jscatter.py in __init__(self, x, y, data, **kwargs)
    315         self._options = {}
    316 
--> 317         self.x(x, kwargs.get('x_scale', UNDEF))
    318         self.y(y, kwargs.get('y_scale', UNDEF))
    319         self.width(kwargs.get('width', UNDEF))

~/opt/anaconda3/lib/python3.9/site-packages/jscatter/jscatter.py in x(self, x, scale, **kwargs)
    606         if x is not UNDEF or scale is not UNDEF:
    607             self.update_widget('prevent_filter_reset', True)
--> 608             self._points[:, 0] = zerofy_missing_values(self.x_data.values, 'X')
    609 
    610             self._x_min = np.min(self._points[:, 0])

~/opt/anaconda3/lib/python3.9/site-packages/jscatter/utils.py in zerofy_missing_values(values, dtype)
    147 
    148 def zerofy_missing_values(values, dtype):
--> 149     if isnan(sum(values)):
    150         warnings.warn(
    151             f'{dtype} data contains missing values. Those missing values will be replaced with zeros.',

~/opt/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2311         return res
   2312 
-> 2313     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
   2314                           initial=initial, where=where)
   2315 

~/opt/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     86                 return reduction(axis=axis, out=out, **passkwargs)
     87 
---> 88     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     89 
     90 

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/arrays/categorical.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   1692         # for all other cases, raise for now (similarly as what happens in
   1693         # Series.__array_prepare__)
-> 1694         raise TypeError(
   1695             f"Object with dtype {self.dtype} cannot perform "
   1696             f"the numpy op {ufunc.__name__}"

TypeError: Object with dtype category cannot perform the numpy op add

flekschas commented 1 week ago

Oh, odd. Did you by chance alter the notebook in any way, or did this happen the first time you executed it? The TypeError makes me think that whatever the X data is has a dtype of category, which it shouldn't have.
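
One way to check this is to inspect the dtypes of the columns passed as x and y before creating the scatter. The snippet below is only a minimal sketch: the dataframe is a hypothetical stand-in for the notebook's data (the 'tsneY' column name and all values are assumptions), and non_numeric_columns is an illustrative helper, not part of jupyter-scatter. The last step calls numpy's sum on a categorical column, which is the operation that fails in the traceback above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's dataframe; values are made up.
df = pd.DataFrame({
    'tsneX': [0.1, 0.5, 0.9],
    'tsneY': [0.2, 0.4, 0.6],
    'class': pd.Categorical(['Shirt', 'Bag', 'Shirt']),
})


def non_numeric_columns(frame, columns):
    """Return the subset of `columns` whose dtype is not numeric."""
    return [c for c in columns if not pd.api.types.is_numeric_dtype(frame[c])]


# Show every column's dtype, then flag the ones unsuitable as x/y.
print(df.dtypes)
print(non_numeric_columns(df, ['tsneX', 'tsneY', 'class']))

# Summing a categorical array with numpy is what the traceback shows failing.
try:
    np.sum(df['class'].values)
except TypeError as err:
    print(f'TypeError: {err}')
```

If the column used for x shows up in that list, the dataframe was probably modified (or loaded differently) before being passed to Scatter.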

flekschas commented 1 week ago

I've tested this again and it works as expected for me. My pandas and numpy versions are:

pandas==1.5.3
numpy==1.25.0

Since pandas has a new major version, I updated both libraries to the following versions and tested again. Everything still works as expected.

pandas==2.2.2
numpy==1.26.4

I also tested the notebook with v2 of numpy. This requires updating matplotlib and pyarrow as well. Again, everything works as expected.

matplotlib==3.9.0
numpy==2.0.0
pandas==2.2.2
pyarrow==16.1.0

Can you try again after updating the four libraries listed above?
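
To compare the installed versions with the ones tested above, something like the following should work. It is a rough sketch using only the standard library's importlib.metadata; the TESTED dictionary simply repeats the versions listed in this comment, and installed_versions is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread.
TESTED = {
    'matplotlib': '3.9.0',
    'numpy': '2.0.0',
    'pandas': '2.2.2',
    'pyarrow': '16.1.0',
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(TESTED).items():
    print(f'{name}: installed={installed}, tested={TESTED[name]}')
```

Run it in the same kernel as the notebook so that it reports the environment the notebook actually uses.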

flekschas / jupyter-scatter-tutorial

Error in dataframes of categorical variables #3