scharlottej13 opened 2 years ago
Yeah, this isn't terribly surprising. The pandas nullable types are not super well supported. I think you can get past your first issue with this change:
```diff
diff --git a/dask/dataframe/core.py b/dask/dataframe/core.py
index 3a1e517b..d5a1f910 100644
--- a/dask/dataframe/core.py
+++ b/dask/dataframe/core.py
@@ -2068,7 +2068,7 @@ Dask Name: {name}, {task} tasks"""
name = self._token_prefix + "var-numeric" + tokenize(num, split_every)
cols = num._meta.columns if is_dataframe_like(num) else None
- var_shape = num._meta_nonempty.values.var(axis=0).shape
+ var_shape = num._meta_nonempty.var(axis=0).shape
array_var_name = (array_var._name,) + (0,) * len(var_shape)
layer = {(name, 0): (methods.wrap_var_reduction, array_var_name, cols)}
```
But I haven't been able to get to the bottom of the next error yet. Are you planning on looking into this?
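For context, here is a plain pandas/NumPy sketch of why the `.values` path can fail. It assumes that the non-empty meta for a nullable `Int64` column contains a missing value, and that `.values` hands NumPy an object array holding `pd.NA` (older pandas releases behaved this way; newer ones may convert to `float64` with `NaN` instead, so the object array is built by hand here). `numpy_var_fails` is a hypothetical helper that exists only for this demo. The failure mode lines up with the `'NAType' object has no attribute 'conjugate'` error in the first traceback.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in (assumption) for what `.values` returned for a nullable
# Int64 column containing a missing value: an object ndarray holding pd.NA.
object_values = np.array([[1], [pd.NA], [3]], dtype=object)


def numpy_var_fails(values: np.ndarray) -> bool:
    """Hypothetical helper: True if NumPy's var() rejects the array."""
    try:
        values.var(axis=0)
    except TypeError:
        # For object arrays NumPy's var() calls .conjugate() on each element;
        # pd.NA has no such method, so the ufunc raises a TypeError.
        return True
    return False


# Reducing on the pandas object instead (what the patch does) lets pandas'
# nullable-aware var() skip the missing value.
df = pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")})
pandas_var = df.var(axis=0)

print(numpy_var_fails(object_values))  # True
print(pandas_var.shape)                # (1,)
print(float(pandas_var["a"]))          # 2.0
```

The patch only needs the shape of the result, and `df.var(axis=0)` yields one value per column, which is presumably why switching away from `.values` gets past the first error.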
Thanks @jsignell! Yup, I can keep looking into this.
**What happened**: `dask.dataframe.DataFrame.describe()` raises an error when columns contain nullable data types.

**What you expected to happen**: Output similar to `pandas.DataFrame.describe()` (which works).

**Minimal Complete Verifiable Example**:
Full traceback:

```python traceback
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
AttributeError: 'NAType' object has no attribute 'conjugate'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/var/folders/hf/2s7qjx7j5ndc5220_qxv8y800000gn/T/ipykernel_4176/3509056714.py in
```

**Anything else we need to know?**: Calling `std` and `quantile` (methods used inside `describe`) individually throws `TypeError: Cannot interpret 'Int64Dtype()' as a data type`.
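That message comes from NumPy itself: `np.dtype` does not know how to interpret a pandas extension dtype such as `pd.Int64Dtype()`. Below is a minimal pandas-only sketch of that, plus one possible stopgap under an assumption: casting nullable numeric columns to `float64` before describing, so `pd.NA` becomes `NaN`. Both helper names (`cannot_interpret_as_numpy_dtype`, `cast_nullable_numeric_to_float`) are hypothetical; this is a workaround sketch, not a fix in dask.

```python
import numpy as np
import pandas as pd


def cannot_interpret_as_numpy_dtype(dtype) -> bool:
    """Hypothetical helper: True if NumPy refuses `dtype` as a data type."""
    try:
        np.dtype(dtype)
    except TypeError:
        return True
    return False


def cast_nullable_numeric_to_float(frame):
    """Hypothetical workaround: cast nullable numeric columns to float64.

    pd.NA becomes NaN, which NumPy-based reductions can handle.
    Only `.dtypes` and `.astype` are used.
    """
    casts = {
        col: "float64"
        for col, dtype in frame.dtypes.items()
        if pd.api.types.is_extension_array_dtype(dtype)
        and pd.api.types.is_numeric_dtype(dtype)
    }
    return frame.astype(casts) if casts else frame


df = pd.DataFrame({
    "a": pd.array([1, None, 3], dtype="Int64"),  # nullable column with a gap
    "b": [0.5, 1.5, 2.5],                        # ordinary float64 column
})

print(cannot_interpret_as_numpy_dtype(pd.Int64Dtype()))  # True
converted = cast_nullable_numeric_to_float(df)
print(converted.describe().loc["std", "a"])  # std of [1, 3], i.e. sqrt(2)
```

Since the cast helper only touches `.dtypes` and `.astype`, it should in principle apply to a dask DataFrame as well, but treat that as an assumption; it also gives up the integer dtype in exchange for NumPy compatibility.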
Full traceback:

```python traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/hf/2s7qjx7j5ndc5220_qxv8y800000gn/T/ipykernel_4176/418681936.py in
```

**Environment**: