Open Shine226 opened 3 years ago
@Shine226 Can you try to reproduce the error without using the all_vars index? Just use all_vars.
First error:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 273, in main
labeled_df_setup.get_subgroup_trends_1lev(trend_list)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/detectors.py", line 259, in get_subgroup_trends_1lev
cur_trend.get_trend_vars(self)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/base_getvars.py", line 181, in get_trend_vars
['ordinal','continuous'],['ordinal','continuous'])
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/base_getvars.py", line 102, in set_weights_regression
indep_vars = labeled_df.get_vars_per_roletype('independent', i_type)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/labeled_dataframe.py", line 528, in get_vars_per_roletype
drop_ignore)]
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/labeled_dataframe.py", line 527, in <listcomp>
target_rows = [r & t & d for r,t,d in zip(is_target_role,is_target_type,
TypeError: unsupported operand type(s) for &: 'str' and 'str'
That error means the wrong type was passed into the variables in that zip. They should all be boolean so that the & works.
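As a small illustration of the point above, here is a sketch with plain Python lists standing in for the three masks. The names mirror the zip shown in the traceback, but the values are made up. With booleans the & works row by row; with strings it raises the same TypeError.

```python
# Made-up values; only the names mirror the zip in get_vars_per_roletype.
is_target_role = [True, True, False]
is_target_type = [True, False, False]
drop_ignore = [True, True, True]

# With booleans, & is a logical AND applied row by row.
target_rows = [r & t & d for r, t, d in
               zip(is_target_role, is_target_type, drop_ignore)]

# If strings end up in the zip instead, & is undefined and Python raises.
bad_values = ["a", "b", "c"]
try:
    [r & t for r, t in zip(bad_values, bad_values)]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(target_rows)
print(error_message)
```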
Before changing to Modin:
variable
sepal length True
sepal width True
petal length False
petal width False
class False
dtype: bool
variable
sepal length True
sepal width True
petal length True
petal width True
class False
dtype: bool
variable
sepal length True
sepal width True
petal length True
petal width True
class True
dtype: bool
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
After applying Modin:
__reduced__
variable
sepal length True
sepal width False
petal length True
petal width True
class False
__reduced__
variable
sepal length True
sepal width True
petal length True
petal width True
class False
__reduced__
variable
sepal length True
sepal width True
petal length True
petal width True
class True
<class 'modin.pandas.dataframe.DataFrame'>
<class 'modin.pandas.dataframe.DataFrame'>
<class 'modin.pandas.dataframe.DataFrame'>
We can use .squeeze() to convert each single-column DataFrame back to a Series.
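A minimal sketch of that conversion, written against plain pandas on the assumption that squeeze behaves the same way on a Modin DataFrame. The frame below is made up to resemble the printed output, with a single column labelled __reduced__. Iterating over a DataFrame yields its column labels, which may be how strings ended up in the zip.

```python
import pandas as pd

# Made-up single-column frame resembling the Modin output above.
mask_df = pd.DataFrame(
    {"__reduced__": [True, False, True]},
    index=pd.Index(["sepal length", "sepal width", "petal length"],
                   name="variable"),
)

# Iterating over a DataFrame gives column labels, not row values.
labels = list(mask_df)

# squeeze(axis=1) collapses the single column into a boolean Series.
mask = mask_df.squeeze(axis=1)

print(labels)
print(type(mask))
print(mask.tolist())
```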
New error:
UserWarning: `Series.align` defaulting to pandas implementation.
UserWarning: Distributing <class 'pandas.core.series.Series'> object. This may take some time.
UserWarning: Distributing <class 'list'> object. This may take some time.
127.0.0.1 - - [03/Jun/2021 15:47:44] "POST / HTTP/1.1" 500 -
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 274, in main
labeled_df_setup.get_subgroup_trends_1lev(trend_list)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/detectors.py", line 274, in get_subgroup_trends_1lev
agg_trends = cur_trend.get_trends(self.df,'agg_trend')
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 171, in get_trends
groupby_name = groupby_name_by_type[type(data_df)](data_df)
KeyError: <class 'modin.pandas.dataframe.DataFrame'>
groupby_name_by_type is keyed on the pandas classes (pandas.core.frame.DataFrame), so the Modin DataFrame type is not found:
groupby_name_by_type = {pandas.core.groupby.DataFrameGroupBy:lambda df: df.keys,
pandas.core.frame.DataFrame:lambda df: None}
Change code to:
import modin.pandas as pd
groupby_name_by_type = {pd.groupby.DataFrameGroupBy:lambda df: df.keys,
pd.dataframe.DataFrame:lambda df: None}
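To show why the lookup failed, here is a sketch of exact-type dispatch using stand-in classes. PandasFrame, ModinFrame, and lookup are hypothetical names for illustration, not the real pandas or Modin types. A dict keyed on type(data) only matches the precise class that was registered.

```python
class PandasFrame:
    """Stand-in for pandas.core.frame.DataFrame."""


class ModinFrame:
    """Stand-in for modin.pandas.dataframe.DataFrame."""


# Exact-type lookup table with only one class registered.
handlers = {PandasFrame: lambda df: None}


def lookup(data, table):
    # type(data) must match a key exactly; look-alike classes miss.
    return table[type(data)](data)


try:
    lookup(ModinFrame(), handlers)
    missing = None
except KeyError as exc:
    missing = exc.args[0]  # the class that was not registered

# Registering the second class makes the same lookup succeed.
handlers[ModinFrame] = lambda df: None
result = lookup(ModinFrame(), handlers)
```

Registering both the pandas and the Modin classes in one table would keep the code working with either backend, at the cost of importing both.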
New error:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 274, in main
labeled_df_setup.get_subgroup_trends_1lev(trend_list)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/detectors.py", line 284, in get_subgroup_trends_1lev
curgroup_trend_df = cur_trend.get_trends(cur_grouping,'subgroup_trend')
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 173, in get_trends
groupby_name = groupby_name_by_type[type(data_df)](data_df)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 6, in <lambda>
groupby_name_by_type = {pd.groupby.DataFrameGroupBy:lambda df: df.keys,
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/groupby.py", line 125, in __getattr__
raise e
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/groupby.py", line 121, in __getattr__
return object.__getattribute__(self, key)
AttributeError: 'DataFrameGroupBy' object has no attribute 'keys'
Changing keys to _idx_name resolves the "no attribute 'keys'" error, but it does not work well for extracting a compound key such as ['sex', 'race']:
groupby_name_by_type = {pd.groupby.DataFrameGroupBy:lambda df: df._idx_name,
pd.dataframe.DataFrame:lambda df: None}
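One possible way around the compound-key limitation is to stop reading the key off the groupby object and keep it next to the grouping when it is created. This is only a rough sketch: LabeledGrouping, make_grouping, and FakeFrame are hypothetical names, not part of Wiggum, pandas, or Modin.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class LabeledGrouping:
    grouping: Any    # whatever frame.groupby(...) returned
    keys: List[str]  # the grouping key(s), always stored as a list


def make_grouping(frame, by: Union[str, List[str]]) -> LabeledGrouping:
    # Normalize a single key or a compound key to a list of names.
    keys = [by] if isinstance(by, str) else list(by)
    return LabeledGrouping(grouping=frame.groupby(by), keys=keys)


class FakeFrame:
    """Tiny stand-in so the sketch runs without pandas or Modin."""

    def groupby(self, by):
        return ("grouped", by)


single = make_grouping(FakeFrame(), "class")
compound = make_grouping(FakeFrame(), ["sex", "race"])
```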
New error for corr function:
127.0.0.1 - - [09/Jun/2021 11:29:20] "POST / HTTP/1.1" 500 -
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 273, in main
labeled_df_setup.get_subgroup_trends_1lev(trend_list)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/detectors.py", line 284, in get_subgroup_trends_1lev
curgroup_trend_df = cur_trend.get_trends(cur_grouping,'subgroup_trend')
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 179, in get_trends
corr_data = self.compute_correlation_table(data_df,trend_col_name)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 88, in compute_correlation_table
corr_mat = data_df[corr_var_list].corr(method=self.corrtype)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/groupby.py", line 622, in corr
return self._default_to_pandas(lambda df: df.corr(**kwargs))
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/groupby.py", line 980, in _default_to_pandas
return self._df._default_to_pandas(groupby_on_multiple_columns, *args, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/base.py", line 400, in _default_to_pandas
result = op(pandas_obj, *args, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/groupby.py", line 974, in groupby_on_multiple_columns
by=by, axis=self._axis, squeeze=self._squeeze, **self._kwargs
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 6727, in groupby
dropna=dropna,
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 568, in __init__
dropna=self.dropna,
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 811, in get_grouper
raise KeyError(gpr)
KeyError: 'class'
Set the index of the DataFrame before the groupby:
# Modin issue: set index before groupby for corr() in get_trends
self.df.index = self.df[groupbyAttr]
#condition the data
cur_grouping = self.df.groupby(groupbyAttr)
Invalid index error:
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 273, in main
labeled_df_setup.get_subgroup_trends_1lev(trend_list)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/detectors.py", line 286, in get_subgroup_trends_1lev
curgroup_trend_df = cur_trend.get_trends(cur_grouping,'subgroup_trend')
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 179, in get_trends
corr_data = self.compute_correlation_table(data_df,trend_col_name)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 98, in compute_correlation_table
itertools.product(self.regression_vars,groupby_vars)]
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/trend_components/statistical.py", line 97, in <listcomp>
corr_data = [(i,d, corr_mat[i][g][d],g) for (i,d),g in
IndexError: invalid index to scalar variable.
Change
corr_data = [(i,d, corr_mat[i][g][d],g) for (i,d),g in
itertools.product(self.regression_vars,groupby_vars)]
to
corr_data = [(i,d, corr_mat.loc[g,i][d],g) for (i,d),g in
itertools.product(self.regression_vars,groupby_vars)]
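For comparison, this is how the grouped correlation table is laid out in plain pandas. It is only a sketch with made-up data, and the layout Modin returns after the index change may differ. In pandas the rows carry a MultiIndex of (group, variable), so the original chained lookup and an explicit .loc with a (group, variable) tuple reach the same cell.

```python
import pandas as pd

# Made-up data: x and y are perfectly correlated in group "a"
# and perfectly anti-correlated in group "b".
df = pd.DataFrame({
    "class": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

# Rows: MultiIndex (class, variable); columns: variable.
corr_mat = df.groupby("class")[["x", "y"]].corr()

# Chained lookup: column, then group, then row variable.
chained = corr_mat["x"]["a"]["y"]

# Explicit lookup: (group, variable) row tuple, then column.
explicit = corr_mat.loc[("a", "x"), "y"]

print(corr_mat)
```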
In the Wiggum app, after loading the data folder, clicking the 'visualize' button raises a new error in add_distance():
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/worker.py", line 2403, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*self.tasks[key])
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/worker.py", line 3238, in _deserialize
args = pickle.loads(args)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/dataframe.py", line 2424, in _inflate_light
return cls(query_compiler=query_compiler)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/dataframe.py", line 90, in __init__
Engine.subscribe(_update_engine)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/config/pubsub.py", line 107, in subscribe
callback(cls)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/__init__.py", line 122, in _update_engine
initialize_dask()
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/dask/utils.py", line 37, in initialize_dask
num_cpus = len(client.ncores())
TypeError: object of type 'coroutine' has no len()
127.0.0.1 - - [09/Jun/2021 15:38:25] "POST / HTTP/1.1" 500 -
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum_app/controller.py", line 279, in main
labeled_df_setup.add_distance()
File "/Users/chenguangxu/Documents/GitHub/detect_simpsons_paradox_dev/wiggum/ranking_processing.py", line 496, in add_distance
self.result_df['distance'] = self.result_df.apply(dist_helper,axis=1)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/dataframe.py", line 292, in apply
func, axis=axis, raw=raw, result_type=result_type, args=args, **kwds
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/base.py", line 751, in apply
**kwds,
File "/opt/anaconda3/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2348, in apply
return self._callable_func(func, axis, *args, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2446, in _callable_func
axis, lambda df: df.apply(func, axis=axis, *args, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1308, in _apply_full_axis
other=None,
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1698, in broadcast_apply_full_axis
for i, new_axis in enumerate([new_index, new_columns])
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1698, in <listcomp>
for i, new_axis in enumerate([new_index, new_columns])
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 260, in _compute_axis_labels
axis, partitions, lambda df: df.axes[axis]
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/dask/pandas_on_dask/frame/partition_manager.py", line 102, in get_indices
new_idx = client.gather(new_idx)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1893, in gather
asynchronous=asynchronous,
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 780, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 348, in sync
raise exc.with_traceback(tb)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 332, in f
result[0] = yield future
File "/opt/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1752, in _gather
raise exception.with_traceback(traceback)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/dataframe.py", line 2424, in _inflate_light
return cls(query_compiler=query_compiler)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/dataframe.py", line 90, in __init__
Engine.subscribe(_update_engine)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/config/pubsub.py", line 107, in subscribe
callback(cls)
File "/opt/anaconda3/lib/python3.7/site-packages/modin/pandas/__init__.py", line 122, in _update_engine
initialize_dask()
File "/opt/anaconda3/lib/python3.7/site-packages/modin/engines/dask/utils.py", line 37, in initialize_dask
num_cpus = len(client.ncores())
TypeError: object of type 'coroutine' has no len()
After rerunning 'pip install modin[all]', the error is gone.
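For reference, the TypeError in the traceback is the generic message Python gives when len() is called on a coroutine that was never awaited. The sketch below reproduces just that message in isolation. The ncores function here is a made-up async stand-in, not the distributed client method, and the sketch does not explain why the reinstall resolved it.

```python
import asyncio


async def ncores():
    """Hypothetical async stand-in that reports cores per worker."""
    return {"worker-1": 4, "worker-2": 4}


pending = ncores()  # calling an async function returns a coroutine
try:
    len(pending)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
finally:
    pending.close()  # avoid a "never awaited" warning

# Awaiting the coroutine (here via asyncio.run) yields the real dict.
num_workers = len(asyncio.run(ncores()))

print(error_message)
print(num_workers)
```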
1) Run Wiggum in a regular terminal rather than vs studio, since the error occurs in vs studio. 2) We still need to import pandas for pandas.core. 3) New error