Baukebrenninkmeijer / table-evaluator

Evaluate real and synthetic datasets against each other
https://baukebrenninkmeijer.github.io/table-evaluator/
MIT License
80 stars, 27 forks

Not able to plot Distribution per feature and correlation graphs #25

Closed prashanthin closed 1 year ago

prashanthin commented 2 years ago

Hi everyone,

I am working on a research project that uses CTGAN to generate synthetic data. For visualization and evaluation I run the code below, and it raises an error:

    from table_evaluator import load_data, TableEvaluator

    table_evaluator = TableEvaluator(data, samples, cat_cols=discrete_columns)
    table_evaluator.visual_evaluation()

ValueError: cannot reindex from a duplicate axis

Can someone help me understand why I am getting this error and how to fix it? The full traceback is below:

ValueError                                Traceback (most recent call last)
in
      3
      4 table_evaluator = TableEvaluator(data, samples,cat_cols=discrete_columns)
----> 5 table_evaluator.visual_evaluation()

~\Appdata\Local\Continuum\Anaconda3\lib\site-packages\table_evaluator\table_evaluator.py in visual_evaluation(self, save_dir, **kwargs)
    374 self.plot_mean_std()
    375 self.plot_cumsums()
--> 376 self.plot_distributions()
    377 self.plot_correlation_difference(**kwargs)
    378 self.plot_pca()

~\Appdata\Local\Continuum\Anaconda3\lib\site-packages\table_evaluator\table_evaluator.py in plot_distributions(self, nr_cols, fname)
    155 if col not in self.categorical_columns:
    156 plot_df = pd.DataFrame({col: self.real[col].append(self.fake[col]), 'kind': ['real'] * self.n_samples + ['fake'] * self.n_samples})
--> 157 fig = sns.histplot(plot_df, x=col, hue='kind', ax=axes[i], stat='probability', legend=True)
    158 axes[i].set_autoscaley_on(True)
    159 else:

~\Appdata\Local\Continuum\Anaconda3\lib\site-packages\seaborn\distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1473 estimate_kws=estimate_kws,
   1474 line_kws=line_kws,
-> 1475 **kwargs,
   1476 )
   1477

~\Appdata\Local\Continuum\Anaconda3\lib\site-packages\seaborn\distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    398 if set(self.variables) - {"x", "y"}:
    399
--> 400 all_data = self.comp_data.dropna()
    401
    402 if common_bins:

~\Appdata\Local\Continuum\Anaconda3\lib\site-packages\seaborn\_core.py in comp_data(self)
   1055 orig = self.plot_data[var].dropna()
   1056 comp_col = pd.Series(index=orig.index, dtype=float, name=var)
-> 1057 comp_col.loc[orig.index] = pd.to_numeric(axis.convert_units(orig))
   1058
   1059 if axis.get_scale() == "log":

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    721
    722 iloc = self if self.name == "iloc" else self.obj.iloc
--> 723 iloc._setitem_with_indexer(indexer, value, self.name)
    724
    725 def _validate_key(self, key, axis: int):

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value, name)
   1730 self._setitem_with_indexer_split_path(indexer, value, name)
   1731 else:
-> 1732 self._setitem_single_block(indexer, value, name)
   1733
   1734 def _setitem_with_indexer_split_path(self, indexer, value, name: str):

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _setitem_single_block(self, indexer, value, name)
   1957 # setting for extensionarrays that store dicts. Need to decide
   1958 # if it's worth supporting that.
-> 1959 value = self._align_series(indexer, Series(value))
   1960
   1961 elif isinstance(value, ABCDataFrame) and name != "iloc":

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
   2094 # series, so need to broadcast (see GH5206)
   2095 if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
-> 2096 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
   2097
   2098 # single indexer

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   4578 )
   4579 def reindex(self, index=None, **kwargs):
-> 4580 return super().reindex(index=index, **kwargs)
   4581
   4582 @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4817 # perform the reindex on the axes
   4818 return self._reindex_axes(
-> 4819 axes, level, limit, tolerance, method, fill_value, copy
   4820 ).__finalize__(self, method="reindex")
   4821

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4841 fill_value=fill_value,
   4842 copy=copy,
-> 4843 allow_dups=False,
   4844 )
   4845

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4887 fill_value=fill_value,
   4888 allow_dups=allow_dups,
-> 4889 copy=copy,
   4890 )
   4891 # If we've made a copy once, no need to make another one

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
    668 # some axes don't allow reindexing with dups
    669 if not allow_dups:
--> 670 self.axes[axis]._validate_can_reindex(indexer)
    671
    672 if axis >= self.ndim:

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexes\base.py in _validate_can_reindex(self, indexer)
   3783 # trying to reindex on an axis with duplicates
   3784 if not self._index_as_unique and len(indexer):
-> 3785 raise ValueError("cannot reindex from a duplicate axis")
   3786
   3787 def reindex(

ValueError: cannot reindex from a duplicate axis

Baukebrenninkmeijer commented 2 years ago

Hi @prashanthin, thanks for reporting this issue. I'll discuss this problem further in #26 so please keep an eye on that.

Baukebrenninkmeijer commented 1 year ago

Solved in previous PR
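
For anyone reading the traceback above on an older install: the error is raised when pandas meets an index with repeated labels. The frame built in plot_distributions appends self.fake[col] to self.real[col], and stacking two series this way keeps each one's original row labels. The sketch below shows that effect and one possible way around it. It is an illustration under assumptions, not the change that was merged in this project: the `real` and `fake` frames and the `age` column are hypothetical stand-ins, and `pd.concat` is used in place of `Series.append`.

```python
import pandas as pd

# Hypothetical stand-ins for the real and synthetic tables in the report.
real = pd.DataFrame({"age": [23, 35, 41]})
fake = pd.DataFrame({"age": [25, 33, 47]})

# Stacking the two columns while keeping their labels repeats 0, 1, 2,
# which gives an index with duplicate entries.
stacked = pd.concat([real["age"], fake["age"]])
has_duplicates = not stacked.index.is_unique

# Workaround sketch: discard the old labels so each row gets a fresh one.
plot_df = pd.DataFrame({
    "age": pd.concat([real["age"], fake["age"]], ignore_index=True),
    "kind": ["real"] * len(real) + ["fake"] * len(fake),
})
```

A frame built like `plot_df` has a unique index, so plotting code that assigns values back by label no longer has to deal with repeated labels. Whether this matches the fix referred to above is not stated in this thread.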