dmey / synthia

📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python
https://dmey.github.io/synthia
MIT License
57 stars, 9 forks

Mixed type data generation failing #30

Closed: nathan-greeneltch closed this issue 1 year ago

nathan-greeneltch commented 1 year ago

Describe the bug

A TypeError is raised when attempting to generate mixed ('cat', 'cont', and 'disc') data.

To Reproduce

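import pandas as pd
import pyvinecopulib as pv
import synthia as syn

# Hypothetical stand-in for the reporter's data: a pandas DataFrame with mixed types
X = pd.DataFrame({
    "temp": [20.1, 21.3, 19.8, 22.4],
    "count": [3, 5, 2, 4],
    "label": ["a", "b", "a", "c"],
})
num_rows = 1000  # number of synthetic samples to generate (illustrative value)

# Split the columns by dtype
X_float = X.select_dtypes(include='float')
X_int = X.select_dtypes(include='int')
X_object = X.select_dtypes(exclude='number')

# Map each column name to a synthia feature type
type_dict = {}
for col in X_float.columns:
    type_dict[col] = 'cont'
for col in X_int.columns:
    type_dict[col] = 'disc'
for col in X_object.columns:
    type_dict[col] = 'cat'

orig_cols = list(type_dict.keys())  # a list, so it can be used to index the DataFrame

generator = syn.CopulaDataGenerator(verbose=False)
ctrl = pv.FitControlsVinecop(family_set=[pv.gaussian], trunc_lvl=1, select_trunc_lvl=False)
generator.fit(X[orig_cols], types=type_dict, copula=syn.VineCopula(controls=ctrl))  # raises the TypeError
X_synth = generator.generate(num_rows)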
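The three `select_dtypes` loops above can also be written as a single pass over the columns. Below is a minimal sketch using the predicates in `pandas.api.types`; the helper name `infer_feature_types` is hypothetical (not part of synthia), and treating booleans as 'cat' is an assumption that mirrors `exclude='number'` in the report.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_feature_types(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each column to 'cont', 'disc', or 'cat' by dtype."""
    type_dict = {}
    for col in df.columns:
        dtype = df[col].dtype
        if ptypes.is_bool_dtype(dtype):
            type_dict[col] = 'cat'   # booleans are not numbers to select_dtypes (assumption)
        elif ptypes.is_float_dtype(dtype):
            type_dict[col] = 'cont'
        elif ptypes.is_integer_dtype(dtype):
            type_dict[col] = 'disc'  # matches any integer width (int32, int64, ...)
        else:
            type_dict[col] = 'cat'   # object, string, category, ...
    return type_dict
```

Unlike the grouped loops, this keeps the DataFrame's original column order, so `X[list(type_dict)]` does not reshuffle the columns.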

Expected behavior

A new NumPy array (X_synth) containing num_rows generated samples.

Additional context

Complete trace:

Cell In[54], line 17
    generator.fit(X[orig_cols], types=type_dict, copula=syn.VineCopula(controls=ctrl))

File ~\Anaconda3\envs\synth_data\lib\site-packages\synthia\generators\copula.py:151 in fit
    self.feature_med = data.median(axis=0)

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\_aggregations.py:2128 in median
    return self.reduce(

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\dataarray.py:3662 in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\variable.py:1970 in reduce
    data = func(self.data, axis=axis, **kwargs)

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\duck_array_ops.py:377 in f
    return func(values, axis=axis, **kwargs)

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\nanops.py:136 in nanmedian
    return nputils.nanmedian(a, axis=axis)

File ~\Anaconda3\envs\synth_data\lib\site-packages\xarray\core\nputils.py:177 in f
    result = getattr(npmodule, name)(values, axis=axis, **kwargs)

File <__array_function__ internals>:200 in nanmedian

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\nanfunctions.py:1217 in nanmedian
    return function_base._ureduce(a, func=_nanmedian, keepdims=keepdims,

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\function_base.py:3752 in _ureduce
    r = func(a, **kwargs)

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\nanfunctions.py:1095 in _nanmedian
    result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)

File <__array_function__ internals>:200 in apply_along_axis

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\shape_base.py:402 in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\nanfunctions.py:1072 in _nanmedian1d
    return np.median(arr1d_parsed, overwrite_input=overwrite_input)

File <__array_function__ internals>:200 in median

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\function_base.py:3856 in median
    return _ureduce(a, func=_median, keepdims=keepdims, axis=axis, out=out,

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\function_base.py:3752 in _ureduce
    r = func(a, **kwargs)

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\lib\function_base.py:3906 in _median
    rout = mean(part[indexer], axis=axis, out=out)

File <__array_function__ internals>:200 in mean

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\core\fromnumeric.py:3464 in mean
    return _methods._mean(a, axis=axis, dtype=dtype,

File ~\Anaconda3\envs\synth_data\lib\site-packages\numpy\core\_methods.py:194 in _mean
    ret = ret / rcount

TypeError: ufunc 'divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
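The trace shows the failure starting at `self.feature_med = data.median(axis=0)` inside `fit`. NumPy's median averages the middle value(s) of each column, and that averaging step divides. One plausible reading, which is an assumption and not confirmed in this thread, is that the string ('cat') column reached this median unencoded. The sketch below reproduces that failure mode with plain NumPy/pandas (no synthia), using a hypothetical frame. It then shows an assumed workaround: replace each category with an integer code before fitting. The reporter's own fix, described below, was to convert the data to xarray.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's mixed-type data
X = pd.DataFrame({
    "temp": [20.1, 21.3, 19.8],  # float -> 'cont'
    "count": [3, 5, 2],          # int   -> 'disc'
    "label": ["a", "b", "a"],    # str   -> 'cat'
})


def column_medians(values: np.ndarray) -> np.ndarray:
    # The same kind of reduction the trace shows in fit(): a median down each column
    return np.median(values, axis=0)


# A string column turns the whole array into dtype=object, and the median's
# averaging step then tries to divide a string, which raises TypeError.
mixed = X.to_numpy()
try:
    column_medians(mixed)
    mixed_failed = False
except TypeError as exc:
    mixed_failed = True
    print("median on mixed data failed:", exc)

# Assumed workaround (not from the thread): encode categories as integer codes first
encoded = X.copy()
encoded["label"] = encoded["label"].astype("category").cat.codes
numeric = encoded.to_numpy(dtype=float)
medians = column_medians(numeric)
print(medians)
```

If categories are encoded this way, the mapping from codes back to labels (`X["label"].astype("category").cat.categories`) must be kept so that generated samples can be decoded afterwards.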

nathan-greeneltch commented 1 year ago

Closing. I was never able to get it to work with a pandas dataframe as input, but converting my data into an xarray before passing to synthia worked fine.
