dmey / synthia

📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python
https://dmey.github.io/synthia
MIT License

Add support for categorical data #10

Closed dmey closed 4 years ago

dmey commented 4 years ago

We can treat categorical data as discrete, but first we need to pre-process the categorical values with one-hot encoding to remove any implied order. Regarding the API, we can change the current version from

# Assuming an xarray dataset ds with X1 discrete and X2 categorical
generator.fit(ds, copula=syn.VineCopula(controls=ctrl), is_discrete={'X1': True, 'X2': False})

to something like

# Same dataset ds, with X3 continuous
generator.fit(ds, copula=syn.VineCopula(controls=ctrl), types={'X1': 'disc', 'X2': 'cat', 'X3': 'cont'})
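
For illustration, the pre-processing step could look roughly like the sketch below. It uses plain NumPy and is not synthia's implementation: `one_hot_encode`, `one_hot_decode` and `flags_from_types` are hypothetical helper names, and the `'disc'`/`'cat'`/`'cont'` labels are taken from the proposal above. It assumes a categorical variable is expanded into one 0/1 column per category before fitting, and that generated samples are mapped back by picking the column with the largest value.

```python
import numpy as np


def one_hot_encode(values):
    """Hypothetical helper: return (categories, encoded).

    `encoded` has one 0/1 column per category, so no ordering between
    categories is implied.
    """
    values = np.asarray(values).ravel()
    # `categories` is sorted; `codes` holds each value's index into it.
    categories, codes = np.unique(values, return_inverse=True)
    encoded = np.eye(len(categories), dtype=int)[codes]
    return categories, encoded


def one_hot_decode(categories, encoded):
    """Hypothetical helper: map each row back to its category label."""
    return categories[np.argmax(encoded, axis=1)]


def flags_from_types(types):
    """Hypothetical helper: turn the proposed `types` mapping into
    per-variable discrete flags plus a list of categorical variables."""
    allowed = {'disc', 'cat', 'cont'}
    unknown = {name: kind for name, kind in types.items() if kind not in allowed}
    if unknown:
        raise ValueError(f"Unknown variable types: {unknown}")
    # Assumption: categorical variables are treated as discrete once encoded.
    is_discrete = {name: kind in ('disc', 'cat') for name, kind in types.items()}
    categorical = [name for name, kind in types.items() if kind == 'cat']
    return is_discrete, categorical


x2 = np.array(['red', 'green', 'blue', 'green'])
categories, encoded = one_hot_encode(x2)
decoded = one_hot_decode(categories, encoded)
is_discrete, categorical = flags_from_types({'X1': 'disc', 'X2': 'cat', 'X3': 'cont'})
```

With this layout, each one-hot column would be passed to the generator as its own discrete variable. Whether the fitted model preserves the "exactly one column is 1" constraint is a separate question that the decoding step only works around by taking the arg-max.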