daffidwilde / edo

A library for generating artificial datasets through genetic evolution.
https://doi.org/10.1007/s10489-019-01592-4
MIT License

Implementing copulas #146

Open daffidwilde opened 3 years ago

daffidwilde commented 3 years ago

As raised in the further work section of my thesis, copula functions would offer an elegant way to handle the relationships between columns. An excerpt from that section (https://github.com/daffidwilde/thesis/pull/101):

Copulas are functions that join multivariate distribution functions to their one-dimensional margins [235]. For EDO, this would mean that P, the set of distribution families, contains a single element: a copula function fitted to the existing dataset. In that case, the technical aspects of an individual’s representation would need adjusting to accommodate this change. Likewise, the crossover and mutation processes would need some changes to account for the lack of distinct distribution families.

A Python implementation of copulas for data synthesis already exists [12], and adding it as a dependency of the edo library would reduce the work required to implement this feature. Studying the impact of copulas in EDO would also be a valuable opportunity to demonstrate the capabilities of EDO as a fully fledged data synthesis method.
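To make "a copula function fitted to the existing dataset" a little more concrete, here is a minimal sketch using only the Python standard library. It assumes a bivariate Gaussian copula whose single parameter `rho` is estimated from rank-based normal scores, which is just one simple choice of copula and estimator. The helpers (`fit_gaussian_copula`, `sample_gaussian_copula`, `exponential_ppf`) and the toy dataset are hypothetical. They are not part of edo or of the library cited in [12], and a real integration would more likely rely on that library's own fitting code.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for both cdf and inv_cdf


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def normal_scores(values):
    """Replace a column by the standard-normal quantiles of its ranks.

    Dividing each rank by n + 1 keeps every pseudo-observation strictly
    inside (0, 1), so only the column's ordering survives, not its margin.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def fit_gaussian_copula(x, y):
    """Estimate the single parameter rho of a bivariate Gaussian copula."""
    return pearson(normal_scores(x), normal_scores(y))


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u, v) with uniform margins and Gaussian-copula dependence."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second row of the Cholesky factor of [[1, rho], [rho, 1]].
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho**2)) * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


def exponential_ppf(u, rate):
    """Inverse CDF of an exponential margin with the given rate."""
    return -math.log(1.0 - u) / rate


# Hypothetical "existing dataset": two dependent columns with different margins.
rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(500)]
col_x = [z + rng.gauss(0.0, 0.5) for z in latent]
col_y = [math.exp(z) for z in latent]

rho = fit_gaussian_copula(col_x, col_y)

# Join the fitted dependence to freely chosen one-dimensional margins:
# a normal margin for the first column, an exponential one for the second.
x_margin = NormalDist(mu=10.0, sigma=2.0)
synthetic = [
    (x_margin.inv_cdf(u), exponential_ppf(v, rate=0.5))
    for u, v in sample_gaussian_copula(rho, 200, rng)
]
```

Because the fit only sees the ranks of each column, the same `rho` can be paired with any margins. This is the sense in which a copula joins a multivariate distribution to its one-dimensional margins, and it hints at why an individual would no longer carry a separate distribution family per column.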