An Evolutionary Scalable Framework for Synthetic Data Generation based in Data Complexity.
🇬🇧 English - 🇧🇷 Português Brasileiro
cbdgen
(Complexity-based Dataset Generation) is a software, currently in development to become a framework, that implements a many-objective algorithm to generate synthetic datasets from characteristics (complexities).
Due to the actual state of the framework, a few steps are necessary/optional to run the framework. Here we list the requirements to run this project, as well as a few tutorials:
cbdgen
It is required ECoL
package to correctly calculate data complexity, to do so you can use the following command:
./install_packages.r
If you've successfully installed R, this Rscript will work fine, but if you get any error using the R environment, Try Working with ECoL notebook to setup
ECoL
package with Python.
Let's use pip
to install our packages based on our requirements.txt
.
pip install --upgrade pip
pip install -r requirements.txt
Now you're ready to Generate Synthetic Data!
@inproceedings{Pereira_A_Many-Objective_Optimization_2022,
author = {Pereira, Steffano X. and Miranda, Péricles B. C. and França, Thiago R. F. and Bastos-Filho, Carmelo J. A. and Si, Tapas},
booktitle = {2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI)},
doi = {10.1109/la-cci54402.2022.9981848},
month = {11},
pages = {1--6},
title = {{A Many-Objective Optimization Approach to Generate Synthetic Datasets based on Real-World Classification Problems}},
year = {2022}
}
For more details, see CITATION.cff.
Lorena, A. C., Garcia, L. P. F., Lehmann, J., Souto, M. C. P., and Ho, T. K. (2019). How Complex Is Your Classification Problem?: A Survey on Measuring Classification Complexity. ACM Computing Surveys (CSUR), 52:1-34.