Closed SteffanoP closed 2 years ago
Also, I found a bug in how pymfe orders its extraction output. Although you can pass a list of the complexity measures you want to extract, the meta-features are not returned in the order you requested. One way to reproduce this is to create an MFE object:
from pymfe.mfe import MFE
mfe = MFE(groups=['complexity'], features=['C2', 'L2', 'N1', 'F2'])
Note the order of the features we passed: C2, L2, N1, F2.
However, when we extract them, the results come back in a different order:
ft = mfe.extract()
print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))
>>> c2 0.0
>>> f2.mean 0.002206548501187337
>>> l2.mean 0.014285714285714271
>>> n1 0.12380952380952381
In this case, the order has become C2, F2, L2, N1.
This causes a bug in our implementation: we use the values in ft[1] to calculate the fitness values, assuming they follow the requested order, so the optimization may not work correctly.
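One possible workaround, sketched here under assumptions, is to match the extracted values by name instead of by position. The helper `reorder_extraction` below is hypothetical (it is not part of pymfe or cbdgen), and it assumes that each returned name is the lower-cased feature name, optionally followed by a summary suffix such as `.mean`, as in the output above:

```python
def reorder_extraction(names, values, requested):
    """Return (name, value) pairs following the order of `requested`.

    Hypothetical helper: assumes each extracted name is the lower-cased
    feature name, optionally followed by a ".<summary>" suffix.
    """
    grouped = {}
    for name, value in zip(names, values):
        base = name.split(".", 1)[0].lower()  # "f2.mean" -> "f2"
        grouped.setdefault(base, []).append((name, value))

    ordered = []
    for feature in requested:
        # Features missing from the extraction are skipped silently.
        ordered.extend(grouped.get(feature.lower(), []))
    return ordered


# Values copied from the output shown above.
names = ["c2", "f2.mean", "l2.mean", "n1"]
values = [0.0, 0.002206548501187337, 0.014285714285714271, 0.12380952380952381]

ordered = reorder_extraction(names, values, ["C2", "L2", "N1", "F2"])
print([name for name, _ in ordered])  # ['c2', 'l2.mean', 'n1', 'f2.mean']
```

With the values keyed by name, the fitness calculation no longer depends on the order in which the extraction happens to return them.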
Talking about:
Test performance and quality assurance for this implementation;
I have not been able to test performance and quality assurance because I lost the main setup I used to test cbdgen. Since cbdgen is still a development project, I intend to keep the implementation, with the warning that performance and quality could not be measured properly, although the few results I got were promising.
Documentation will be addressed in a future update; track it in #38.
This PR implements a new Complexity Data Extraction algorithm, used both to extract complexity data during evaluation and to establish global references (whenever we generate synthetic data based on a problem or a real data set).
Fixes #40, Fixes #31, Fixes #30.
It also opens new possibilities for a few issues, such as #6 and #36.
Problem
Extracting complexity data is not a simple task, and performance has been noted as one of its main struggles (as reported in #40). Although ECoL does a good job of extracting the vast majority of complexity measures and is one of the state-of-the-art tools for the task, it is not Python native. A more recent implementation of complexity data extraction is pymfe. By using this package, we are able to extract complexity data without an R-to-Python interface, and performance improvements were verified.
Objectives