Ameobea / orange3

Orange 3 data mining suite: http://orange.biolab.si

Daily Progress Updates #19

Closed ameo-unito-bot closed 8 years ago

ameo-unito-bot commented 8 years ago

┆Issue is synchronized with this Asana task

Ameobea commented 8 years ago

Lots of progress today.

I worked extensively to improve repr coverage for all preprocessors and learners, and coverage of both is now fairly complete. Since I've been using the code generator to develop and test the reprs, I've committed the changes to the code-generation branch and copied only the repr-related changes into the repr branch.
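To illustrate what a reconstructable repr looks like, here is a minimal sketch, not the actual Orange code. The `ReprMixin` helper and the toy `EqualFreq` and `Discretize` classes are hypothetical stand-ins; the only idea carried over is that `repr(obj)` yields a constructor call that can be pasted straight into a generated script.

```python
import inspect


class ReprMixin:
    """Hypothetical helper: build a repr from the constructor's parameters."""

    def __repr__(self):
        # Assumes every constructor parameter is stored under the same name.
        params = inspect.signature(type(self).__init__).parameters
        args = ", ".join(
            "{}={!r}".format(name, getattr(self, name))
            for name in params
            if name != "self"
        )
        return "{}({})".format(type(self).__name__, args)


class EqualFreq(ReprMixin):
    """Toy stand-in, not Orange's class."""

    def __init__(self, n=4):
        self.n = n


class Discretize(ReprMixin):
    """Toy stand-in, not Orange's class."""

    def __init__(self, method=None, remove_const=True):
        self.method = method
        self.remove_const = remove_const


pp = Discretize(method=EqualFreq(n=4), remove_const=False)
print(repr(pp))

# The repr is valid Python, so evaluating it rebuilds an equivalent object.
clone = eval(repr(pp))
```

Nested objects are handled for free because `{!r}` calls the repr of each stored parameter in turn.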

Here's the comparison of the repr branch so far to the master branch I forked it from: https://github.com/Ameobea/orange3/compare/master...Ameobea:repr

In addition to the repr changes, I've also been working on integrating all of these changes into the current code generator. I've worked to improve dependency importing and to clean up unnecessary code in as many places as possible. To my knowledge, the generator can now create scripts for any combination of learner-class widgets and preprocessors, in addition to the few miscellaneous visualization and data widgets I've created generators for.
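As a rough illustration of the dependency-importing step, the sketch below derives import lines from the objects a script uses. The `import_lines` function is a hypothetical helper written for this example, not the generator's real implementation, and it uses standard-library classes in place of Orange ones.

```python
from collections import OrderedDict
from fractions import Fraction


def import_lines(objects):
    """Return sorted, de-duplicated 'from module import Name' lines."""
    lines = set()
    for obj in objects:
        # Accept either a class or an instance of it.
        cls = obj if isinstance(obj, type) else type(obj)
        if cls.__module__ == "builtins":
            continue  # builtins need no import
        lines.add("from {} import {}".format(cls.__module__, cls.__name__))
    return sorted(lines)


needed = import_lines([OrderedDict(), Fraction(1, 2), OrderedDict, 3])
print("\n".join(needed))
```

Collecting the lines in a set is what keeps the same module from being imported twice when several widgets use the same class.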

I think that, except for visualization widgets, there will be no need to construct widget objects in the generated code, and I expect building code generators for the remaining widgets to be much quicker than before thanks to the higher-quality generator and repr backend.

Ameobea commented 8 years ago

@Pelonza @Kernc here is an example of the current code generated for my test scheme:

#Script generated by Orange3

from Orange.classification.tree import TreeLearner
from Orange.data.table import Table
from Orange.preprocess.preprocess import PreprocessorList
from Orange.widgets.classify.owclassificationtreegraph import OWClassificationTreeGraph
from Orange.widgets.visualize.owscatterplot import OWScatterPlot
from PyQt4.QtGui import QApplication
import numpy

from Orange.preprocess import *
import numpy as np

qapp = QApplication([])

from Orange.preprocess.score import *
from Orange.classification import *

#
#file0
#
dataPath = "/home/casey/Documents/orange-dev/orange3/Orange/datasets/brown-selected.tab"

data = Table(dataPath)
file0_data = data

#
#preprocess1
#
preprocessor = PreprocessorList([
    Discretize(method=EqualFreq(n=4), remove_const=False),
    ProjectPCA(n_components=10),
    Randomize(),
    Scaling(),
    SelectRandomFeatures(k=10),
    SelectBestFeatures(method=InfoGain, k=10, ),
    Continuize(multinomial_treatment='Indicators'),
    Impute(method=Average()),
])

input_data = file0_data
preprocess1_preprocessor = preprocessor
preprocess1_preprocessed_data = preprocessor(input_data)

#
#file2
#
dataPath = "/home/casey/Documents/orange-dev/orange3/Orange/datasets/iris.tab"

data = Table(dataPath)
file2_data = data

#
#scatter_plot3
#
ow = OWScatterPlot()

input_data = file2_data
ow.set_data(input_data)
try:
    ow.set_subset_data(input_data_subset)
except:
    pass
ow.handleNewSignals()
ow.show()
qapp.exec()
scatter_plot3_other_data = ow.data[np.full(len(ow.data), True, dtype=bool)]
scatter_plot3_selected_data = ow.data[ow.graph.get_selection()]

#
#classification_tree4
#
learner = TreeLearner(splitter='best', max_leaf_nodes=None, max_features=None, criterion='entropy', random_state=None, min_samples_split=7, max_depth=100, min_samples_leaf=2)

input_preprocessor = preprocess1_preprocessor
input_data = preprocess1_preprocessed_data
model = learner(input_data)
model.instances = input_data
classification_tree4_learner = learner
classification_tree4_classifier = model

#
#classification_tree_viewer5
#
ow = OWClassificationTreeGraph()

ow.settingsHandler.initialize(ow, data={
    "max_node_width": 100,
    "zoom": 1,
    "line_width_method": 2,
    "max_tree_depth": 0,
})

input_classification_tree = classification_tree4_classifier
ow.handleNewSignals()
ow.ctree(input_classification_tree)
# Update display with above settings
ow.toggle_zoom_slider()
ow.toggle_node_size()
ow.toggle_tree_depth()
ow.toggle_line_width()
ow.toggle_color()
# Display classification tree
ow.show()
qapp.exec()
ow.update_selection()
classification_tree_viewer5_data = data

#
#scatter_plot6
#
ow = OWScatterPlot()

input_data = file0_data
input_data_subset = classification_tree_viewer5_data
ow.set_data(input_data)
try:
    ow.set_subset_data(input_data_subset)
except:
    pass
ow.handleNewSignals()
ow.show()
qapp.exec()
scatter_plot6_other_data = ow.data[np.full(len(ow.data), True, dtype=bool)]
scatter_plot6_selected_data = ow.data[ow.graph.get_selection()]

#
#scatter_plot7
#
ow = OWScatterPlot()

input_data = scatter_plot6_selected_data
ow.set_data(input_data)
try:
    ow.set_subset_data(input_data_subset)
except:
    pass
ow.handleNewSignals()
ow.show()
qapp.exec()
scatter_plot7_other_data = ow.data[np.full(len(ow.data), True, dtype=bool)]
scatter_plot7_selected_data = ow.data[ow.graph.get_selection()]

As you can see, the preprocessor widget and the classification tree both feature correctly imported native Orange objects initialized with the correct parameters. The output is unedited and runs without error (except for a segfault after it finishes executing, because I'm not destroying the PyQt application).

The Classification Tree has no individual code generator but rather uses a base generator from owlearnerwidget. Inputs/outputs are linked without issue and no errors are seen during the generation process.
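The shared-generator idea can be sketched roughly as follows. `LearnerCodeGen` and `FakeTreeLearner` are hypothetical names invented for this example, and the emitted lines are only modeled on the classification_tree4 section of the script above; the real base generator in owlearnerwidget may be structured differently.

```python
class LearnerCodeGen:
    """Hypothetical base generator shared by every learner widget."""

    def __init__(self, name, learner):
        self.name = name        # node name, e.g. "classification_tree4"
        self.learner = learner  # object whose repr is a constructor call

    def generate(self, input_data):
        """Emit script lines that train the learner on `input_data`."""
        return "\n".join([
            "learner = {!r}".format(self.learner),
            "model = learner({})".format(input_data),
            "{}_learner = learner".format(self.name),
            "{}_classifier = model".format(self.name),
        ])


class FakeTreeLearner:
    """Toy stand-in for a learner with a constructor-style repr."""

    def __init__(self, max_depth=100):
        self.max_depth = max_depth

    def __repr__(self):
        return "TreeLearner(max_depth={!r})".format(self.max_depth)


gen = LearnerCodeGen("classification_tree4", FakeTreeLearner(max_depth=100))
script = gen.generate("preprocess1_preprocessed_data")
print(script)
```

Because the only widget-specific part is the learner's repr, a new learner widget would need no generator of its own under this design.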

Ameobea commented 8 years ago

I will be using the rest of the week to tweak the process as a whole as needed and get full code generation functionality available for as many widgets as possible. I'm leaving on a week-long trip on Saturday so I want to get as much done before then as possible.

Please let me know what I can do to improve what I have here so far or any feedback on my plan for the future.