Ameobea / orange3

Orange 3 data mining suite: http://orange.biolab.si

Daily Progress Updates #21

Closed by ameo-unito-bot 8 years ago

Ameobea commented 8 years ago

I started off by trying to find a way to manually insert inputs into the widgets created during code generation. As it turns out, that isn't necessary at all: by removing some hacky code that I had inserted while I was still learning the Orange codebase, I was able to access the actual widget instances from the scheme rather than having to create new, blank ones.

Using this newly acquired capability, I wrote code generators for owselectrows and owselectcolumns and added __repr__ functions to two Orange.data.filter classes.
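To illustrate why a __repr__ helps here, below is a minimal sketch of the idea: if a filter's repr is a valid constructor call, the generator can paste it straight into the output script. The class name ContinuousFilterSketch and its fields are hypothetical stand-ins chosen for this example; this is not the actual Orange.data.filter code.

```python
class ContinuousFilterSketch:
    """Hypothetical stand-in for a filter class; not Orange's implementation."""

    def __init__(self, position, oper, ref=None, max=None):
        self.position = position
        self.oper = oper
        self.ref = ref
        self.max = max

    def __repr__(self):
        # Emit a constructor call, omitting keyword arguments left at None.
        args = [repr(self.position), repr(self.oper)]
        for name in ("ref", "max"):
            value = getattr(self, name)
            if value is not None:
                args.append("{}={!r}".format(name, value))
        return "{}({})".format(type(self).__name__, ", ".join(args))

    def __eq__(self, other):
        # Equality on all fields, so a round trip through eval() can be checked.
        return isinstance(other, type(self)) and vars(self) == vars(other)


flt = ContinuousFilterSketch(0, 7, ref=99.0, max=105.0)
# A code generator could embed the repr directly in the emitted script.
generated = "filters = [{!r}]".format(flt)
print(generated)
```

Because the repr round-trips through eval(), the emitted line rebuilds an equivalent object when the generated script runs.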

kernc commented 8 years ago

Can you show an example of the generated code for Select Rows and Select Columns?

Ameobea commented 8 years ago

Sure thing. Here is the code generated from a scheme that chains File, Select Columns, Select Rows, and Scatter Plot:

#Script generated by Orange3

from Orange.data.domain import Domain
from Orange.data.filter import FilterContinuous
from Orange.data.filter import Values
from Orange.data.table import Table
from Orange.data.variable import ContinuousVariable
from Orange.data.variable import DiscreteVariable
from Orange.data.variable import StringVariable
from Orange.preprocess.remove import Remove
from Orange.widgets.visualize.owscatterplot import OWScatterPlot
from Orange.widgets.widget import AttributeList
from PyQt4.QtGui import QApplication
import numpy

qapp = QApplication([])

import numpy as np

#
#file0
#
dataPath = "/home/casey/Documents/orange-dev/orange3/Orange/datasets/brown-selected.tab"

data = Table(dataPath)
file0_data = data

#
#select_columns1
#
attributes = [ContinuousVariable('alpha 63'), ContinuousVariable('cold 40'), ContinuousVariable('dtt 60'), ContinuousVariable('dtt 120'), ContinuousVariable('cold 0'), ContinuousVariable('cold 20')]
class_var = [DiscreteVariable('function', values=['Proteas', 'Resp', 'Ribo'])]
metas = [StringVariable('gene')]

input_data = file0_data
domain = Domain(attributes, class_var, metas)
newdata = input_data.from_table(domain, input_data)
select_columns1_data = newdata
select_columns1_features = AttributeList(attributes)

#
#select_rows2
#
filters = Values([FilterContinuous(0, 5, ref=0.0), FilterContinuous(4, 1, ref=954.0), FilterContinuous(0, 7, ref=99.0, max=105.0)])
purge_attrs = True
purge_classes = True

input_data = select_columns1_data
if filters is not None:
    matching_output = filters(input_data)
    filters.negate = True
    unmatched_data = filters(input_data)
else:
    matching_output = input_data
    unmatched_data = None

if purge_attrs or purge_classes:
    attr_flags = sum([Remove.RemoveConstant * purge_attrs,
                      Remove.RemoveUnusedValues * purge_attrs])
    class_flags = sum([Remove.RemoveConstant * purge_classes,
                      Remove.RemoveUnusedValues * purge_classes])
    # same settings used for attributes and meta features
    remover = Remove(attr_flags, class_flags, attr_flags)

    matching_output = remover(matching_output)
    unmatched_data = remover(unmatched_data)
select_rows2_unmatched_data = unmatched_data
select_rows2_matching_data = matching_output

#
#scatter_plot3
#
ow = OWScatterPlot()

input_data = select_rows2_matching_data
ow.set_data(input_data)
try:
    ow.set_subset_data(input_data_subset)
except:
    pass
ow.handleNewSignals()
ow.show()
qapp.exec()
scatter_plot3_selected_data = ow.data[ow.graph.get_selection()]
scatter_plot3_other_data = ow.data[np.full(len(ow.data), True, dtype=bool)]

kernc commented 8 years ago

This looks pretty good! Perhaps you could use pprint.pformat() to format long lines more nicely, such as

attributes = [ContinuousVariable('alpha 63'), ...

kernc commented 8 years ago

Regarding filters in Select Rows (and such), we have a huge GSoC PR in the works that aims to migrate the current Orange.data.Table to pandas.DataFrame (etc.) in a somewhat compatible way (https://github.com/biolab/orange3/pull/1347/). From what I know, the Orange.data.filter.* classes are deprecated and slated to be removed entirely. That PR looks likely to be merged really soon (in any case before this one), so do expect some conflicts. :+1:

Ameobea commented 8 years ago

pprint.pformat looks like a really good solution; I'll look to make that happen.
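A rough sketch of how that could look, assuming the variables have a repr that is a valid constructor call. VarSketch is a hypothetical stand-in for ContinuousVariable so the snippet runs without Orange, and the 79-column limit is just the usual PEP 8 width, not something the generator currently enforces.

```python
from pprint import pformat


class VarSketch:
    """Hypothetical stand-in for ContinuousVariable; only the name is kept."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "ContinuousVariable({!r})".format(self.name)


names = ['alpha 63', 'cold 40', 'dtt 60', 'dtt 120', 'cold 0', 'cold 20']
attributes = [VarSketch(n) for n in names]

# Current behaviour: everything ends up on one very long line.
one_line = "attributes = " + repr(attributes)

# With pformat: the list is too wide for the given width, so each element
# goes on its own line; continuation lines are then indented to sit under
# the opening bracket.
prefix = "attributes = "
body = pformat(attributes, width=79 - len(prefix))
wrapped = prefix + body.replace("\n", "\n" + " " * len(prefix))
print(wrapped)
```

The wrapped text is still a single valid Python statement, since the line breaks fall inside the list brackets.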

I'll rebase my branch after that PR goes through and fix any conflicts. Thanks for the heads-up.