Ameobea / orange3

Orange 3 data mining suite: http://orange.biolab.si

Daily Progress Updates #7

ameo-unito-bot closed this 8 years ago

ameo-unito-bot commented 8 years ago

┆Issue is synchronized with this Asana task

Ameobea commented 8 years ago

Today and yesterday I worked on changing the code generator to better fit the design set out in emails between @Pelonza, @kernc, and me.

The entire class-based code generation system was abandoned in favor of a simple, linear order of execution. Each widget's code is evaluated in order, and its result is stored in a variable whose name combines the widget's unique name and the channel on which the result is output. Other widgets then read these variables as their inputs automatically at runtime.
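A minimal sketch of this linear scheme, assuming each widget contributes a one-line expression template whose `{placeholder}` inputs point at upstream widgets. `WidgetNode`, `var_name`, and `generate_script` are hypothetical names for illustration, not the actual generator's API, and the node list is assumed to be already in execution (topological) order.

```python
from dataclasses import dataclass, field


@dataclass
class WidgetNode:
    """One widget in the canvas scheme (hypothetical structure)."""
    name: str             # unique canvas name, e.g. "file_1"
    template: str         # expression producing the widget's output
    output_channel: str   # channel the result is sent on, e.g. "Data"
    # placeholder in `template` -> (upstream widget name, upstream channel)
    inputs: dict = field(default_factory=dict)


def var_name(widget: str, channel: str) -> str:
    """Variable holding `widget`'s output on `channel`, e.g. "file_1_Data"."""
    return f"{widget}_{channel.replace(' ', '_')}"


def generate_script(nodes) -> str:
    """Emit one assignment per widget, in the given (already sorted) order."""
    lines = []
    for node in nodes:
        # Point each input placeholder at the upstream widget's output variable.
        resolved = {ph: var_name(w, ch) for ph, (w, ch) in node.inputs.items()}
        expr = node.template.format(**resolved)
        lines.append(f"{var_name(node.name, node.output_channel)} = {expr}")
    return "\n".join(lines)


nodes = [
    WidgetNode("file_1", "list(range(10))", "Data"),
    WidgetNode("head_1", "{data}[:3]", "Subset data",
               {"data": ("file_1", "Data")}),
]
script = generate_script(nodes)
print(script)
# file_1_Data = list(range(10))
# head_1_Subset_data = file_1_Data[:3]
```

Because every result lives in a uniquely named variable, a downstream widget only needs to know the upstream widget's name and channel; no class or object wiring is required. Templates containing literal braces would need to be escaped as `{{` and `}}` under this approach.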

Instead of reverting to an earlier commit or starting over from scratch, I kept the existing code generator and adapted it to the new style of generation. This meant removing some subgenerators and replacing them with new ones. The new generators mostly insert code at positions relative to the body function rather than constructing a class, as the old ones did.
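One way to picture "positionally inserting code relative to the body" is a builder with fixed sections around the main body that each subgenerator appends to. This is a hypothetical sketch; `ScriptBuilder` and its section names are assumptions, not the generator's real structure.

```python
class ScriptBuilder:
    """Hypothetical generator output, split into sections around a main body."""

    def __init__(self):
        self.imports = []      # hoisted to the top, de-duplicated
        self.before_body = []  # setup, e.g. creating an application object
        self.body = []         # the linear per-widget code
        self.after_body = []   # teardown, e.g. showing results

    def add_import(self, line):
        # Several subgenerators may need the same import; emit it only once.
        if line not in self.imports:
            self.imports.append(line)

    def render(self):
        sections = [self.imports, self.before_body, self.body, self.after_body]
        # Join the non-empty sections with a blank line between them.
        return "\n\n".join("\n".join(s) for s in sections if s)


# Each (hypothetical) subgenerator contributes lines to one or more sections.
builder = ScriptBuilder()
builder.add_import("import math")
builder.body.append("file_1_Data = [1.0, 4.0, 9.0]")
builder.add_import("import math")  # duplicate request is ignored
builder.body.append("sqrt_1_Data = [math.sqrt(x) for x in file_1_Data]")
builder.after_body.append("print(sqrt_1_Data)")
script = builder.render()
print(script)
```

Since sections are rendered in a fixed order, a subgenerator can add an import or a setup line at any time without caring where the body currently ends.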

After making those changes and updating the code generator initiator in owfile, I created a code generator initiator for owscatterplot that creates a Qt application, loads the data from the file, and displays the plot.

Here is an example of the current output code: https://ameo.link/u/bin/2l0

Future work will consist of cleaning up generated code, optimizing the generator, and creating generator initiators for other widgets.

Pelonza commented 8 years ago

Casey,

The updated code looks a LOT better and more usable. Could I suggest that you next try a widget that has a direct analog in the general data mining library?

One of my goals here, at least, is for a future user or student to be able to create the initial workflow in the canvas and then modify details of function calls that are otherwise hidden (or awkward to modify). Since the plotting widgets don't actually have normal data mining analogs, it's hard to see whether your scheme will work.

What gives me pause is that you ended up calling the widget itself in the Python code. I'm guessing that's necessary for the scatterplot and other code that has no library analog.

Does this generated Python code run successfully and make a plot?

Ameobea commented 8 years ago

@Pelonza Yes, the generated code does run and generate a plot.

My earlier approach, which used classes with `__init__` methods and function declarations, took its code directly from the widget class. That was a low-level and highly efficient way of getting the code out of the widget.

Following Kernc's example code and your earlier advice, I aimed to make the script as straightforward and streamlined as possible. That was only possible by using high-level functions, such as, as you mentioned, constructing the widget directly in the generated script. As I see it, there are three possible ways to address this:

  1. Creating a function that replicates the widget's functionality by copying and pasting various lines from the widget's main code into one place, then inserting that function into the generated code. I was specifically told to avoid this.
  2. Creating a complicated, advanced code generation system, like my earlier class-based one, that reuses as much widget code as possible without copy-pasting it.
  3. Using high-level widget function calls in the generated code that hide most, if not all, of the processing from the user, while shipping copies of the widget code (and perhaps the entire Orange source code) with the script so the user can modify them. This has the advantage of producing the most efficient output code and allowing the most changes. However, users would have to dig through the Orange source code to make the changes they wanted.

I plan to aim for a mix of all of the solutions above. I will use a modified version of the advanced code generator that allows generated code to be constructed in complicated ways. I will aim to keep the generated code efficient and clean, while inserting as many core data mining functions into it as possible without copy-pasting code.

I think the best bet right now is for me to finish a couple more widgets and let you review the results. Remember, despite what Kernc or the Orange3 project have in mind in terms of code to be merged into their project, your goals for this are the ones that count.

kernc commented 8 years ago
  1. Creating a function that replicates the widget's functionality by copying and pasting various lines from the widget's main code into one place, then inserting that function into the generated code. I was specifically told to avoid this.

That depends. By avoiding a separate function for code generation, I meant avoiding code duplication. Imagine a widget that receives data on its input and simply returns a subset consisting of its first n examples (or fewer, if there aren't that many). Its code would look roughly like the following:

from Orange.data import Table
from Orange.widgets.widget import OWWidget


class OWHeadSubsetSelect(OWWidget):

    inputs = [('Data', Table, 'set_data')]

    outputs = [('Subset data', Table)]

    def __init__(self):
        super().__init__()
        self.data = None
        self.n = 10  # illustrative default number of leading examples
        # Set up spin box for selecting number of examples
        # Sync that spin box with self.n attribute (widgets
        # using gui.py module normally do something like that)

    def set_data(self, data):
        """ Handle Data input signal """
        self.data = data
        # Not much else to do here except to (sometimes conditionally)
        # commit (propagate) the result
        self.commit()

    def commit(self):
        """
        Normally called when the user clicks Apply and wants
        to propagate the widget's result.
        """
        # Propagate None when there is no input data
        subset = self.data[:self.n] if self.data is not None else None
        self.send('Subset data', subset)

This is what widgets here normally look like. It sure isn't pretty, but in my view, this GUI programming is F* hard! Anyway, one valid approach is to write a code-generating method like the following:

    def as_python_string(self):
        return ('{widget_name_in_scheme}_{output_signal} = '
                '{widget_name_in_scheme}_{input_signal}[:{how_many}]'.format(
                    widget_name_in_scheme=self.<...figure it out>,
                    output_signal='Subset_data',
                    input_signal='Data',
                    how_many=self.n))
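A runnable, Qt-free sketch of this string-template approach makes the duplication concrete: the slicing logic exists once as real code and once inside a string. `HeadSubsetSketch`, `commit_result`, and the explicit `canvas_name` argument are hypothetical stand-ins (the canvas name lookup is left elided above).

```python
class HeadSubsetSketch:
    """Qt-free stand-in for OWHeadSubsetSelect (hypothetical)."""

    def __init__(self, canvas_name, n):
        self.canvas_name = canvas_name  # assumed: the widget's name in the scheme
        self.n = n

    def commit_result(self, data):
        # The widget's actual behaviour (what commit() would send).
        return data[:self.n]

    def as_python_string(self):
        # The same behaviour written a second time, as a string template.
        return ('{widget}_{output_signal} = '
                '{widget}_{input_signal}[:{how_many}]'.format(
                    widget=self.canvas_name,
                    output_signal='Subset_data',
                    input_signal='Data',
                    how_many=self.n))


w = HeadSubsetSketch('head_1', 3)
code = w.as_python_string()
print(code)  # head_1_Subset_data = head_1_Data[:3]

# The two copies agree only as long as someone keeps them in sync by hand.
ns = {'head_1_Data': list(range(10))}
exec(code, ns)
assert ns['head_1_Subset_data'] == w.commit_result(list(range(10)))
```

If the widget's behaviour later changes (say, to skip missing rows), `commit_result` and `as_python_string` must both be edited, which is exactly the maintenance burden this approach invites.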

This is what @janezd would have done a couple of years ago. Instead, I propose that widgets have no separate string-generating function, but define their "main", exportable functional behaviour in a separate method, like this in our case:

    def widgets_main_function(self, *, top_imports=[...]):
        self.output = self.data[:self.n]

and with the widget's commit() amended:

    def commit(self):
        self.widgets_main_function()
        self.send('Subset data', self.output)

Now, the Python code generation routine can use Python's inspect module on the widget instance: inspect.getsource(widget.widgets_main_function) returns that method's code as a string, in which the routine can replace "self." occurrences with the widget's canvas name, and so on. Additionally, it can access the required imported objects with widget.widgets_main_function.__kwdefaults__.get('top_imports').
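A minimal sketch of this inspect-based export, using a plain-Python stand-in for the widget so it runs without Qt or Orange. `HeadSubsetWidget`, `export_main_function`, and the `settings` argument are hypothetical; the sketch assumes the method has no decorators, a one-line signature, and a source file that `inspect.getsource` can read. A real generator would also map `self.data` to the upstream widget's output variable; here the check simply pre-defines `head_1_data`.

```python
import inspect
import re
import textwrap
from itertools import islice


class HeadSubsetWidget:
    """Qt-free stand-in for OWHeadSubsetSelect (hypothetical)."""

    def __init__(self, n):
        self.n = n
        self.data = None
        self.output = None

    # Exportable behaviour; top_imports lists the objects the body needs.
    def widgets_main_function(self, *, top_imports=(islice,)):
        self.output = list(islice(self.data, self.n))

    def commit(self):
        self.widgets_main_function()
        return self.output  # a real widget would call self.send(...) here


def export_main_function(widget, canvas_name, settings):
    """Turn widget.widgets_main_function into module-level script lines."""
    method = widget.widgets_main_function
    source = textwrap.dedent(inspect.getsource(method))
    # Drop the one-line `def ...:` header and dedent the remaining body.
    body = textwrap.dedent(source.split("\n", 1)[1])
    # Replace each `self.<attr>` with a canvas-scoped variable name.
    body = re.sub(r"\bself\.(\w+)",
                  lambda m: f"{canvas_name}_{m.group(1)}", body)
    # Turn the required objects into `from module import name` lines.
    imports = [f"from {obj.__module__} import {obj.__name__}"
               for obj in method.__kwdefaults__.get("top_imports", ())]
    # Emit the widget's settings (e.g. n) as plain assignments.
    assignments = [f"{canvas_name}_{k} = {v!r}" for k, v in settings.items()]
    return imports + assignments + body.strip().splitlines()


widget = HeadSubsetWidget(n=3)
lines = export_main_function(widget, "head_1", {"n": widget.n})
print("\n".join(lines))
```

The exported lines come from the same method that `commit()` calls, so the widget and the generated script cannot drift apart. A tuple default is used for `top_imports` to avoid a mutable default argument.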

There's still much to be figured out taking this approach, e.g.:

But this approach should generally help maximize code reuse and keep widgets' features in sync with the generated code.

Don't be afraid to modify widgets or to introduce/enforce conventions that you need to simplify this use case. The more declarative, the nicer.

kernc commented 8 years ago

Remember, despite what Kernc or the Orange3 project have in mind in terms of code to be merged into their project, your goals for this are the ones that count.

(Un)Fortunately, if you don't intend to maintain a fork, you will have to convince the project stewards to merge the code. Which will be easy enough. Just make it technically superb. :smile: