biolab / orange3

🍊 :bar_chart: :bulb: Orange: Interactive data analysis
https://orangedatamining.com

[Feature/Documentation Request]: Workflow -> Python Script option? #1341

Closed Pelonza closed 5 years ago

Pelonza commented 8 years ago
Orange version

3.3

Expected behavior

The ability to convert a widget/gui workflow directly to an equivalent Python Script file. Even if it's ugly.

Actual behavior

As it seems right now, Orange supports either the GUI workflows OR directly using/writing python scripts that access the orange data-mining suite.

OR... if this is possible... it doesn't seem to be documented clearly anywhere that I could find.

Steps to reproduce the behavior

N/A

Additional info (worksheets, data, screenshots, ...)

I'm an instructor at a university, and teach some of our data mining and introduction to data science courses. I've used Weka before, but rather dislike its interface and mechanisms. I also usually teach R as part of the data-mining, but would really like something with a much lower learning curve as an introductory software piece. Possibly to avoid even some of the initial issue of actually teaching PROGRAMMING instead of the bigger data-science picture.

Orange almost perfectly fits that bill with the GUI and being able to actually write python scripts directly to do the data-mining. HOWEVER.... there's a huge downside of needing to write it a 2nd time once you've figured out the work-flow AND of correctly using the python back-end.

Ideally, I'd love the option to turn on a 2nd "window" (or at least a widget or save option) that shows the equivalent python script calling the Orange mining procedures. I think this might also be very useful for actual USERS of Orange, as it would let them design a workflow at a high-level of abstraction and editing, then output to a python script. This would allow minor tweaking directly in the code or work to merge/enhance things outside of options available in a given widget.

kernc commented 8 years ago

That's a great idea, thanks for bringing it up. We've had it blueprinted for this year's GSoC, but in the end it didn't make it on the short list. It is definitely something we consider.

Pelonza commented 8 years ago

So... if this is something you've got penciled in/outlined...

I've got a summer research student of my own who's familiar with Python. I definitely don't want to set him an impossible task, but if it could have fit under a GSoC project, it's possibly something I could ask him to consider also. If you are willing to share the specifications/blueprint I can talk with him about it.

As I said in the issue... that's partly selfish interest as I'd love to use it for a teaching tool.. :)

kernc commented 8 years ago

We seem to have an interest in common. :smiley:

I sent you an email with the outline, but any implementation details, if the project is decided upon, should probably be discussed here for others to scrutinize as well.

Pelonza commented 8 years ago

I got permission from my department chair to go ahead and have my summer student work on this project. So it's a go for the rest of the summer/fall depending on speed/progress. I'm meeting with him this afternoon (soon) to talk over the outline you sent me.

Ameobea commented 8 years ago

Hi, I'm the research student assigned to work on this task. I just wanted to show my current progress and make myself open to input and suggestions. However, after some initial review by @kernc, it seems that what I have done so far isn't quite in line with the overall vision for this project so it's likely that most or all of the code generation will have to be re-written to meet the new model.

Current progress: https://github.com/Ameobea/orange3/commits/script-export-gui
Code generation example: https://ameo.link/u/bin/2j9 (generated from the testing workflow)
owfile code generator: https://ameo.link/u/bin/2jb
@kernc's vision: https://paste.debian.net/779226/

The first thing I did was create a topological sort function for the workflow DAG, which produces a sorted list of nodes so that all dependency nodes are processed before their children. The nodes are then converted into widgets. Each widget's init_code_gen function is invoked in order to generate output, which is organized and inserted into the final output script file.
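To illustrate the ordering step described above, here is a minimal sketch of a topological sort using Kahn's algorithm. It is not the code from the linked branch; the function name, the node names, and the (source, sink) link format are assumptions made for the example.

```python
from collections import deque


def topological_order(nodes, links):
    """Order nodes so each one comes after every node that feeds into it.

    `nodes` is a list of node names and `links` a list of (source, sink)
    pairs. Both formats are hypothetical, chosen for this sketch.
    """
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for source, sink in links:
        children[source].append(sink)
        indegree[sink] += 1

    # Start with nodes that have no inputs (e.g. a File widget).
    ready = deque(node for node in nodes if indegree[node] == 0)
    ordered = []
    while ready:
        node = ready.popleft()
        ordered.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any node left unprocessed is part of a cycle.
    if len(ordered) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return ordered


nodes = ["File", "Data Sampler", "Tree", "Test and Score"]
links = [
    ("File", "Data Sampler"),
    ("Data Sampler", "Tree"),
    ("Data Sampler", "Test and Score"),
    ("Tree", "Test and Score"),
]
order = topological_order(nodes, links)
print(order)
```

Code for each widget can then be generated by walking `order` from first to last, so every input variable is defined before it is used.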

The code generator consists of multiple parts including generating import statements for required modules, generating declarations that go inside __init__, as well as other subgenerators for external functions, internal function definitions, and text-level line deletion and modification.

The goal of the generator is to insert all necessary code from the widget into the output to perform the same function as the initial widget without modifying or re-writing already existing widget code. I went out of my way to avoid modifying any existing widget code or so much as copy and paste a line. However, it would certainly be much more efficient in terms of the size of the output code and simplicity of the generation process to do that.
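As a rough picture of how such sub-generators could be combined, the sketch below merges per-widget snippets into one script, with de-duplicated imports first and widget bodies after. The `Snippet` structure and `assemble_script` function are hypothetical and are not taken from the linked generator; the Orange lines appear only as example strings.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """Code contributed by one widget (hypothetical structure)."""
    imports: list = field(default_factory=list)
    body: list = field(default_factory=list)


def assemble_script(snippets):
    """Merge snippets: unique imports first, then bodies in the given order."""
    imports = []
    for snippet in snippets:
        for line in snippet.imports:
            if line not in imports:  # keep the first occurrence only
                imports.append(line)
    body = [line for snippet in snippets for line in snippet.body]
    return "\n".join(imports + [""] + body) + "\n"


# Snippets are assumed to arrive already sorted by workflow order.
snippets = [
    Snippet(["from Orange.data import Table"], ['data = Table("iris")']),
    Snippet(
        ["from Orange.data import Table",
         "from Orange.classification import TreeLearner"],
        ["model = TreeLearner()(data)"],
    ),
]
script = assemble_script(snippets)
print(script)
```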

Pelonza commented 8 years ago

Note: This is in response to a separate email where Kernc provided some sample "ideal" code.

So, looking at the two files that Kernc produced and you (Casey) produced, I partially agree with Kernc, but perhaps can point to what might be the actual issue...

Kernc is using the orange data mining library in his script as if it was actually a python script written with the mining library initially. (hence the loading of the file in two lines).

What you almost need is a 2nd "wrapper" around what you've generated that actually makes the final python lines or code.

Basically, your (as generated now) code would create a single "output string" from _describe --> that actual output string gets entered into the final python script/code either as a displayed line or comment.

Then, based on the full parsing of the 'init' , '_get_reader' and 'get_output' functions, generate 1-2 lines similar to Kernc's lines for file-loading that is correctly calling the actual mining library's read/load files.

You might also just be over-thinking what sorts of information you need from the actual widgets --> library use.

The orange documentation though doesn't do a great job of discussing the ability to load from a URL vs. a file-path...

Remember that while the widget makes the gui pretty and easy to use, theoretically at least as much functionality (including error catches) should be built into the library itself.

Pelonza commented 8 years ago

Looking deeper: If you dive into the actual "table.py" in the full Orange library, it has two functions: Orange.data.Table.from_file and Orange.data.Table.from_url

Basically, your "code generator" from the canvas needs to get the attributes from the widget with the file or url path, and then call the appropriate table function with the path. So you can hide the "decision" in your code generator, and then generate the simple 1-2 line code for loading the table.
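A minimal sketch of that idea follows: the generator makes the file-versus-URL decision itself and emits only the short loading code. The function name `table_loading_code`, the `variable` parameter, and the scheme check are assumptions for illustration; the generated lines rely on `Table.from_file` and `Table.from_url` from `Orange.data`.

```python
from urllib.parse import urlparse


def table_loading_code(source, variable="data"):
    """Return script lines that load `source` into an Orange Table.

    The decision between file and URL is made here, at generation time,
    so the emitted script stays at two lines.
    """
    scheme = urlparse(source).scheme
    loader = "from_url" if scheme in ("http", "https", "ftp") else "from_file"
    return [
        "from Orange.data import Table",
        f"{variable} = Table.{loader}({source!r})",
    ]


file_lines = table_loading_code("iris.tab")
url_lines = table_loading_code("https://example.com/iris.tab")
print("\n".join(file_lines))
print("\n".join(url_lines))
```

Here `https://example.com/iris.tab` is a placeholder address, not a real dataset location.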

Orange.data.table already contains all the needed imports and the checking of filenames, etc. I don't know if it produces the helpful output of how many attributes of each kind the data has, but those ought to be otherwise callable if needed.

I think part of your challenge here may be that, unlike I initially thought, it looks like the widgets (or at least this file widget) don't actually call the mining-library functions via wrappers.

kernc commented 8 years ago

Table's constructor accepts a string and then calls its from_file() or from_url() (etc.) as appropriate.

Pelonza commented 8 years ago

Even easier then. Perhaps most of the actual "work" is figuring out what has direct, easily used correspondences in orange's main library.

kernc commented 8 years ago

Indeed. And widgets, save for the GUI handling/painting/manipulating/... code, mostly do or should do just that.

kernc commented 8 years ago

@astaric, @janezd, @lanzagar, @ales-erjavec, @s-alexey For anyone interested, there's some technical discussion also in https://github.com/Ameobea/orange3/issues/7.

mldwr commented 7 years ago

Hi, may I ask what the status of this issue is? Has this feature been committed to Orange, and will it be available any time soon? Thanks

kernc commented 7 years ago

Nobody is working on it. It's free to take if you're interested.

MrMauricioLeite commented 7 years ago

I must say that having a way to export any workflow to Python code sounds amazing. This would take the tool to a whole new level and enable it to kickstart code that can later be improved on in code.

Is it in the roadmap?

lubianat commented 6 years ago

Hello, I am quite interested in this export to Python script too. Unfortunately, I have neither the skills nor the availability to take on such a task now. Has there been any progress on this matter recently? I was not able to find anything on this.

Thanks

JoeB-UT commented 6 years ago

I think @Ameobea worked on the request for a while, but it is not as easy as one might hope. More Details: https://github.com/Ameobea/orange3/issues/7 https://ameo.link/u/bin/2jb https://paste.debian.net/779226/

Having deployment functionality like this would make Orange a superior top tier development tool.

Pelonza commented 6 years ago

@Ameobea was working on it.. (he is/was an undergraduate student working under me).

I don't believe he ever finished the capability, due to some dependencies elsewhere.... I can check in with him about it though.

I think some of what he had working was really just an ugly use of the same widget structure, rather than actual native python/orange calls....and wasn't quite as pretty as we'd all hoped might happen.

Dr. Karl Schmitt Assistant Professor Department of Mathematics and Statistics Department of Computing and Information Sciences Director of Data Science Program Director of Analytics and Modeling Graduate Program Valparaiso University, Indiana

hemangjoshi37a commented 6 years ago

Actually, I was developing my own machine learning GUI software (Link), but then I found Orange on the internet, which is a piece of art, I would say. I can help make a Python script converter. Please let me know if there is a starting point I should begin from.

aatarifi commented 6 years ago

I just found this discussion. This would be a very useful feature once implemented; I was looking for such a capability in the Orange data mining tool.

janezd commented 5 years ago

As a part of the regular issue purging fest, I'm closing this one. I don't think that implementation of this feature depends upon whether this github issue is here or not.

To support conversion of workflows into scripts, each and every widget would have to provide the necessary code (or whatever definitions that would be required). Otherwise, a workflow would be convertible only if it contained only convertible widgets. Hence, users would constantly complain about each and every widget that does not support this functionality. Besides, whenever a widget is improved, the improvements would have to be reflected in the exportable code. This would put a lot of burden on widget developers.
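The constraint described above can be sketched as a registry check: a workflow is exportable only if every widget in it has a registered exporter. The registry layout, the function names, and the widget names below are hypothetical and do not reflect any existing Orange API.

```python
def unsupported_widgets(workflow_widgets, exporters):
    """Return the widgets in the workflow that have no registered exporter."""
    return [name for name in workflow_widgets if name not in exporters]


def export_workflow(workflow_widgets, exporters):
    """Concatenate exporter output, refusing workflows with unsupported widgets."""
    missing = unsupported_widgets(workflow_widgets, exporters)
    if missing:
        raise NotImplementedError(
            "no script export for: " + ", ".join(missing)
        )
    lines = []
    for name in workflow_widgets:
        lines.extend(exporters[name]())
    return "\n".join(lines)


# Hypothetical registry: widget name -> function returning script lines.
exporters = {
    "File": lambda: ['data = Table("iris")'],
    "Tree": lambda: ["model = TreeLearner()(data)"],
}

supported = export_workflow(["File", "Tree"], exporters)
missing = unsupported_widgets(["File", "Scatter Plot", "Tree"], exporters)
print(supported)
print(missing)
```

Under this design, a single widget without an exporter blocks the whole workflow, and every change to a widget's behaviour needs a matching change to its exporter, which is the maintenance burden the comment describes.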

There may come time when the core group decides that Orange absolutely needs this feature and it's worth investing time and promising the future commitment. If this happens, we are going to implement this feature - and having this github issue open or not does not affect this decision.

joenobk commented 5 years ago

Having just a subset of widgets with exportable code would be tremendous. Or even having an example of the equivalent code for each widget would be of great value. Orange is so useful for exploration and analysis, but a script may be more useful for deployment.

janezd commented 5 years ago

I agree. But even if it's just a subset of widgets, we need to establish the framework for this, and this is a lot of work (particularly, a lot of thinking) if we want to do it properly. If somebody would like to participate, (s)he's welcome. :)

prykon commented 4 years ago

This has been in the works for 4 years now... any updates?

ajdapretnar commented 4 years ago

This has not been in the works for 4 years. The issue has been closed. Nobody is working on it in the core group. Please read the discussion above for the details.

TrinadhKumarKatakaraju commented 4 years ago

Hello All, @Pelonza,

Any update on the above topic?

borondics commented 4 years ago

We have been discussing a similar feature with @markotoplak...

fititnt commented 2 years ago

Hi everyone. I believe I got something close to allowing this feature, at least for pip users. The early proof of concept is not pretty and has some limitations (which also explain why it might be possible to implement