PyWorkflowApp / visual-programming

A Python Visual Programming Workspace for Data Science
MIT License

Robust node parameterization #48

Closed: reddigari closed this issue 4 years ago

reddigari commented 4 years ago

I'm starting to think about defining classes for parameter values and types rather than using the dictionaries as class attributes. (This is similar to how Luigi does parameterization.) So options could look something like:

options = {
    "file": FileParameter(desc="File to upload"),
    "sep": StringParameter(default=",", desc="Single-character delimiter"),
    "header": IntParameter(default=0, desc="Row number with column names")
}

Each of those classes could implement to_json() and from_json(), so they can be sent to the front-end as JSON and parsed from HTTP requests into their Python representation.
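A minimal sketch of how that `to_json()`/`from_json()` round trip could work. The base class `Parameter`, the `type_name` tag, the `PARAMETER_TYPES` registry and the JSON field names are all assumptions for illustration, not an existing API in this project:

```python
import json
from typing import Any, Dict, Type


class Parameter:
    """Hypothetical base class; the JSON field names are assumptions, not an existing API."""

    type_name = "base"

    def __init__(self, default: Any = None, desc: str = "") -> None:
        self.default = default
        self.desc = desc
        self.value = default

    def to_json(self) -> Dict[str, Any]:
        # Describe the parameter so the front-end can render a matching form field.
        return {
            "type": self.type_name,
            "default": self.default,
            "desc": self.desc,
            "value": self.value,
        }

    @staticmethod
    def from_json(data: Dict[str, Any]) -> "Parameter":
        # Look up the subclass by its "type" tag and rebuild it from the payload.
        subclass = PARAMETER_TYPES[data["type"]]
        param = subclass(default=data.get("default"), desc=data.get("desc", ""))
        param.value = data.get("value", param.default)
        return param


class StringParameter(Parameter):
    type_name = "string"


class IntParameter(Parameter):
    type_name = "int"


class FileParameter(Parameter):
    type_name = "file"


# Hypothetical registry mapping the "type" tag back to a Python class.
PARAMETER_TYPES: Dict[str, Type[Parameter]] = {
    cls.type_name: cls for cls in (StringParameter, IntParameter, FileParameter)
}

options = {
    "file": FileParameter(desc="File to upload"),
    "sep": StringParameter(default=",", desc="Single-character delimiter"),
    "header": IntParameter(default=0, desc="Row number with column names"),
}

# Serialize for the front-end, then parse the same JSON back into Python objects.
payload = json.dumps({name: p.to_json() for name, p in options.items()})
restored = {name: Parameter.from_json(d) for name, d in json.loads(payload).items()}
```

Tagging each payload with a `"type"` field is one way to let a single `from_json()` entry point pick the right subclass; other schemes would work too.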

I think it lends itself to implementing good validation (at least server-side), and possibly even handling the many pandas args that can be strings or ints.
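For the pandas args that take either a string or an int (for example, `read_csv`'s `index_col` accepts a column label or a column position), one possible approach is sketched below, assuming a hypothetical helper `parse_str_or_int` that a parameter class would call on the raw form string:

```python
from typing import Union


def parse_str_or_int(raw: str) -> Union[int, str]:
    """Hypothetical helper: treat the form string as a position if it parses as an int, else as a label."""
    try:
        return int(raw)  # e.g. "0" -> 0 (column position)
    except ValueError:
        return raw  # e.g. "date" stays a column label
```

One trade-off of guessing the type this way: a column whose label is literally `"0"` would be read as position 0, so an explicit type choice in the form may still be needed for edge cases.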

Thoughts?

reelmatt commented 4 years ago

This sounds like a great approach to me!

A few questions/things to think about:

diegostruk commented 4 years ago

I think this is a good idea, and it will make parameter passing a lot more robust. To @reelmatt's question, I believe the Parameter class will be handling that validation. @reddigari, is that what you were thinking?

reddigari commented 4 years ago

Definitely an open question! If it's baked into the Parameter classes, it would be easy to automatically catch type-validation errors, such as a string that can't be parsed as an integer (I think everything sent from the front-end form is a string and has to be cast). But if validation were the responsibility of the Node, then there's the possibility of validating parameters against each other, or even against the input data.
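To illustrate the Parameter-level option, here is a minimal sketch of an int parameter that casts the string the form submits and raises a readable error when the cast fails. `ParameterValidationError`, `validate()` and the fall-back-to-default behavior for empty fields are assumptions for illustration:

```python
from typing import Optional


class ParameterValidationError(ValueError):
    """Hypothetical error type for form values that can't be used."""


class IntParameter:
    """Minimal sketch of an int parameter whose validate() casts raw form strings."""

    def __init__(self, default: int = 0, desc: str = "") -> None:
        self.default = default
        self.desc = desc
        self.value = default

    def validate(self, raw: Optional[str]) -> int:
        # Everything from the form arrives as a string (or is missing entirely).
        if raw is None or raw.strip() == "":
            value = self.default  # empty field: fall back to the default
        else:
            try:
                value = int(raw)
            except ValueError:
                raise ParameterValidationError(
                    f"{self.desc or 'value'}: expected an integer, got {raw!r}"
                ) from None
        self.value = value
        return value


header = IntParameter(default=0, desc="Row number with column names")
header.validate("2")  # returns 2 and stores it in header.value
```

Calling `header.validate("abc")` would raise `ParameterValidationError`, which the server could turn into a form error message.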

I could envision an in-between solution where Node must implement a validate() method, but the base class implementation of validate calls the built-in validation of each Parameter instance? (This is totally half-baked)
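A rough sketch of that in-between idea, assuming hypothetical `Node` and `Parameter` classes: the base `Node.validate()` runs every parameter's own check, and a subclass calls `super().validate()` before adding a cross-parameter rule. The rule shown (delimiter and decimal mark must differ) is only an illustrative assumption, although `sep` and `decimal` are both real `read_csv` arguments:

```python
from typing import Any, Dict


class Parameter:
    """Hypothetical minimal parameter whose validate() only checks the Python type."""

    def __init__(self, expected_type: type, value: Any) -> None:
        self.expected_type = expected_type
        self.value = value

    def validate(self) -> None:
        if not isinstance(self.value, self.expected_type):
            raise ValueError(
                f"expected {self.expected_type.__name__}, got {self.value!r}"
            )


class Node:
    """Base class: the default validate() runs every parameter's own check."""

    def __init__(self, options: Dict[str, Parameter]) -> None:
        self.options = options

    def validate(self) -> None:
        for name, param in self.options.items():
            try:
                param.validate()
            except ValueError as exc:
                # Prefix the option name so the front-end knows which field failed.
                raise ValueError(f"{name}: {exc}") from exc


class ReadCsvNode(Node):
    """Hypothetical node that layers a cross-parameter rule on top of the base checks."""

    def validate(self) -> None:
        super().validate()  # per-parameter type checks first
        # Hypothetical rule: the delimiter and decimal mark must differ.
        if self.options["sep"].value == self.options["decimal"].value:
            raise ValueError("sep and decimal must be different characters")
```

Nodes that need no extra checks simply inherit the base `validate()`.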

reddigari commented 4 years ago

@reelmatt Good call on the lists and dicts; I have absolutely no idea how to handle those.

matthew-t-smith commented 4 years ago

I'm definitely in favor of this approach; very smart. It gives us a lot more control over how we pass types around, and I think having validate() in both places could be beneficial, as you mentioned @reddigari: simpler validation of user input in Parameter.validate(), plus maybe an additional Node.validate() for a deeper check that the parameters work together for node execution.

reelmatt commented 4 years ago

My thought on lists in particular might look something like 'Header' in this updated wireframe.

[Image: ReadCsvConfig wireframe]

Something like a ListParameter class could be used to tell the front-end to provide +/- options for adding more than one input and then the back-end could loop through the values and do a list.append(val). Dicts could maybe work similarly by either:
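The back-end half of that could be as small as the sketch below, assuming a hypothetical `ListParameter` that receives one raw string per +/- form row and casts each one with an `item_cast` function. pandas' `header` argument, for instance, accepts a list of ints:

```python
from typing import Any, Callable, List


class ListParameter:
    """Hypothetical parameter holding several values entered via +/- form rows."""

    def __init__(self, item_cast: Callable[[str], Any] = str, desc: str = "") -> None:
        self.item_cast = item_cast
        self.desc = desc
        self.value: List[Any] = []

    def parse(self, raw_values: List[str]) -> List[Any]:
        # One entry per form row; rows the user left empty are skipped.
        parsed: List[Any] = []
        for raw in raw_values:
            if raw.strip() == "":
                continue
            parsed.append(self.item_cast(raw))
        self.value = parsed
        return parsed


header = ListParameter(item_cast=int, desc="Row numbers with column names")
header.parse(["0", "1", ""])  # returns [0, 1]
```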

Another approach could be to include a radio button for inputs that allow for different types (e.g., 'single', 'list', 'dict') with the form changing based on the selection.
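On the back end, the radio-button approach might reduce to dispatching on the selected mode. This sketch assumes a hypothetical payload shape `{"mode": ..., "value": ...}` in which dict entries arrive as `[key, value]` rows; pandas' `dtype` argument, which can take a column-to-type mapping, is the kind of input the dict mode could serve:

```python
from typing import Any, Dict


def parse_by_mode(payload: Dict[str, Any]) -> Any:
    """Hypothetical parser for a 'single' / 'list' / 'dict' radio-button field."""
    mode = payload.get("mode", "single")
    value = payload.get("value")
    if mode == "single":
        if not isinstance(value, str):
            raise ValueError("single mode expects one string")
        return value
    if mode == "list":
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError("list mode expects a list of strings")
        return list(value)
    if mode == "dict":
        # Each row is a [key, value] pair from a key/value form row.
        if not isinstance(value, list) or not all(
            isinstance(row, list) and len(row) == 2 for row in value
        ):
            raise ValueError("dict mode expects a list of [key, value] pairs")
        return {key: val for key, val in value}
    raise ValueError(f"unknown mode {mode!r}")
```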

I'm not sure how easy or hard any of this would be to configure on the front-end, or how clean the parsing would be on the back-end. These could also be out of scope for the project, in which case we'd just implement a few basic parameters.