PyWorkflowApp / visual-programming

A Python Visual Programming Workspace for Data Science
MIT License
33 stars 12 forks source link

Reads to stdin and writes to stdout, batch execution of workflows #69

Closed diegostruk closed 4 years ago

diegostruk commented 4 years ago
  1. Uploads the input file to current working directory.
  2. If there is a ReadCsv node in the workflow it overrides the input file from the pyworkflow json to read from the stdin file instead.

pyworkflow execute > output-file

TODO: For reading from stdin I created a method in workflow and an execute specific one in ReadCsvNode class. I would like to refactor this to re-use the already existing methods. While some basic validations are in place there is still some work needed to be done.

reddigari commented 4 years ago

Lookin good! I left a comment about finding a way not to duplicate all the execute logic (but no actual solution haha). I also imagine this is going to have hella conflicts with Matt's significant node refactoring in #70.

reelmatt commented 4 years ago

Agree with Samir, this is looking good so far! I would try updating this branch from master to incorporate changes from #57—there's a refined execute() method there which may/may not make the CLI execution easier (or harder). (#70 shouldn't actually have that many conflicts here I think)

One suggestion: switch the --file-directory option into an argument to the execute command. Would make the command line something like, pyworkflow execute file1 [...] (and should help "batch mode" by allowing multiple workflow files).

One question: I see the simplicity writing stdin to a temp file to pass in to execution, but is there a downside to passing the raw input directly? pandas.read_csv() takes both string/file inputs so passing something like stdin_text.read() should work, and that lines up more closely with how I understand redirection in the shell (but I could be totally off there).

diegostruk commented 4 years ago

@reelmatt thanks for the suggestion of passing the file directories as an argument, I added it to this PR. I don't see any downsides of passing the raw input directly, I mainly just thought it is useful to have the file uploaded to the directory for future use, but I can look into it and push it in the next PR. @reddigari there was one merge conflict only, however, like you mentioned I had to update some logic to work with the new refactoring. This PR contains those changes. I tried re-using the current execute flow but wasn't able to find a quick fix, so will likely spend some time and include in the next PR. This PR now also contains batch execution of workflows.