fmi-basel / faim-luigi

[WIP] utilities for luigi workflows

Improve KNIME wrapper task #2

Open imagejan opened 3 years ago

imagejan commented 3 years ago

Calling the KNIME executor via the knime Python module makes it possible to use, for example, a pandas DataFrame as the input table. This would remove the need to pass inputs through global workflow variables.

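import knime

df = ...  # any pandas DataFrame to use as the input table

with knime.Workflow(workflow_path=workflow, workspace_path=workspace) as wf:
    # Assign the DataFrame to the workflow's first table input
    wf.data_table_inputs[0] = df
    wf.execute()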

See this blog post for more information.

This could make the KNIME wrapper task useful for more use cases. What do you think, @rempfler?

rempfler commented 3 years ago

> This could make the KNIME wrapper task useful for more use cases. What do you think, @rempfler?

Yes, it could be an interesting enhancement. I did stumble over the tutorial and had the impression that the global variables were less intrusive than these input/output containers for the current use case, but you can probably judge that better from your experience with KNIME. It would definitely enable more specialized applications, since we could move the writing/postprocessing of outputs out of KNIME and into the luigi task. Would you be interested in drafting a pull request, @imagejan?

Perhaps it would make sense to encapsulate the workflow execution in a separate method, say run_knime(self, *args, **kwargs) -> pandas.DataFrame, which would return the output(s?). A specializing task could then use it even if it needs to override run (e.g. as I already had to do to make it interruptible).
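
To make the idea concrete, here is a rough sketch of what such a separate method could look like. The names KnimeWorkflowMixin and open_workflow are hypothetical and not part of faim-luigi, and the sketch assumes that the knime module exposes data_table_outputs alongside the data_table_inputs used in the snippet above. It returns a list of output tables rather than a single DataFrame, since the number of outputs is still an open question.

```python
class KnimeWorkflowMixin:
    """Hypothetical mixin that keeps workflow execution apart from run()."""

    workflow_path = None   # path to the KNIME workflow directory
    workspace_path = None  # path to the KNIME workspace

    def open_workflow(self):
        """Return the workflow context manager.

        The import is done lazily so that the class can be defined (and the
        method replaced in tests) on machines without KNIME installed.
        """
        import knime  # same module as in the snippet above

        return knime.Workflow(
            workflow_path=self.workflow_path,
            workspace_path=self.workspace_path,
        )

    def run_knime(self, *input_tables):
        """Execute the workflow and return all output tables as a list."""
        with self.open_workflow() as wf:
            # Assign each given table to the matching table input, in order.
            for index, table in enumerate(input_tables):
                wf.data_table_inputs[index] = table
            wf.execute()
            # Assumption: outputs are exposed as a sequence of DataFrames.
            return list(wf.data_table_outputs)
```

A task deriving from both luigi.Task and this mixin could then call self.run_knime(df) from its own run, whether or not that run adds interruption handling, and write the returned tables to its luigi targets itself.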