Closed by chahuistle 7 years ago.
This figure depicts the hierarchy of the new classes and interfaces. The main interface is CommandLineElement, and all other classes and interfaces implement or extend it. A brief explanation of each class and interface follows:
AbstractCommandLineElement is the base class for all other classes. It implements the core methods getKey, setKey, getValue, setValue, getSequenceNumber and setSequenceNumber. The method getStringRepresentation is not implemented here, so subclasses are forced to provide their own.
CommandLineFixedString represents an immutable string in the command line. For example, in the command line --ph 3.7, the flag --ph is a fixed string, whereas the value 3.7 is not, because it could be set to a different number.
ParameterizedCommandLineElement is a tagging interface that identifies implementations of CommandLineElement that wrap an object representing a parameter.
CommandLineParameter implements CommandLineElement and wraps instances of Parameter<T> classes. It lets users add a prefix and a suffix when the string representation is generated.
CommandLineFile represents command line elements that refer to a file. Handling of files is critical when converting workflows from KNIME to other platforms (see section File Handling). Two classes extend it: CommandLineCTDFile and CommandLineKNIMEWorkflowFile.
Let's take a look at how a simple KNIME workflow handles files. Suppose a GKN requires an input file and produces an output file. Both files are stored in KNIME's temporary folder (e.g., /tmp). The command line of that node would look like:
$ [tool_name] --input /tmp/input0.csv --output /tmp/output0.xml
If we were to export the workflow that includes this GKN, we could no longer assume that the folder /tmp/ is available, since each target system handles file locations in its own way. Therefore, when a GKN is exported, we need to keep track of its input and output files so that KNIME2Grid can modify their paths depending on the destination platform.
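A minimal sketch of this bookkeeping, assuming a plain list of string tokens and a hypothetical FileRemapper helper (not part of KNIME2Grid) that moves every path under KNIME's temporary folder to a directory chosen by the target platform, keeping the relative file name:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of KNIME2Grid): remaps paths that live under
// a local temporary folder to a directory chosen by the target platform.
final class FileRemapper {
    private final Path localTmp;
    private final Path targetDir;

    FileRemapper(String localTmp, String targetDir) {
        this.localTmp = Paths.get(localTmp);
        this.targetDir = Paths.get(targetDir);
    }

    // Returns a copy of the command line in which every absolute path under
    // localTmp is moved to targetDir; all other tokens are kept unchanged.
    List<String> remap(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            Path path = Paths.get(token);
            // Path.startsWith compares whole name elements, so "/tmpx/a" is not under "/tmp".
            if (path.isAbsolute() && path.startsWith(localTmp)) {
                result.add(targetDir.resolve(localTmp.relativize(path)).toString());
            } else {
                result.add(token);
            }
        }
        return result;
    }
}
```

This string-based version has to guess which tokens are paths. The typed command line elements introduced by this refactoring are meant to make that information explicit instead.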
CTDs are indeed files, but they must be handled differently from ordinary files. Imagine a GKN that is fully CTD-compatible, such as any OpenMS, SeqAn or BALL tool. Whenever it is executed in KNIME, its command line would be similar to:
$ [tool_name] --ctd /tmp/ctd0.ctd
Treating CTDs as plain files does not provide enough abstraction, because CTDs contain information about other files. Suppose that this tool requires one input file and one parameter, and produces one output file. A section of /tmp/ctd0.ctd could look like:
<ITEM name="input" type="input-file" value="/tmp/input0.csv" />
<ITEM name="output" type="output-file" value="/tmp/output0.csv" />
<ITEM name="pH" type="double" value="7.2" />
If we were to export a workflow containing this tool, we would also need to modify the paths contained in CTDs, because we cannot assume that local paths will exist on remote execution environments. This is why the class CommandLineCTDFile exists.
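The following sketch shows one way such a rewrite could work. It assumes CTD items shaped like the excerpt above (an ITEM element with type and value attributes) and uses the JDK's built-in DOM parser. The class name CtdPathRewriter and the remapping function are hypothetical, and real CTDs may also contain list elements, which this sketch ignores.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: rewrites the file paths stored in a CTD document.
final class CtdPathRewriter {
    private CtdPathRewriter() {}

    // Applies remap to the value of every ITEM whose type is input-file or
    // output-file; other items (e.g. the pH parameter) are left untouched.
    static String rewrite(String ctdXml, UnaryOperator<String> remap) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ctdXml)));
            NodeList items = doc.getElementsByTagName("ITEM");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String type = item.getAttribute("type");
                if (type.equals("input-file") || type.equals("output-file")) {
                    item.setAttribute("value", remap.apply(item.getAttribute("value")));
                }
            }
            // Serialize the modified document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not rewrite CTD", e);
        }
    }
}
```

With a remapping such as p -> p.replaceFirst("^/tmp/", "/remote/"), the input and output paths in the excerpt would point to /remote/, while the pH item keeps its value of 7.2.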
It is possible to execute a KNIME workflow using the so-called batch mode. For this, one must provide the location of the archive that contains a valid KNIME workflow to execute, as shown below:
$ [path_to_knime] -workflowFile="/share/wfs/knime/wf_1.zip"
The class CommandLineKNIMEWorkflowFile represents this kind of file. KNIME2Grid generates these files on the fly, so that KNIME workflows can be executed on other platforms on which KNIME is installed, without requiring a user interface.
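Below is a simplified Java sketch of the hierarchy described above. The class and method names come from this description. The field types, the constructors, the stand-in Parameter<T> class, and the exact way prefix and suffix are applied are assumptions. The CTD and workflow file subclasses are left out.

```java
// Simplified stand-ins for the classes described above; field types,
// constructors and the Parameter<T> stand-in are assumptions.
interface CommandLineElement {
    String getKey();
    void setKey(String key);
    Object getValue();
    void setValue(Object value);
    int getSequenceNumber();
    void setSequenceNumber(int sequenceNumber);
    // Each concrete element decides how it appears on the command line.
    String getStringRepresentation();
}

// Implements the core accessors; getStringRepresentation stays abstract.
abstract class AbstractCommandLineElement implements CommandLineElement {
    private String key;
    private Object value;
    private int sequenceNumber;

    AbstractCommandLineElement(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    @Override public String getKey() { return key; }
    @Override public void setKey(String key) { this.key = key; }
    @Override public Object getValue() { return value; }
    @Override public void setValue(Object value) { this.value = value; }
    @Override public int getSequenceNumber() { return sequenceNumber; }
    @Override public void setSequenceNumber(int sequenceNumber) { this.sequenceNumber = sequenceNumber; }
}

// A fixed token such as "--ph" in "--ph 3.7".
final class CommandLineFixedString extends AbstractCommandLineElement {
    CommandLineFixedString(String text) { super(text, text); }

    @Override public String getStringRepresentation() { return String.valueOf(getValue()); }
}

// Tagging interface: marks elements that wrap a parameter object.
interface ParameterizedCommandLineElement extends CommandLineElement {}

// Hypothetical stand-in for the plugin's Parameter<T> classes.
final class Parameter<T> {
    private final String key;
    private final T value;

    Parameter(String key, T value) { this.key = key; this.value = value; }
    String getKey() { return key; }
    T getValue() { return value; }
}

// Wraps a Parameter<T> and surrounds its value with an optional prefix and suffix.
final class CommandLineParameter extends AbstractCommandLineElement
        implements ParameterizedCommandLineElement {
    private final Parameter<?> parameter;
    private final String prefix;
    private final String suffix;

    CommandLineParameter(Parameter<?> parameter, String prefix, String suffix) {
        super(parameter.getKey(), parameter.getValue());
        this.parameter = parameter;
        this.prefix = prefix == null ? "" : prefix;
        this.suffix = suffix == null ? "" : suffix;
    }

    Parameter<?> getParameter() { return parameter; }

    @Override public String getStringRepresentation() { return prefix + getValue() + suffix; }
}

// Refers to a file; CommandLineCTDFile and CommandLineKNIMEWorkflowFile would extend it.
class CommandLineFile extends AbstractCommandLineElement {
    CommandLineFile(String key, String path) { super(key, path); }

    @Override public String getStringRepresentation() { return String.valueOf(getValue()); }
}
```

Leaving getStringRepresentation abstract in the base class means the compiler rejects any new element type that forgets to define its own representation.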
Summary of the Changes
New CommandLineElement Interface and its Implementations
KNIME2Grid is a KNIME extension that will let users export KNIME workflows to other platforms such as Galaxy and gUSE. Native KNIME nodes and Generic KNIME Nodes (GKN) are handled differently, in order to benefit from the fact that tools wrapped by GKN don't need KNIME to run. This requires extra information about each command line parameter. For instance, think of a KNIME workflow in which a tool wrapped by GKN generates a file. The command line would look like:
tool --input /var/tmp/input0.csv --output /var/tmp/output0.txt --length 5
If this workflow were to run on a different platform, the paths of the input and output files would need to be changed. The newly included package com.genericworkflownodes.knime.commandline contains interfaces and classes that wrap command line elements and can generate their string representation.
For the sake of brevity, let's assume that our tool had a custom command generator (i.e., an implementation of ICommandGenerator) that hardcoded all values. Before this refactoring, the method that generated the command line would have looked like:
The generation of the command line would be, of course, the concatenation of the obtained list:
After the refactoring, this method looks similar to:
The generation of the command line, in turn, would look similar to:
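The following minimal sketch contrasts the two versions for the hardcoded tool example. It assumes stripped-down stand-ins for CommandLineElement, CommandLineFixedString and CommandLineFile. A hypothetical ToolCommandGenerator with static methods takes the place of the real ICommandGenerator interface, whose signature is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Stripped-down stand-ins (assumed) for the new element types.
interface CommandLineElement {
    String getStringRepresentation();
}

final class CommandLineFixedString implements CommandLineElement {
    private final String text;
    CommandLineFixedString(String text) { this.text = text; }
    @Override public String getStringRepresentation() { return text; }
}

final class CommandLineFile implements CommandLineElement {
    private final String path;
    CommandLineFile(String path) { this.path = path; }
    @Override public String getStringRepresentation() { return path; }
}

// Hypothetical generator for "tool" with all values hardcoded.
final class ToolCommandGenerator {
    private ToolCommandGenerator() {}

    // Before: plain strings; nothing marks which tokens are files.
    static List<String> generateCommandsBefore() {
        return List.of("--input", "/var/tmp/input0.csv",
                "--output", "/var/tmp/output0.txt", "--length", "5");
    }

    // After: every token is typed, so converters can spot the files.
    // The hardcoded "5" is a fixed string here; a configurable value
    // would be wrapped as a parameter instead.
    static List<CommandLineElement> generateCommandsAfter() {
        List<CommandLineElement> commands = new ArrayList<>();
        commands.add(new CommandLineFixedString("--input"));
        commands.add(new CommandLineFile("/var/tmp/input0.csv"));
        commands.add(new CommandLineFixedString("--output"));
        commands.add(new CommandLineFile("/var/tmp/output0.txt"));
        commands.add(new CommandLineFixedString("--length"));
        commands.add(new CommandLineFixedString("5"));
        return commands;
    }

    // Before: the command line is the concatenation of the list.
    static String commandLineBefore(String executable) {
        return executable + " " + String.join(" ", generateCommandsBefore());
    }

    // After: concatenate the string representation of each element.
    static String commandLineAfter(String executable) {
        return executable + " " + generateCommandsAfter().stream()
                .map(CommandLineElement::getStringRepresentation)
                .collect(Collectors.joining(" "));
    }
}
```

Both versions produce the same command line string. The difference is that, after the refactoring, code receiving the list can tell which elements are files.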
When the list of commands is passed down to the methods that handle the conversion, these methods will know that a CommandLineFixedString stays as it is, but a CommandLineFile has to be handled differently depending on the target platform. Furthermore, these new classes carry useful information such as the sequence number, in case a platform relies on it.
Handling of CTD files is special: CTDs have to be generated and are not true input files, because they contain information about the parameters and include the paths of needed files. So, if we take a look at the OpenMSCommandGenerator before the refactoring, we find:
After the refactoring, this method looks like:
The code responsible for converting such a node would find that a CTD file is included in the command line and would have enough information to handle it appropriately.
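Here is a sketch of how conversion code might dispatch on the element types. It assumes simplified stand-ins for the classes in this issue and a hypothetical PlatformConverter that receives a path-remapping function for its target platform. The generator mirrors the CTD command line shown earlier (--ctd followed by the CTD path) and is not the real OpenMSCommandGenerator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Stripped-down stand-ins (assumed) for the element types in this issue.
interface CommandLineElement {
    String getStringRepresentation();
}

final class CommandLineFixedString implements CommandLineElement {
    private final String text;
    CommandLineFixedString(String text) { this.text = text; }
    @Override public String getStringRepresentation() { return text; }
}

class CommandLineFile implements CommandLineElement {
    private final String path;
    CommandLineFile(String path) { this.path = path; }
    String getPath() { return path; }
    @Override public String getStringRepresentation() { return path; }
}

// A CTD is a file, but its contents reference further files.
final class CommandLineCTDFile extends CommandLineFile {
    CommandLineCTDFile(String path) { super(path); }
}

// Hypothetical generator mirroring "[tool_name] --ctd /tmp/ctd0.ctd".
final class CtdCommandGenerator {
    private CtdCommandGenerator() {}

    static List<CommandLineElement> generateCommands(String ctdPath) {
        List<CommandLineElement> commands = new ArrayList<>();
        commands.add(new CommandLineFixedString("--ctd"));
        commands.add(new CommandLineCTDFile(ctdPath));
        return commands;
    }
}

// Hypothetical converter for one target platform.
final class PlatformConverter {
    private final UnaryOperator<String> remapPath;
    private final List<String> ctdsToRewrite = new ArrayList<>();

    PlatformConverter(UnaryOperator<String> remapPath) { this.remapPath = remapPath; }

    List<String> convert(List<CommandLineElement> commands) {
        List<String> tokens = new ArrayList<>();
        for (CommandLineElement element : commands) {
            // Check the subclass first: a CTD must not be treated as a plain file.
            if (element instanceof CommandLineCTDFile) {
                String path = ((CommandLineCTDFile) element).getPath();
                ctdsToRewrite.add(path); // its contents need remapping too
                tokens.add(remapPath.apply(path));
            } else if (element instanceof CommandLineFile) {
                tokens.add(remapPath.apply(((CommandLineFile) element).getPath()));
            } else {
                // Fixed strings are emitted unchanged.
                tokens.add(element.getStringRepresentation());
            }
        }
        return tokens;
    }

    // CTDs seen so far whose inner paths still have to be rewritten.
    List<String> getCtdsToRewrite() { return new ArrayList<>(ctdsToRewrite); }
}
```

The order of the instanceof checks matters. Because CommandLineCTDFile is a subclass of CommandLineFile, testing for the plain file first would move the CTD but never rewrite its contents.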
Exposed Methods, Packages
KNIME2Grid needs access to the new classes and to some other packages in order to inspect the nodes to convert in detail. A summary of the newly exposed information follows:
MANIFEST.MF: new and other needed packages are exposed to other plugins.
GenericKnimeNodeModel: the KNIME2Grid extension can now access the command line without the need to invoke the execute method.
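The following hypothetical sketch illustrates the idea behind the GenericKnimeNodeModel change. If command line generation is factored out of execute, an exporter can query it without running the node. NodeModelSketch and getCommandLine are illustrative names, not the actual GenericKnimeNodeModel API.

```java
import java.util.List;

// Hypothetical sketch, not the real GenericKnimeNodeModel API: the command
// line is built by a method that execute() merely calls.
class NodeModelSketch {
    private List<String> lastExecutedCommand;

    // Builds the command line without running the external tool.
    List<String> getCommandLine() {
        return List.of("tool", "--input", "/tmp/input0.csv", "--output", "/tmp/output0.xml");
    }

    // Execution reuses getCommandLine(), so exporters and execution agree.
    void execute() {
        lastExecutedCommand = getCommandLine();
        // Launching the external process is omitted in this sketch.
    }

    List<String> getLastExecutedCommand() { return lastExecutedCommand; }
}
```

An exporter such as KNIME2Grid would call only getCommandLine(), leaving the node unexecuted.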