genericworkflownodes / GenericKnimeNodes

Base package for GenericKnimeNodes
https://github.com/genericworkflownodes/GenericKnimeNodes

Generic splitter nodes for MimeTypes #144

Open jpfeuffer opened 8 years ago

jpfeuffer commented 8 years ago

@AlexanderFillbrunn, @temehi and I are currently planning and implementing an extension point for splitting files based on their MimeTypes. The goal is to chunk inputs whenever possible and make them ready for the KNIME Cluster extension. The extension point exposes a SplitterFactory interface that a plugin can implement. A plugin may register multiple SplitterFactories (with associated Splitter instance classes), each of which declares the MimeTypes it works on through its overridden isApplicable method. A SplitterFactoryManager keeps track of the registered Splitters. During configuration of the new generic FileSplitter node (which takes a single URIPortObject with a single URIContent), the node lets the user select a Splitter capable of splitting the MimeType associated with the extension of the input; if the node is not configured yet, it takes the first one. So far so good: it already works with a simple registered LineSplitter. Implementation details to come.
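
To make the moving parts above concrete, here is a minimal sketch of how the pieces could fit together. Only the names SplitterFactory, Splitter, SplitterFactoryManager, LineSplitter and isApplicable come from the description above. Every method signature, the LineSplitterFactory class, the use of plain strings for MimeTypes and the "text/plain" check are assumptions made for illustration, not the actual GKN API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical shape: splits an input (here simply a list of lines) into chunks.
interface Splitter {
    List<List<String>> split(List<String> lines, int numParts);
}

// Hypothetical shape: a factory announces which MimeTypes its Splitter handles.
interface SplitterFactory {
    boolean isApplicable(String mimeType);
    Splitter createSplitter();
}

// Splits line-based input into at most numParts chunks of nearly equal size.
final class LineSplitter implements Splitter {
    @Override
    public List<List<String>> split(List<String> lines, int numParts) {
        if (numParts < 1) {
            throw new IllegalArgumentException("numParts must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        // Ceiling division; 0 for empty input, in which case the loop is skipped.
        int chunkSize = (lines.size() + numParts - 1) / numParts;
        for (int start = 0; start < lines.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }
}

// Assumption: the line splitter only claims plain text.
final class LineSplitterFactory implements SplitterFactory {
    @Override
    public boolean isApplicable(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    @Override
    public Splitter createSplitter() {
        return new LineSplitter();
    }
}

// Keeps track of registered factories and looks them up by MimeType.
final class SplitterFactoryManager {
    private final List<SplitterFactory> factories = new ArrayList<>();

    void register(SplitterFactory factory) {
        factories.add(factory);
    }

    // All factories whose isApplicable accepts the given MimeType.
    List<SplitterFactory> getFactories(String mimeType) {
        List<SplitterFactory> result = new ArrayList<>();
        for (SplitterFactory factory : factories) {
            if (factory.isApplicable(mimeType)) {
                result.add(factory);
            }
        }
        return result;
    }

    // Fallback for an unconfigured node: the first applicable factory, if any.
    Optional<SplitterFactory> getDefaultFactory(String mimeType) {
        return getFactories(mimeType).stream().findFirst();
    }
}
```

In this sketch the FileSplitter node dialog would call getFactories with the input's MimeType to fill the selection list, and fall back to getDefaultFactory while nothing has been configured.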

jpfeuffer commented 8 years ago

I am actually not sure whether we can even represent such a node in our description format (CTD): on the command line it would have a variable number of outputs (depending on how many chunks you want), but as an implementation of the splitter node it has a KNIME table as output. Until we find a really good solution, we should write our own classes for each splitter, omit the CTD and just ship the binary. In the class we execute it with one of the GKN executors, passing our handwritten configurations. EDIT: Maybe a description with a prefix output port works in CTD.
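
As a rough illustration of the prefix idea from the EDIT: a single prefix parameter could stand in for the variable number of outputs, with the chunk file names derived from it. The class name, method name and naming pattern below are hypothetical and are not part of CTD or GKN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: derives one output file name per chunk from a prefix.
final class PrefixOutputNames {
    static List<String> chunkFileNames(String prefix, String extension, int numChunks) {
        if (numChunks < 0) {
            throw new IllegalArgumentException("numChunks must be >= 0");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numChunks; i++) {
            // Zero-padded index keeps the files in order when sorted by name.
            names.add(String.format(Locale.ROOT, "%s_%03d.%s", prefix, i, extension));
        }
        return names;
    }
}
```

The node could then collect the resulting paths into its output table, so the tool description only needs to declare the prefix rather than each output.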