EyeofBeholder-NLeSC / knime-demo

This repository keeps files for demonstrating the usage of KNIME.
Apache License 2.0

Input data schema validation #3

Open jiqicn opened 2 years ago

jiqicn commented 2 years ago

Investigate possible ways of validating input data against the ontology.

jiqicn commented 2 years ago

Nodes, Components, and Workflows on KNIME Hub

1. Table validator and Table validator (reference)

2. R and Python Integration

3. Interactive Data Cleaning Component

dafnevk commented 2 years ago

The Table validator nodes seem very useful and close to what we need. However, they do not appear to be compatible with the CSV on the Web specification, which is a more formalized and interoperable way of describing a CSV table than a reference table or the node configuration.
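To make the difference concrete, here is a minimal sketch of what checking a table against a CSV on the Web metadata description could look like. The metadata keys (`@context`, `url`, `tableSchema`, `columns`, `name`, `datatype`, `required`) come from the CSVW vocabulary, but the column names, the file name, and the `validate` helper are hypothetical, and only a small subset of the specification is covered. It is not a substitute for a real validator.

```python
import csv
import io
import json

# Hypothetical CSVW metadata; column names and file name are illustrative only.
METADATA = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "samples.csv",
  "tableSchema": {
    "columns": [
      {"name": "sample_id", "datatype": "integer", "required": true},
      {"name": "species", "datatype": "string", "required": true},
      {"name": "weight", "datatype": "number"}
    ]
  }
}
""")

# Parsers for the few datatypes this sketch understands (assumption: a tiny subset).
PARSERS = {"integer": int, "number": float, "string": str}


def validate(csv_text, metadata):
    """Return a list of error messages; an empty list means the table passed."""
    columns = metadata["tableSchema"]["columns"]
    expected = [column["name"] for column in columns]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != expected:
        return [f"header {header} does not match schema {expected}"]
    errors = []
    # The header is line 1, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(columns):
            errors.append(f"row {row_number}: expected {len(columns)} cells, got {len(row)}")
            continue
        for column, cell in zip(columns, row):
            name = column["name"]
            if cell == "":
                # Empty cells are only an error for required columns.
                if column.get("required", False):
                    errors.append(f"row {row_number}: '{name}' is required")
                continue
            parser = PARSERS.get(column.get("datatype", "string"), str)
            try:
                parser(cell)
            except ValueError:
                errors.append(f"row {row_number}: '{name}' is not a valid {column['datatype']}")
    return errors


if __name__ == "__main__":
    table = "sample_id,species,weight\n1,cat,3.5\nx,,abc\n"
    for message in validate(table, METADATA):
        print(message)
```

The point of the metadata file is that the schema lives next to the data in a standard format, so the same description could be reused by any CSVW-aware tool rather than being locked inside a KNIME node configuration.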

dafnevk commented 2 years ago

Here is an existing csv-on-the-web validator in Ruby that could be used for a customized node.

There is also a Python implementation (not well documented, and it seems to target Python 2) and an R implementation.

dafnevk commented 2 years ago

In case we want to use the Ruby implementation, here is a Ruby wrapper node for KNIME. It works with JRuby, a Java implementation of Ruby. It's not clear to me how this wrapper node handles the Ruby environment and dependencies; I think JRuby and all dependencies already need to be installed on the system. See also the documentation of the JRuby node.

jiqicn commented 2 years ago

I also found a CSV on the Web validator in Java that might be useful for developing the customized node.

At the bottom of this note, you can find implementations of CSVW validators in different languages (Python, Ruby, JavaScript, Web, and R).

dafnevk commented 2 years ago

Note that the csvlint (ruby) implementation also has a webservice with an API: http://csvlint.io/documentation

dafnevk commented 2 years ago

Another Python implementation, built in Clariah: https://github.com/clariah/cow

jiqicn commented 2 years ago

Deploy KNIME extension

According to this note, there are two ways of sharing the extension.

One is to build a local update site for the extension. The ideal situation is to become a contributor to the KNIME community, but that requires considerable effort (see this link). It's also possible to share the local update site in other ways (e.g. GitHub), but in that case a dropin will be more convenient than a local update site.

The second way is to package the extension as a dropin, which is simply a .jar file. To deploy the extension, users only need to put the dropin file in the dropins folder of their local KNIME installation and restart KNIME.
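The dropin route is easy to script. The following is a rough sketch only: `install_dropin` is a hypothetical helper, and the paths in the usage example are placeholders that depend on where KNIME is installed on the user's machine.

```python
import shutil
from pathlib import Path


def install_dropin(jar_path, knime_home):
    """Copy an extension .jar into <knime_home>/dropins and return the new path."""
    jar = Path(jar_path)
    if jar.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got {jar.name}")
    # Assumption: the dropins folder sits directly under the KNIME installation folder.
    dropins = Path(knime_home) / "dropins"
    dropins.mkdir(exist_ok=True)
    return Path(shutil.copy2(jar, dropins))


if __name__ == "__main__":
    # Placeholder paths; adjust both to the local setup.
    target = install_dropin("my-extension.jar", "/opt/knime")
    print(f"Copied to {target}; restart KNIME to load the extension.")
```

KNIME still has to be restarted afterwards, as the script only copies the file.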