Closed: vemonet closed this issue 4 years ago
Run the workflow:
argo submit dqa-workflow-argo.yaml -f support/config-dqa-pipeline.yml
Running exactly the same Docker image with the same parameter works in pure Docker:
docker run --rm -it -v /data/dqa-workspace:/data aksw/rdfunit:latest -d http://sparql.wikipathways.org/sparql -f /data -s "https://www.w3.org/2012/pyRdfa/extract?uri=http://vocabularies.wikipathways.org/wp#" -o ttl
But it gives this error when run via Argo:
[ERROR] No plugin found for prefix 'exec' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/root/.m2/
repository), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException
According to this issue it means that the java exec plugin is missing from the pom.xml: https://stackoverflow.com/questions/34770106/no-plugin-found-for-prefix-exec-in-the-current-project-and-in-the-plug-in-grou
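Following the linked answer, a minimal sketch of the declaration that would make the exec prefix resolvable, added to the pom.xml's build/plugins section (the version is illustrative):

```xml
<!-- Illustrative: declare exec-maven-plugin explicitly so the 'exec' prefix resolves -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.6.0</version>
</plugin>
```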
See these two POMs, for the whole project and for the validate module:
The question is: why does this plugin resolve when doing a simple docker run, but not when running through Argo? A likely difference is the working directory: docker run starts in the image's WORKDIR next to the pom.xml, while the Argo step may run from another path, so Maven cannot find the project and falls back to plugin-prefix resolution.
Overall, the RDFUnit Docker container (and its pom.xml) does not seem appropriate for this use, so we would need to rework how it compiles (the only hard part will be making sure the two POMs compile correctly).
Issue submitted to the RDFUnit repo: https://github.com/AKSW/RDFUnit/issues/98
The Pod definition I use for the test: https://github.com/MaastrichtU-IDS/d2s-argo-workflows/blob/cd8b1432940595e6ff52b7efaa339f5d653aa609/tests/test-devnull-pod.yaml
Commands to run the test pod and connect to it (from the d2s-argo-workflow repo):
kubectl create -f tests/test-devnull-pod.yaml
kubectl exec -it test-devnull-pod -- /bin/bash
Documented here (for info): https://maastrichtu-ids.github.io/dsri-documentation/docs/openshift-debug
I fixed RDFUnit to be packaged as a standalone jar in the RDFUnit Docker container, so it now works from any path with any workdir set (available at https://hub.docker.com/repository/docker/umids/rdfunit ).
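With the standalone jar, the earlier command should work unchanged against the new image; a sketch (paths and flags taken from the docker run above, image tag assumed to be latest):

```shell
# Same invocation as before, against the repackaged umids/rdfunit image;
# the standalone jar no longer depends on the container's working directory
docker run --rm -it -v /data/dqa-workspace:/data umids/rdfunit:latest \
  -d http://sparql.wikipathways.org/sparql -f /data \
  -s "https://www.w3.org/2012/pyRdfa/extract?uri=http://vocabularies.wikipathways.org/wp#" \
  -o ttl
```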
What has been done: added the cli-standalone build to the default build plugins in the pom.xml (using profiles was not clear). See commits:
Descriptive statistics
We should adapt and reuse the new implementation, and properly integrate those queries, at: https://github.com/MaastrichtU-IDS/d2s-scripts-repository/tree/master/sparql/compute-hcls-stats
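For reference, the HCLS descriptive-statistics queries are plain SPARQL, so they can be run against the endpoint via the standard SPARQL 1.1 protocol; a sketch with one of the simplest statistics (total triple count, query text illustrative):

```shell
# Illustrative: POST one HCLS-style statistics query (triple count) to the endpoint
curl -s "http://sparql.wikipathways.org/sparql" \
  --data-urlencode "query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }" \
  -H "Accept: application/sparql-results+json"
```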
Fairsharing
Just a dockerized Python script: https://github.com/MaastrichtU-IDS/fairsharing-metrics
New API in dev: https://github.com/FAIRsharing/FAIRsharing-API
RDFUnit
https://github.com/AKSW/RDFUnit
Validates the full SPARQL endpoint. It is slow, so we might need to split the validation by graph.
ShEx
https://github.com/hsolbrig/PyShEx ; https://github.com/iovka/shex-java would be an alternative
IMHO we should build a layer over PyShEx to validate a representative subset of the KG (which could be extracted using the HCLS statistics).
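A minimal sketch of what such a layer could look like, assuming the per-class instance counts come from the HCLS statistics; sample_sizes and validate_sample are illustrative names (not an existing API), and the PyShEx call uses its ShExEvaluator entry point:

```python
from typing import Dict, List


def sample_sizes(class_counts: Dict[str, int], budget: int) -> Dict[str, int]:
    """Allocate a validation budget across classes proportionally to their
    instance counts (e.g. counts from the HCLS statistics queries),
    guaranteeing at least one focus node per class."""
    total = sum(class_counts.values())
    sizes: Dict[str, int] = {}
    for cls, count in class_counts.items():
        share = max(1, round(budget * count / total)) if total else 0
        sizes[cls] = min(count, share)
    return sizes


def validate_sample(rdf_data: str, schema: str, focus_nodes: List[str]) -> List[bool]:
    """Validate each sampled focus node against a ShEx schema with PyShEx
    (assumes `pip install pyshex`; rdf_data is e.g. a Turtle string)."""
    # Imported lazily so the sampling logic above stays stdlib-only
    from pyshex import ShExEvaluator
    evaluator = ShExEvaluator(schema=schema)
    results: List[bool] = []
    for node in focus_nodes:
        results.extend(r.result for r in evaluator.evaluate(rdf=rdf_data, focus=node))
    return results
```

The proportional allocation keeps the run bounded (unlike validating the whole endpoint) while still touching every class at least once.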