Rothamsted / knetbuilder

KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.
https://knetminer.com
MIT License

The Text Mining plug-in doesn't find unix4j anymore (was: PubMed efetch parsing problem?) #47

Closed (jparsons2222 closed this issue 2 years ago)

jparsons2222 commented 3 years ago

I tried running a knetbuilder release from earlier this month, with both old and new downloads of the wheat pubmed.xml files, but got the same error message in all tests:

/home/data/knetminer/etl-test/plant/126397.err:Exception in thread "main" java.lang.NoClassDefFoundError: org/unix4j/builder/To
/home/data/knetminer/etl-test/plant/126408.err:Exception in thread "main" java.lang.NoClassDefFoundError: org/unix4j/builder/To

Keywan wondered whether the NCBI efetch downloads have changed and broken knetbuilder's XML parsing.

[Yesterday 23:45] Keywan Hassani-Pak: public static String EFETCH_WS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=";
[Yesterday 23:45] Keywan Hassani-Pak: https://github.com/Rothamsted/knetbuilder/blob/b8971b570c673a59fc6cddc55846b324ea8f6845/ondex-knet-builder/modules/textmining/src/main/java/net/sourceforge/ondex/parser/medline2/xml/XMLParser.java#L192

Exception in thread "main" java.lang.NoClassDefFoundError: org/unix4j/builder/To
        at net.sourceforge.ondex.parser.medline2.xml.XMLParser.parseMedlineXML(XMLParser.java:118)
        at net.sourceforge.ondex.parser.medline2.xml.XMLParser.parseMedlineXML(XMLParser.java:152)
        at net.sourceforge.ondex.parser.medline2.Parser.start(Parser.java:101)

marco-brandizi commented 3 years ago

It's not finding the unix4j package, which is used transitively by the TextMining plug-in to process XML using the Java implementation of the sed utility.
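
One way to confirm this kind of failure is to ask a class loader directly whether the class named in the stack trace is visible to it. The following is a minimal stand-alone sketch, not KnetBuilder code: `ClasspathCheck` and `isVisible` are hypothetical names, and only the JDK's `Class.forName` is used.

```java
// Hypothetical diagnostic helper (not part of KnetBuilder or Ondex).
final class ClasspathCheck {

    /** Returns true if the given loader can locate the named class. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run its static initialisers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised for missing dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        // The class reported as missing in the stack trace (dots instead of slashes)
        String unix4jClass = "org.unix4j.builder.To";
        System.out.println(unix4jClass + " visible: " + isVisible(unix4jClass, loader));
    }
}
```

Running this with the same classpath as the failing job should show whether the unix4j jar is actually reachable from the application class loader.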

The core of the problem is that plug-in jars that are added dynamically by the Ondex workflow engine don't work anymore in recent Java 11 updates. The code loads the jars, but the JVM seems to silently ignore them.
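
For readers unfamiliar with dynamic jar loading, the general technique looks roughly like the sketch below. This is an assumption-laden illustration using the JDK's `URLClassLoader`; it is not necessarily how the Ondex workflow engine does it, and `DynamicPluginLoader`, `jarUrls` and `newLoader` are hypothetical names.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of loading plug-in jars at runtime (not the Ondex code).
final class DynamicPluginLoader {

    /** Collects URLs for every *.jar directly inside pluginsDir (empty if it is not a directory). */
    static URL[] jarUrls(File pluginsDir) {
        List<URL> urls = new ArrayList<>();
        File[] jars = pluginsDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) { // listFiles returns null when the path is not a directory
            for (File jar : jars) {
                try {
                    urls.add(jar.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalStateException("Bad jar path: " + jar, e);
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Creates a child class loader that can see the plug-in jars in addition to its parent. */
    static URLClassLoader newLoader(File pluginsDir, ClassLoader parent) {
        return new URLClassLoader(jarUrls(pluginsDir), parent);
    }

    public static void main(String[] args) {
        // "plugins" is an assumed directory name, taken from the plugins/*.jar pattern in this thread
        URLClassLoader loader = newLoader(new File("plugins"), DynamicPluginLoader.class.getClassLoader());
        System.out.println("Plug-in jars found: " + loader.getURLs().length);
    }
}
```

Note that classes loaded this way are visible only through the child loader; code that was loaded by the parent loader cannot resolve them by name.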

I'm about to commit a fix for this that simply adds plugins/*.jar to the classpath statically, before launching the JVM. After this, the Ondex engine just re-reads the same jars, but only to gather plug-in descriptors.
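
The idea of a static classpath can be sketched as follows. This is only an illustration under stated assumptions, not the committed fix (which would more likely live in a launch script): `StaticPluginClasspath`, `build`, `launchCommand`, the base classpath and the main class name are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: compute the full classpath before the JVM is started.
final class StaticPluginClasspath {

    /** Appends every *.jar in pluginsDir (sorted by path) to the base classpath. */
    static String build(String baseClasspath, Path pluginsDir) {
        List<String> entries = new ArrayList<>();
        entries.add(baseClasspath);
        if (Files.isDirectory(pluginsDir)) {
            List<String> jars = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(pluginsDir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toString());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            Collections.sort(jars); // deterministic order across runs
            entries.addAll(jars);
        }
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows
        return String.join(File.pathSeparator, entries);
    }

    /** Builds the command line for launching a JVM with the static classpath. */
    static List<String> launchCommand(String classpath, String mainClass) {
        return List.of("java", "-cp", classpath, mainClass);
    }

    public static void main(String[] args) {
        // Assumed values for illustration only
        String classpath = build("lib/core.jar", Paths.get("plugins"));
        System.out.println(String.join(" ", launchCommand(classpath, "com.example.Main")));
    }
}
```

The `java` launcher also accepts a wildcard entry such as `plugins/*` in `-cp`, which it expands to every jar in that directory; whether the actual fix uses that form or an explicit list is not stated in this thread.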

Next, I'll check the specific problems with efetch that are being reported elsewhere. These will possibly need a separate new issue.