globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

plazi treatment indexer stops prematurely with java.lang.NoClassDefFoundError: org/apache/commons/codec/digest/MurmurHash3 #126

Closed jhpoelen closed 1 year ago

jhpoelen commented 1 year ago

When running:

echo -e "\tHomo sapiens" | nomer append plazi

the following exception is seen:

[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceReadOnly - using cached [https://github.com/plazi/treatments-rdf/archive/master.zip] at [/home/jorrit/.cache/nomer/hash/sha256/b3742bf43d9da0a8ed5522659199f47d68d31aaf46c90381190f324c1ac143f2/b176164ee1afeaab4d30171fea98c6f9aa2dc6dbbfdcbeab740f19b260e292ed.gz]
java.lang.NoClassDefFoundError: org/apache/commons/codec/digest/MurmurHash3
    at org.apache.jena.riot.lang.BlankNodeAllocatorHash.alloc(BlankNodeAllocatorHash.java:138)
    at org.apache.jena.riot.lang.BlankNodeAllocatorHash.create(BlankNodeAllocatorHash.java:111)
    at org.apache.jena.riot.lang.LabelToNode$Alloc.create(LabelToNode.java:187)
    at org.apache.jena.riot.lang.LabelToNode$Alloc.create(LabelToNode.java:178)
    at org.apache.jena.riot.system.MapWithScope.create(MapWithScope.java:86)
    at org.apache.jena.riot.system.FactoryRDFStd.createBlankNode(FactoryRDFStd.java:97)
    at org.apache.jena.riot.system.ParserProfileStd.createBlankNode(ParserProfileStd.java:199)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesBlankNode(LangTurtleBase.java:490)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesNodeCompound(LangTurtleBase.java:479)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:188)
    at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
    at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:41)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:184)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:353)
    at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:343)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:292)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:540)
    at org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:901)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:299)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:285)
    at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:69)
    at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:283)
    at org.apache.jena.ontology.impl.OntModelImpl.read(OntModelImpl.java:2232)
    at org.globalbioticinteractions.nomer.match.PlaziTreatmentsLoader.importTreatment(PlaziTreatmentsLoader.java:33)
    at org.globalbioticinteractions.nomer.match.PlaziService.indexTreatments(PlaziService.java:161)
    at org.globalbioticinteractions.nomer.match.PlaziService.lazyInit(PlaziService.java:116)
    at org.globalbioticinteractions.nomer.match.PlaziService.lookupLinkedTerms(PlaziService.java:85)
    at org.globalbioticinteractions.nomer.match.PlaziService.match(PlaziService.java:51)
    at org.eol.globi.service.TermMatcherHierarchical.match(TermMatcherHierarchical.java:57)
    at org.globalbioticinteractions.nomer.util.AppendingRowHandler.onRow(AppendingRowHandler.java:36)
    at org.globalbioticinteractions.nomer.match.MatchUtil.apply(MatchUtil.java:85)
    at org.globalbioticinteractions.nomer.match.MatchUtil.match(MatchUtil.java:37)
    at org.globalbioticinteractions.nomer.cmd.CmdAppend.run(CmdAppend.java:20)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.globalbioticinteractions.nomer.Nomer.run(Nomer.java:57)
    at org.globalbioticinteractions.nomer.Nomer.main(Nomer.java:46)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.codec.digest.MurmurHash3
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 44 more

jhpoelen commented 1 year ago

The root cause was that an older version of commons-codec, which does not include org.apache.commons.codec.digest.MurmurHash3, was on the classpath instead of the desired version, commons-codec:commons-codec:1.15.
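One way to confirm this kind of dependency conflict is to ask the JVM which jar a class is actually loaded from. The following is a minimal diagnostic sketch, not part of Nomer: the class name `ClassOrigin` and the method `describe` are hypothetical, and it assumes the program runs with the same classpath as the failing application.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Nomer): reports which jar or directory
// a class is loaded from, to help spot an unexpected dependency version.
public class ClassOrigin {

    /** Returns the location the named class was loaded from, or a note if it is missing. */
    static String describe(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes shipped with the JDK runtime have no code source location
                return "(JDK runtime)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on class path: " + className;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace above
        System.out.println(describe("org.apache.commons.codec.digest.MurmurHash3"));
        // A neighbouring class from the same library, to see which jar is in use
        System.out.println(describe("org.apache.commons.codec.digest.DigestUtils"));
    }
}
```

If the first line reports the class as missing while the second prints a jar path, the path should point at the commons-codec jar that took precedence, which is the one to replace or exclude in the build.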

jhpoelen commented 1 year ago

After applying fix, the following (expected) output was generated:

$ echo -e "\tHomo sapiens" | nomer append plazi
[main] INFO org.globalbioticinteractions.nomer.match.PlaziService - Indexing Plazi treatments ...
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceContentBased - using local Preston data dir: [/home/jorrit/.cache/nomer/data]
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceContentBased - caching [https://github.com/plazi/treatments-rdf/archive/master.zip] at [/home/jorrit/.cache/nomer/tmp/nomer4573412192819626404.gz]...
[https://zenodo.org/recor...6c90381190f324c1ac143f2] 100.0% of 11 kB at 1.40 MB/s completed in < 1 minute
[https://zenodo.org/recor...000c74e4e8fdeb937e29b1d] 100.0% of 34 kB at 8.31 MB/s completed in < 1 minute
[https://zenodo.org/recor...693d16efda23865d6cbf303] 100.0% of 767 MB at 3.76 MB/s completed in 3 minute(s)
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceContentBased - caching [https://github.com/plazi/treatments-rdf/archive/master.zip] at [/home/jorrit/.cache/nomer/tmp/nomer4573412192819626404.gz] done.
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceReadOnly - using cached [https://github.com/plazi/treatments-rdf/archive/master.zip] at [/home/jorrit/.cache/nomer/hash/sha256/b3742bf43d9da0a8ed5522659199f47d68d31aaf46c90381190f324c1ac143f2/b176164ee1afeaab4d30171fea98c6f9aa2dc6dbbfdcbeab740f19b260e292ed.gz]
[main] WARN org.apache.jena.riot - [line: 58, col: 1 ] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Tatargina_picta_Walker_[1865] 1864> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 213, col: 30] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Tatargina_picta_Walker_[1865] 1864> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 24, col: 1 ] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Aphyocharacinae]_Eigenmann_1909> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 25, col: 22] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/Aphyocharacinae]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 180, col: 1 ] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/Aphyocharacinae]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 320, col: 16] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Aphyocharacinae]_Eigenmann_1909> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 323, col: 23] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/[unassigned]_Caenogastropoda> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 332, col: 1 ] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/[unassigned]_Caenogastropoda> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 107, col: 245] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358984> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 107, col: 341] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358985> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 107, col: 437] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358987> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 107, col: 533] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358988> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 107, col: 629] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/SMF 358986> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 134, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358984> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 142, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358985> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 145, col: 25] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 146, col: 26] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 150, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358987> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 158, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/INDEX19, SMF 358988> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 166, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D08794FFD1FFEBECE2968258F6FF38/SMF 358986> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 169, col: 25] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 170, col: 26] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 36, col: 1 ] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Indolestes_sp_"o"_Fraser_1922> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
[main] WARN org.apache.jena.riot - [line: 37, col: 22] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/Indolestes_sp_"o"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
[main] WARN org.apache.jena.riot - [line: 84, col: 1 ] Bad IRI: <http://taxon-name.plazi.org/id/Animalia/Indolestes_sp_"o"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
[main] WARN org.apache.jena.riot - [line: 125, col: 20] Bad IRI: <http://taxon-concept.plazi.org/id/Animalia/Indolestes_sp_"o"_Fraser_1922> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
[main] WARN org.apache.jena.riot - [line: 115, col: 23] Bad IRI: <http://treatment.plazi.org/id/03D2AB06FFCE5139F8EC2395DCE9E30A/MHNC 13906> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 115, col: 105] Bad IRI: <http://treatment.plazi.org/id/03D2AB06FFCE5139F8EC2395DCE9E30A/MHNC 13947, MHNC 8270, MHNC 13933, MHNC 13935> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 118, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D2AB06FFCE5139F8EC2395DCE9E30A/MHNC 13906> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 121, col: 25] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 122, col: 26] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 126, col: 1 ] Bad IRI: <http://treatment.plazi.org/id/03D2AB06FFCE5139F8EC2395DCE9E30A/MHNC 13947, MHNC 8270, MHNC 13933, MHNC 13935> Spaces are not legal in URIs/IRIs.
[main] WARN org.apache.jena.riot - [line: 129, col: 25] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 130, col: 26] Lexical form '' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 132, col: 25] Lexical form '28.6160000°' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 133, col: 26] Lexical form '032.2931667°' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 135, col: 25] Lexical form '−21.671' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 149, col: 25] Lexical form '−12.068' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 159, col: 25] Lexical form '−29.550' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 293, col: 25] Lexical form '−21,668' not valid for datatype XSD decimal
[main] WARN org.apache.jena.riot - [line: 294, col: 26] Lexical form '34,847' not valid for datatype XSD decimal
[main] INFO org.globalbioticinteractions.nomer.match.PlaziService - cache with [1451398] items built in [1321.4] s or [1098.4] items/s.
[main] INFO org.globalbioticinteractions.nomer.match.PlaziService - Indexing Plazi treatments complete.
    Homo sapiens    SAME_AS http://taxon-concept.plazi.org/id/Animalia/Homo_sapiens_Linnaeus_1758   Homo sapiens        species     Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens kingdom | phylum | class | order | family | genus | species     http://taxon-concept.plazi.org/id/Animalia/Homo_sapiens_Linnaeus_1758
    Homo sapiens    SAME_AS http://treatment.plazi.org/id/34AC185C73C41EA124EEED97C898FBC0  http://treatment.plazi.org/id/34AC185C73C41EA124EEED97C898FBC0              http://treatment.plazi.org/id/34AC185C73C41EA124EEED97C898FBC0              http://treatment.plazi.org/id/34AC185C73C41EA124EEED97C898FBC0
    Homo sapiens    SAME_AS doi:10.5962/bhl.title.542   doi:10.5962/bhl.title.542           doi:10.5962/bhl.title.542               https://doi.org/10.5962/bhl.title.542

Issues https://github.com/plazi/community/issues/182 and https://github.com/plazi/treatments-rdf/issues/8 were uncovered during repair of Nomer's Plazi treatment indexer.
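The "Lexical form ... not valid for datatype XSD decimal" warnings in the log above can be reproduced outside Jena. The sketch below is only an illustration: the class `XsdDecimalCheck` is hypothetical, it does not use Jena's own validation code, and the pattern is a plain transcription of the xsd:decimal lexical form (optional ASCII sign, ASCII digits, optional fractional part). It is applied to values copied from the warnings.

```java
import java.util.regex.Pattern;

// Hypothetical illustration (not Jena's implementation): checks strings
// against the lexical form of xsd:decimal.
public class XsdDecimalCheck {

    // Optional ASCII sign, then digits with an optional fraction, or a bare fraction
    private static final Pattern XSD_DECIMAL =
            Pattern.compile("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)");

    static boolean isValidDecimal(String lexicalForm) {
        return XSD_DECIMAL.matcher(lexicalForm).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "",                 // empty value
            "28.6160000\u00B0", // trailing degree sign
            "\u221221.671",     // Unicode minus sign (U+2212) instead of ASCII '-'
            "34,847",           // comma used as decimal separator
            "-21.671"           // ASCII hyphen-minus and dot
        };
        for (String sample : samples) {
            System.out.println("'" + sample + "' valid: " + isValidDecimal(sample));
        }
    }
}
```

Only the last sample uses the ASCII hyphen-minus and a dot as the decimal separator; the others correspond to the kinds of values flagged in the log.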

jhpoelen commented 1 year ago

This issue was addressed in release 0.4.4: https://github.com/globalbioticinteractions/nomer/releases/tag/0.4.4