globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

nomer stops on parallel indexing of same resource #183

Closed: jhpoelen closed this issue 2 months ago

jhpoelen commented 3 months ago

steps to reproduce:

nomer clean
git clone https://github.com/globalbioticinteractions/taxon-graph-builder
cd taxon-graph-builder
nohup make & # 

observed error

[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceContentBased - caching [https://query.wikidata.org/sparql?format=json&query=PREFIX%20rdfs:%20%3Chttp://www.w3.org/2000/01/rdf-schema%23%3E%0APREFIX%20bd:%20%3Chttp://www.bigdata.com/rdf%23%3E%0APREFIX%20wd:%20%3Chttp://www.wikidata.org/entity/%3E%0APREFIX%20wikibase:%20%3Chttp://wikiba.se/ontology%23%3E%0APREFIX%20wdt:%20%3Chttp://www.wikidata.org/prop/direct/%3E%0ASELECT%20?i%20?l%20WHERE%20%7B%0A%20%20?i%20wdt:P31%20wd:Q427626.%0A%20%20?i%20rdfs:label%20?l%0A%7D] at [/home/jhpoelen/.cache/nomer/tmp/nomer644129068998147992.gz] done.
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceReadOnly - using cached [https://query.wikidata.org/sparql?format=json&query=PREFIX%20rdfs:%20%3Chttp://www.w3.org/2000/01/rdf-schema%23%3E%0APREFIX%20bd:%20%3Chttp://www.bigdata.com/rdf%23%3E%0APREFIX%20wd:%20%3Chttp://www.wikidata.org/entity/%3E%0APREFIX%20wikibase:%20%3Chttp://wikiba.se/ontology%23%3E%0APREFIX%20wdt:%20%3Chttp://www.wikidata.org/prop/direct/%3E%0ASELECT%20?i%20?l%20WHERE%20%7B%0A%20%20?i%20wdt:P31%20wd:Q427626.%0A%20%20?i%20rdfs:label%20?l%0A%7D] at [/home/jhpoelen/.cache/nomer/hash/sha256/b959e969ddf4114bd590ec1cdcf7ec572076bd46e2e28e2fee038a3f6d41b9fd/ace0cedb0aa2a691e55c45bdc95dda068d4a8bb1b4086decc3f2803987984fd3.gz]
[main] INFO org.eol.globi.taxon.TaxonCacheService - local taxon cache of [file:/home/jhpoelen/.cache/nomer/wikidata_appended_taxon_ranks.tsv] building...
[main] INFO org.eol.globi.taxon.TaxonCacheService - cache with [107] items built in [0.0] s or [4115.4] items/s.
[main] INFO org.eol.globi.taxon.TaxonCacheService - local taxon cache of [file:/home/jhpoelen/.cache/nomer/wikidata_appended_taxon_ranks.tsv] built.
[main] INFO org.eol.globi.taxon.TaxonCacheService - local taxon map of [file:/home/jhpoelen/.cache/nomer/wikidata_appended_taxon_rank_links.tsv] building...
[main] INFO org.eol.globi.taxon.TaxonCacheService - cache with [4019] items built in [0.2] s or [24506.1] items/s.
[main] INFO org.eol.globi.taxon.TaxonCacheService - local taxon map of [file:/home/jhpoelen/.cache/nomer/wikidata_appended_taxon_rank_links.tsv] built.

[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 4.8% of 840 kB at 1.47 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 10.3% of 840 kB at 1.69 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 15.9% of 840 kB at 1.84 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 21.4% of 840 kB at 2.41 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 26.9% of 840 kB at 2.90 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 32.6% of 840 kB at 2.88 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 38.3% of 840 kB at 3.27 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 43.1% of 840 kB at 3.61 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 48.8% of 840 kB at 4.00 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 54.5% of 840 kB at 4.34 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 60.2% of 840 kB at 4.29 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 65.9% of 840 kB at 4.58 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 71.6% of 840 kB at 4.89 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 77.3% of 840 kB at 5.16 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 83.0% of 840 kB at 5.45 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 88.7% of 840 kB at 5.68 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 94.4% of 840 kB at 5.96 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 100.0% of 840 kB at 6.17 MB/s ETA: < 1 minute
[https://zenodo.org/recor...df104d4ba88a54972f9f49e] 100.0% of 840 kB at 6.17 MB/s completed in < 1 minute
[main] INFO org.globalbioticinteractions.nomer.match.ResourceServiceContentBased - caching [https://query.wikidata.org/sparql?format=json&query=PREFIX%20rdfs:%20%3Chttp://www.w3.org/2000/01/rdf-schema%23%3E%0APREFIX%20bd:%20%3Chttp://www.bigdata.com/rdf%23%3E%0APREFIX%20wd:%20%3Chttp://www.wikidata.org/entity/%3E%0APREFIX%20wikibase:%20%3Chttp://wikiba.se/ontology%23%3E%0APREFIX%20wdt:%20%3Chttp://www.wikidata.org/prop/direct/%3E%0ASELECT%20?i%20?l%20WHERE%20%7B%0A%20%20?i%20wdt:P31%20wd:Q427626.%0A%20%20?i%20rdfs:label%20?l%0A%7D] at [/home/jhpoelen/.cache/nomer/tmp/nomer708058498259317527.gz] done.
java.lang.RuntimeException: failed to create matcher
    at org.globalbioticinteractions.nomer.match.TermMatcherFactoryTaxonRanks.createTermMatcher(TermMatcherFactoryTaxonRanks.java:68)
    at org.globalbioticinteractions.nomer.match.TermMatcherRegistry.termMatcherFor(TermMatcherRegistry.java:180)
    at org.globalbioticinteractions.nomer.match.MatchUtil.lambda$resolveMatcher$0(MatchUtil.java:58)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531)
    at org.globalbioticinteractions.nomer.match.MatchUtil.resolveMatcher(MatchUtil.java:63)
    at org.globalbioticinteractions.nomer.match.MatchUtil.getTermMatcher(MatchUtil.java:50)
    at org.globalbioticinteractions.nomer.cmd.CmdReplace.run(CmdReplace.java:21)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.globalbioticinteractions.nomer.Nomer.run(Nomer.java:57)
    at org.globalbioticinteractions.nomer.Nomer.main(Nomer.java:46)
Caused by: java.io.IOException: failed to access [https://query.wikidata.org/sparql?format=json&query=PREFIX%20rdfs:%20%3Chttp://www.w3.org/2000/01/rdf-schema%23%3E%0APREFIX%20bd:%20%3Chttp://www.bigdata.com/rdf%23%3E%0APREFIX%20wd:%20%3Chttp://www.wikidata.org/entity/%3E%0APREFIX%20wikibase:%20%3Chttp://wikiba.se/ontology%23%3E%0APREFIX%20wdt:%20%3Chttp://www.wikidata.org/prop/direct/%3E%0ASELECT%20?i%20?l%20WHERE%20%7B%0A%20%20?i%20wdt:P31%20wd:Q427626.%0A%20%20?i%20rdfs:label%20?l%0A%7D] in preston verse [
    at org.globalbioticinteractions.nomer.match.ResourceServiceContentBased.retrieve(ResourceServiceContentBased.java:81)
    at org.globalbioticinteractions.nomer.match.ResourceServiceFactoryImpl$1.retrieve(ResourceServiceFactoryImpl.java:37)
    at org.globalbioticinteractions.nomer.match.TermMatcherContextCaching.retrieve(TermMatcherContextCaching.java:19)
    at org.globalbioticinteractions.nomer.match.WikidataTaxonRankLoader.importTaxonRanks(WikidataTaxonRankLoader.java:47)
    at org.globalbioticinteractions.nomer.match.TermMatcherFactoryTaxonRanks.createTermMatcher(TermMatcherFactoryTaxonRanks.java:54)
    ... 24 more
Caused by: org.apache.commons.io.FileExistsException: File element in parameter 'destFile' already exists: '/home/jhpoelen/.cache/nomer/hash/sha256/b959e969ddf4114bd590ec1cdcf7ec572076bd46e2e28e2fee038a3f6d41b9fd/ace0cedb0aa2a691e55c45bdc95dda068d4a8bb1b4086decc3f2803987984fd3.gz'
    at org.apache.commons.io.FileUtils.requireAbsent(FileUtils.java:2688)
    at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2398)
    at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2376)
    at org.globalbioticinteractions.nomer.match.ResourceServiceContentBased.retrieve(ResourceServiceContentBased.java:79)
    ... 28 more
make: *** [Makefile:112: target/taxonCache.tsv.gz] Error 1
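The innermost `Caused by` shows commons-io `FileUtils.moveFile` refusing to overwrite an existing destination in the content-addressed cache. The following hypothetical sketch replays that sequence deterministically. It uses `java.nio.file` rather than nomer's or commons-io's code, and the class, method, and file names are illustrative only. Two instances both miss the cache and download to private temp files. The second move into the shared path then collides with the first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical replay of the suspected failure; not nomer's actual code.
class CacheMoveRace {

    // A cache miss: the instance downloads the resource into its own temp file.
    static Path download(Path cacheDir, byte[] content) throws IOException {
        Path tmp = Files.createTempFile(cacheDir, "nomer", ".gz");
        Files.write(tmp, content);
        return tmp;
    }

    // Moves the temp file into the shared cache path. Like FileUtils.moveFile, Files.move
    // without REPLACE_EXISTING refuses to overwrite an existing destination.
    static boolean store(Path tmp, Path cacheFile) throws IOException {
        try {
            Files.move(tmp, cacheFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Returns {instance A stored its copy, instance B stored its copy}.
    static boolean[] simulate() {
        try {
            Path cacheDir = Files.createTempDirectory("nomer-cache");
            Path cacheFile = cacheDir.resolve("content-hash.gz"); // hypothetical name
            byte[] content = "same payload".getBytes(StandardCharsets.UTF_8);
            // Both instances checked the cache before either finished, so both download...
            Path tmpA = download(cacheDir, content);
            Path tmpB = download(cacheDir, content);
            // ...and the second move collides with the first (evaluated left to right).
            return new boolean[] {store(tmpA, cacheFile), store(tmpB, cacheFile)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] stored = simulate();
        System.out.println("instance A stored: " + stored[0]);
        System.out.println("instance B stored: " + stored[1]);
    }
}
```

Instance A stores its copy while instance B's move is rejected, which mirrors the `FileExistsException` in the trace above.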
jhpoelen commented 3 months ago

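Suspected root cause: two instances of nomer build the same index in parallel, then trip over each other when storing an offline copy of the index source data. Both download the same Wikidata query result into their own temp files. When the second instance moves its copy into the shared cache path, `FileUtils.moveFile` throws `FileExistsException` because the first instance has already stored a file there.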
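One way to make the cache write tolerant of this race is to treat an already-present destination as success instead of an error. This relies on one assumption: the cache file name is derived from the content's SHA-256 hash, as the `hash/sha256/...` path suggests, so an existing file at that path holds the same bytes. This is a hypothetical sketch using `java.nio.file`, not the change that shipped in the next nomer release. `ContentAddressedCache` and `store` are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical race-tolerant cache write; an assumption, not nomer's actual fix.
class ContentAddressedCache {

    // Writes content via a temp file in the same directory, so the final move is normally
    // a same-filesystem rename and readers never see a half-written cache file.
    static Path store(Path cacheFile, byte[] content) {
        try {
            Files.createDirectories(cacheFile.getParent());
            Path tmp = Files.createTempFile(cacheFile.getParent(), "nomer", ".tmp");
            try {
                Files.write(tmp, content);
                Files.move(tmp, cacheFile);
            } catch (FileAlreadyExistsException e) {
                // Another instance got there first. Assuming the name is derived from the
                // content hash, the existing file holds the same bytes, so reuse it.
            } finally {
                // No-op after a successful move; removes the orphaned temp file otherwise.
                Files.deleteIfExists(tmp);
            }
            return cacheFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path cacheFile = Files.createTempDirectory("nomer-cache").resolve("content-hash.gz");
        byte[] content = "same payload".getBytes(StandardCharsets.UTF_8);
        store(cacheFile, content); // first instance stores the copy
        store(cacheFile, content); // second instance reuses it instead of failing
        System.out.println(new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8));
    }
}
```

The design choice here is to make the loser of the race back off quietly rather than serialize index building with a lock. That is only safe if the path really identifies the content. If it did not, the existing file would need to be verified before reuse.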

jhpoelen commented 2 months ago

fixed in the next Nomer release