globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

unexpected error when using two nomer processes with one cache directory #19

Closed jhpoelen closed 4 years ago

jhpoelen commented 4 years ago

Nomer uses internal indexes/caches (in the .nomer directory) to enable fast offline term matching.

When two Nomer instances were run at the same time on a system whose indexes had not been built yet, the following exception was observed: java.io.IOError: java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.

The root cause was that a new index created by the first Nomer instance, still holding uncommitted changes, was being reused by the second Nomer instance.

The suggested fix is to explicitly commit changes after creating a new index and to prevent overwriting of existing (partial) indexes.
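
The two parts of the suggested fix (commit before anyone else can see the index, and never overwrite an index that already exists) can be illustrated with a minimal sketch. It uses only the Java standard library rather than the MapDB API, and it is not Nomer's actual fix: the class name SafeIndexBuild, the method buildIfAbsent, and the ".lock" and ".tmp" file names are hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of Nomer or MapDB.
public class SafeIndexBuild {

    /**
     * Builds the index file unless it already exists.
     * Returns true if this call built it, false if an existing index was kept.
     */
    public static boolean buildIfAbsent(Path index, List<String> rows) throws IOException {
        Path dir = index.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        Path lockFile = dir.resolve(index.getFileName() + ".lock");

        // An exclusive file lock serializes cooperating processes that share the directory.
        try (FileChannel lockChannel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = lockChannel.lock()) {

            if (Files.exists(index)) {
                return false; // never overwrite an existing index
            }

            // Write to a private temporary file that other processes do not open.
            Path tmp = Files.createTempFile(dir, "index", ".tmp");
            try (FileChannel out = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(
                        String.join("\n", rows).getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
                out.force(true); // "commit": flush content and metadata to disk
            }

            // Publish the finished file under its final name in one step.
            Files.move(tmp, index, StandardCopyOption.ATOMIC_MOVE);
            return true;
        }
    }
}
```

With this arrangement, a second process calling buildIfAbsent on the same path waits for the lock, then finds the completed index and reuses it instead of opening a half-written one.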

$ zcat target/taxonCacheNoHeader.tsv.gz | tail -n +2 | cut -f3 | awk -F '\t' '{ print $1 "\t" $1 }' | java -jar target/nomer.jar replace --properties=config/name2id.properties globi-taxon-rank | cut -f1 | java -jar target/nomer.jar replace --properties=config/id2name.properties globi-taxon-rank | pv -l > testing123
using matcher [globi-taxon-rank]
using matcher [globi-taxon-rank]
Invalid cookie header: "Set-Cookie: WMF-Last-Access=20-May-2020;Path=/;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access-Global=20-May-2020;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access=20-May-2020;Path=/;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access-Global=20-May-2020;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
local taxon cache of [file:/home/jhpoelen/taxon-graph-builder/./.nomer/wikidata_appended_taxon_ranks.tsv] building...
cache with [100] items built in [0.0] s or [5000.0] items/s.
local taxon cache of [file:/home/jhpoelen/taxon-graph-builder/./.nomer/wikidata_appended_taxon_ranks.tsv] built.
local taxon map of [file:/home/jhpoelen/taxon-graph-builder/./.nomer/wikidata_appended_taxon_rank_links.tsv] building...
cache with [7215] items built in [0.1] s or [49417.8] items/s.
local taxon map of [file:/home/jhpoelen/taxon-graph-builder/./.nomer/wikidata_appended_taxon_rank_links.tsv] built.
unexpected exception/s] [<=>                                                   ]
java.io.IOError: java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.
    at org.mapdb.StoreDirect.checkHeaders(StoreDirect.java:269)
    at org.mapdb.StoreDirect.<init>(StoreDirect.java:207)
    at org.mapdb.DBMaker.extendStoreDirect(DBMaker.java:971)
    at org.mapdb.DBMaker.makeEngine(DBMaker.java:758)
    at org.mapdb.DBMaker.make(DBMaker.java:701)
    at org.eol.globi.service.CacheService.initDb(CacheService.java:35)
    at org.eol.globi.taxon.TaxonCacheService.initTaxonCache(TaxonCacheService.java:166)
    at org.eol.globi.taxon.TaxonCacheService.init(TaxonCacheService.java:121)
    at org.eol.globi.taxon.TaxonCacheService.lazyInit(TaxonCacheService.java:116)
    at org.eol.globi.taxon.TaxonCacheService.match(TaxonCacheService.java:202)
    at org.eol.globi.service.TermMatcherHierarchical.match(TermMatcherHierarchical.java:57)
    at org.globalbioticinteractions.nomer.util.ReplacingRowHandler.onRow(ReplacingRowHandler.java:101)
    at org.globalbioticinteractions.nomer.util.MatchUtil.apply(MatchUtil.java:52)
    at org.globalbioticinteractions.nomer.util.MatchUtil.match(MatchUtil.java:26)
    at org.globalbioticinteractions.nomer.cmd.CmdReplace.run(CmdReplace.java:17)
    at org.globalbioticinteractions.nomer.cmd.CmdLine.run(CmdLine.java:18)
    at org.globalbioticinteractions.nomer.cmd.CmdLine.run(CmdLine.java:27)
    at org.globalbioticinteractions.nomer.Nomer.main(Nomer.java:15)
Caused by: java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.
    ... 18 more

jhpoelen commented 4 years ago

This appears to be fixed after upgrading to Nomer v0.1.11; the following command completed without errors:

$ zcat target/taxonCacheNoHeader.tsv.gz | tail -n +2 | cut -f3 | awk -F '\t' '{ print $1 "\t" $1 }' | java -jar target/nomer.jar replace --properties=config/name2id.properties globi-taxon-rank | cut -f1 | java -jar target/nomer.jar replace --properties=config/id2name.properties globi-taxon-rank > target/norm_ranks.tsv
using matcher [globi-taxon-rank]
using matcher [globi-taxon-rank]
Invalid cookie header: "Set-Cookie: WMF-Last-Access=20-May-2020;Path=/;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access-Global=20-May-2020;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access=20-May-2020;Path=/;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT
Invalid cookie header: "Set-Cookie: WMF-Last-Access-Global=20-May-2020;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 21 Jun 2020 12:00:00 GMT". Invalid 'expires' attribute: Sun, 21 Jun 2020 12:00:00 GMT