globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

discoverlife matcher fails to initialize #61

Closed: jhpoelen closed this issue 2 years ago

jhpoelen commented 2 years ago

When matching against discoverlife for the first time, the expected behavior is that the DiscoverLife names are indexed first and the match results are then shown.
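The expected behavior is a lazy-initialization pattern: nothing is indexed until the first match, and every later match reuses the same index (the stack trace passes through `lazyInit` and `initCache` in `DiscoverLifeTaxonService`). Below is a minimal in-memory sketch of that pattern only. The class `LazyNameIndex`, its name-to-id `Map` source, and the `buildCount` counter are hypothetical illustrations, not nomer's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of "index on first match, then reuse the index".
// LazyNameIndex is an illustration, not nomer's DiscoverLifeTaxonService.
final class LazyNameIndex {
    private final Supplier<Map<String, String>> source; // name -> id pairs (assumed input)
    private Map<String, String> index;                  // stays null until the first match
    private int buildCount = 0;                         // how many times the index was built

    LazyNameIndex(Supplier<Map<String, String>> source) {
        this.source = source;
    }

    // Build the in-memory index once; later calls return immediately.
    private synchronized void lazyInit() {
        if (index == null) {
            index = new HashMap<>(source.get());
            buildCount++;
        }
    }

    // Every match triggers lazyInit, but only the first one pays the indexing cost.
    Optional<String> match(String name) {
        lazyInit();
        return Optional.ofNullable(index.get(name));
    }

    synchronized int buildCount() {
        return buildCount;
    }
}
```

In nomer itself the index appears to be persisted on disk through MapDB (`org.mapdb.DBMaker` shows up in the trace), which would explain why the second command after the fix does not log "indexing started" again; this sketch keeps everything in memory for a single process.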

However, the following is seen:

$ echo -e "\tBranta canadensis" | nomer append discoverlife
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - DiscoverLife name indexing started...
[main] ERROR org.globalbioticinteractions.nomer.cmd.CmdLine - java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.
java.io.IOError: java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.
    at org.mapdb.StoreDirect.checkHeaders(StoreDirect.java:269)
    at org.mapdb.StoreDirect.<init>(StoreDirect.java:207)
    at org.mapdb.DBMaker.extendStoreDirect(DBMaker.java:971)
    at org.mapdb.DBMaker.makeEngine(DBMaker.java:758)
    at org.mapdb.DBMaker.make(DBMaker.java:701)
    at org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService.initCache(DiscoverLifeTaxonService.java:151)
    at org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService.lazyInit(DiscoverLifeTaxonService.java:124)
    at org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService.match(DiscoverLifeTaxonService.java:52)
    at org.eol.globi.service.TermMatcherHierarchical.match(TermMatcherHierarchical.java:57)
    at org.globalbioticinteractions.nomer.util.AppendingRowHandler.onRow(AppendingRowHandler.java:34)
    at org.globalbioticinteractions.nomer.match.MatchUtil.apply(MatchUtil.java:82)
    at org.globalbioticinteractions.nomer.match.MatchUtil.match(MatchUtil.java:35)
    at org.globalbioticinteractions.nomer.cmd.CmdAppend.run(CmdAppend.java:11)
    at org.globalbioticinteractions.nomer.cmd.CmdLine.run(CmdLine.java:18)
    at org.globalbioticinteractions.nomer.cmd.CmdLine.run(CmdLine.java:28)
    at org.globalbioticinteractions.nomer.Nomer.main(Nomer.java:15)
Caused by: java.io.IOException: Wrong index checksum, store was not closed properly and could be corrupted.
    ... 16 more

The root cause was that two indexes were using the same configuration, which confused the index store and produced the "Wrong index checksum" error above.
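One way to catch this class of bug early, assuming each index is backed by its own named store file, is to let only one index claim a given store and fail fast when a second index asks for it. This is a hypothetical guard, not the fix that was applied in nomer; `IndexStoreRegistry`, `claim`, and the store names used with it are illustrative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard: each store file may be claimed by only one index.
// IndexStoreRegistry and its method names are illustrative, not nomer's API.
final class IndexStoreRegistry {
    private final Path cacheDir;
    private final Map<String, String> ownerByStore = new HashMap<>(); // store name -> index id

    IndexStoreRegistry(String cacheDir) {
        this.cacheDir = Paths.get(cacheDir);
    }

    // Returns the store path for this index, or fails fast if a different
    // index has already claimed the same store name.
    synchronized Path claim(String indexId, String storeName) {
        String owner = ownerByStore.putIfAbsent(storeName, indexId);
        if (owner != null && !owner.equals(indexId)) {
            throw new IllegalStateException("store [" + storeName + "] is already used by index ["
                    + owner + "], refusing to share it with [" + indexId + "]");
        }
        return cacheDir.resolve(storeName);
    }
}
```

With a check like this, a configuration mistake surfaces as a clear error naming both indexes, instead of as a checksum failure deep inside the store.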

jhpoelen commented 2 years ago

After applying a fix, the following expected result was seen:

$ echo -e "\tBranta canadensis" | nomer append discoverlife
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - DiscoverLife name indexing started...
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - [50590] DiscoverLife names were indexed in 19s (@ 2662 names/s)
    Branta canadensis   NONE        Branta canadensis                           
$ echo -e "\tApis mellifera" | nomer append discoverlife
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
    Apis mellifera  HAS_ACCEPTED_NAME   https://www.discoverlife.org/mp/20q?search=Apis+mellifera   Apis mellifera  species  Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Apis mellifera  https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Apidae | https://www.discoverlife.org/mp/20q?search=Apis+mellifera   kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Apis+mellifera
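The appended columns carry the taxonomic path as pipe-delimited lists: one column of names (`Animalia | ... | Apis mellifera`) and one of ranks (`kingdom | ... | species`). A small sketch for pairing them up, assuming you have already pulled those two columns out of the tab-separated row (column positions are not asserted here); `TaxonPath` and `ranksToNames` are hypothetical helpers, not part of nomer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: pairs a pipe-delimited path of names with its ranks.
final class TaxonPath {
    // Split a "a | b | c" column into trimmed parts.
    static String[] split(String column) {
        String[] parts = column.split("\\|");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Map each rank to the name at the same position, keeping path order.
    static Map<String, String> ranksToNames(String pathNames, String pathRanks) {
        String[] names = split(pathNames);
        String[] ranks = split(pathRanks);
        if (names.length != ranks.length) {
            throw new IllegalArgumentException(
                    "path has " + names.length + " names but " + ranks.length + " ranks");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            result.put(ranks[i], names[i]);
        }
        return result;
    }
}
```

A `LinkedHashMap` keeps the kingdom-to-species order of the original columns, so iterating the result walks the path from the root down to the matched name.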