globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

Multiple Author Retrieval #110

Closed jtmiller28 closed 1 year ago

jtmiller28 commented 2 years ago

Currently I am trying to create example code that processes plant names through Nomer and produces a table listing each name with its current taxonomic status and authorship. This goal is accurately represented by the second comment in issue #58, where the authorship and taxonomic status are matched per name.

While trying to recreate that issue's table, I ran into the following Java exception when running:

nomer list --properties my.properties discoverlife | head

java.lang.IllegalArgumentException: URI has an authority component
    at java.base/java.io.File.<init>(File.java:425)
    at org.eol.globi.util.ResourceServiceLocalFile.retrieve(ResourceServiceLocalFile.java:21)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams$1$1.retrieve(CmdDefaultParams.java:65)
    at org.eol.globi.util.ResourceServiceGzipAware.retrieve(ResourceServiceGzipAware.java:20)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.initProperties(CmdDefaultParams.java:84)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.getProperties(CmdDefaultParams.java:38)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.getProperty(CmdDefaultParams.java:33)
    at org.globalbioticinteractions.nomer.match.CatalogueOfLifeTaxonService.<init>(CatalogueOfLifeTaxonService.java:39)
    at org.eol.globi.service.TermMatchEnsembleFactory$2.<init>(TermMatchEnsembleFactory.java:60)
    at org.eol.globi.service.TermMatchEnsembleFactory.getEnrichers(TermMatchEnsembleFactory.java:39)
    at org.globalbioticinteractions.nomer.match.TermMatcherFactoryEnricherFactory$1.<init>(TermMatcherFactoryEnricherFactory.java:23)
    at org.globalbioticinteractions.nomer.match.TermMatcherFactoryEnricherFactory.createTermMatchFactories(TermMatcherFactoryEnricherFactory.java:22)
    at org.globalbioticinteractions.nomer.match.TermMatcherRegistry.getRegistry(TermMatcherRegistry.java:135)
    at org.globalbioticinteractions.nomer.match.TermMatcherRegistry.termMatcherFor(TermMatcherRegistry.java:152)
    at org.globalbioticinteractions.nomer.match.MatchUtil.lambda$resolveMatcher$0(MatchUtil.java:54)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1632)
    at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
    at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
    at org.globalbioticinteractions.nomer.match.MatchUtil.resolveMatcher(MatchUtil.java:59)
    at org.globalbioticinteractions.nomer.match.MatchUtil.getTermMatcher(MatchUtil.java:46)
    at org.globalbioticinteractions.nomer.match.MatchUtil.getAppendingRowHandler(MatchUtil.java:91)
    at org.globalbioticinteractions.nomer.match.MatchUtil.getAppendingRowHandlers(MatchUtil.java:115)
    at org.globalbioticinteractions.nomer.cmd.CmdDump.run(CmdDump.java:19)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.globalbioticinteractions.nomer.Nomer.run(Nomer.java:57)
    at org.globalbioticinteractions.nomer.Nomer.main(Nomer.java:46)

my.properties was downloaded from the issue: my.properties.gz

java -version
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Also, a bit of a follow-up: can --properties be used with append? Currently I plan to construct this list to identify possible multiple accepted name mappings that depend on just the authorship rather than the name itself. If possible, though, letting the user submit a name list to nomer and returning all possible name designations based on authorship might be ideal. This may be important since some large lists (vascular plants) may comprise different accepted names depending on the author's species concept.

jhpoelen commented 2 years ago

@jtmiller28 thanks for sharing your issue

As far as I can see, your nomer properties contain:

nomer.append.schema.output.example.taxon.rank.order=[{"column":0,"type":"path.order.id"},{"column": 1,"type":"path.order.name"},{"column": 2,"type":"path.order"}]
nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column":3,"type":"rank"}]
nomer.schema.input=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column": 3, "type":"rank"}]

jhpoelen commented 2 years ago

@jtmiller28 which version of nomer are you using?

jhpoelen commented 2 years ago

also, what does:

nomer list discoverlife | head 

produce for you?

jtmiller28 commented 2 years ago

nomer version 0.2.15

jtmiller28 commented 2 years ago

nomer list discoverlife | head
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - DiscoverLife name indexing started...
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.mapdb.SerializerPojo$FieldInfo (file:/usr/local/bin/nomer) to field java.util.Collections$UnmodifiableMap.m
WARNING: Please consider reporting this to the maintainers of org.mapdb.SerializerPojo$FieldInfo
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - [50219] DiscoverLife names were indexed in 10s (@ 5021 names/s)
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum argentinum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum calchaqui https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum colombiense https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiensis Acamptopoeum colombiensis SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum colombiense https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandezi HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandezi species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum fernandezi https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum inauratum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum melanogaster https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum nigritarse https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum prinii https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum Acamptopoeum submetallicum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum Acamptopoeum submetallicum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum submetallicum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum

jhpoelen commented 2 years ago

@jtmiller28 thanks.

I was able to reproduce your issue -

$ nomer list --properties my.properties discoverlife | head 
java.lang.IllegalArgumentException: URI has an authority component
    at java.io.File.<init>(File.java:425)
    at org.eol.globi.util.ResourceServiceLocalFile.retrieve(ResourceServiceLocalFile.java:21)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams$1$1.retrieve(CmdDefaultParams.java:65)
    at org.eol.globi.util.ResourceServiceGzipAware.retrieve(ResourceServiceGzipAware.java:20)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.initProperties(CmdDefaultParams.java:84)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.getProperties(CmdDefaultParams.java:38)
    at org.globalbioticinteractions.nomer.cmd.CmdDefaultParams.getProperty(CmdDefaultParams.java:33)
    at org.globalbioticinteractions.nomer.match.CatalogueOfLifeTaxonService.<init>(CatalogueOfLifeTaxonService.java:39)
    at org.eol.globi.service.TermMatchEnsembleFactory$2.<init>(TermMatchEnsembleFactory.java:60)
    at org.eol.globi.service.TermMatchEnsembleFactory.getEnrichers(TermMatchEnsembleFactory.java:39)
    at org.globalbioticinteractions.nomer.match.TermMatcherFactoryEnricherFactory$1.<init>(TermMatcherFactoryEnricherFactory.java:23)

jhpoelen commented 2 years ago

@jtmiller28 thanks for sharing the details.

I was able to reproduce the results. The root cause is that Nomer expects the properties file to be specified as a URI.

For now, I'd recommend a workaround in which you use a file URL with an absolute path (via $PWD in the example below):

$ echo -e "\tAndrena clypella" | nomer append --properties file://$PWD/my.properties discoverlife
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
    Andrena clypella    HAS_ACCEPTED_NAME   https://www.discoverlife.org/mp/20q?search=Andrena+clypella Andrena clypella    Strand, 1921    species

In the future, I am hoping to support local files without having to construct these elaborate absolute file URLs.

Please let me know if this unblocks you for now. If not, please holler and I can expedite a fix.
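
For anyone still on a Nomer version that needs the workaround, the file URL can also be built by a small shell helper. This is only a sketch: `to_file_url` is a hypothetical function, not part of Nomer, and it does not percent-encode spaces or other special characters in the path.

```shell
# Hypothetical helper (not part of Nomer): print an absolute file:// URL
# for a path that may be relative to the current directory.
# The body runs in a subshell, so the cd does not affect the caller.
to_file_url() (
  cd "$(dirname "$1")" > /dev/null || exit 1
  dir="$(pwd)"
  # avoid a doubled slash when the file sits directly under /
  if [ "$dir" = "/" ]; then dir=""; fi
  printf 'file://%s/%s\n' "$dir" "$(basename "$1")"
)

# Example usage (requires nomer on the PATH):
#   echo -e "\tAndrena clypella" | nomer append --properties "$(to_file_url my.properties)" discoverlife
```

The result always starts with `file:///`, that is, an empty authority followed by an absolute path, which is the same shape as the `file://$PWD/my.properties` form shown above.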

jtmiller28 commented 2 years ago

That indeed does the trick. I'll use this to reference naming in DiscoverLife & World Flora Online. Thanks!

jhpoelen commented 1 year ago

oops - accidentally closed. Applying "workaround exists" label instead.

jhpoelen commented 1 year ago

@jtmiller28 I've updated the code that loads the properties. Now, you can use relative file paths in addition to URLs, and you no longer need the proposed workaround.

For instance:

$ echo -e "\tYucca brevifolia\tL.D.Benson & Darrox" | nomer append wfo --properties my.properties
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [wfo]
[main] INFO org.globalbioticinteractions.nomer.match.WorldOfFloraOnlineTaxonService - [WORLD_OF_FLORA_ONLINE] taxonomy already indexed at [/home/jorrit/.cache/nomer/world_of_flora_online/world_of_flora_online], no need to import.
    Yucca brevifolia    L.D.Benson & Darrox HAS_ACCEPTED_NAME   WFO:0000752275  Yucca brevifolia    Engelm. species
    Yucca brevifolia    L.D.Benson & Darrox SYNONYM_OF  WFO:0000753634  Yucca baccata var. brevifolia   L.D.Benson & Darrow variety
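
To run a whole name list rather than a single echo, the input rows can be generated from a file. A minimal sketch, assuming a hypothetical `names.tsv` with one name per line, optionally followed by a tab and the authorship. `names_to_nomer_input` is a made-up helper, not part of Nomer; it only prepends the empty id column used in the example above.

```shell
# Hypothetical helper (not part of Nomer): prefix each input line with an
# empty externalId column, so that "name<TAB>authorship" lines become
# "<TAB>name<TAB>authorship", matching the echo -e example above.
names_to_nomer_input() {
  awk '{ printf "\t%s\n", $0 }'
}

# Example usage (names.tsv is a hypothetical file; requires nomer on the PATH):
#   names_to_nomer_input < names.tsv | nomer append wfo --properties my.properties
```

Each input row can then yield more than one output row, as in the Yucca brevifolia example, so names with several HAS_ACCEPTED_NAME or SYNONYM_OF matches show up as repeated rows.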