Open nickdos opened 2 years ago
See https://github.com/ARGA-Genomes/arga-data/issues/11#issuecomment-1241400786
For now decided to not follow this path.
I've made a second attempt at this, using both the full GBIF backbone (#24) and just the NCBI taxonomy, taken from checklistbank.org. The GBIF backbone failed because it is simply too big to load: the NMI keeps the entire tree in memory and the GBIF backbone is almost 10x larger. The NCBI taxonomy failed after 12+ hours with a circular reference error, meaning 2 taxa pointed back to each other via parent IDs in some way. Seeing as it took 12+ hours to find the first circular reference and there could be dozens of these, I'm thinking it's not a good use of my time running this continually for days. It is possible to exclude names via config, but you have to know what they are beforehand.
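One possible way to find the looping taxa up front, instead of waiting 12+ hours for the first error, is to walk the parent links in the source file directly. The sketch below is only an illustration under assumptions: it assumes the checklistbank.org export is a tab-separated file with Darwin Core style `taxonID` and `parentNameUsageID` columns, and the function names are made up for this example. It only follows parent IDs, so a loop that goes through some other kind of link would not be caught.

```python
import csv
import io


def load_parents(tsv_file, id_col="taxonID", parent_col="parentNameUsageID"):
    """Read a tab-separated taxon file into a {taxon ID: parent ID} dict.

    The column names are assumptions (Darwin Core terms); adjust them to
    match the actual export.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row[id_col]: row[parent_col] for row in reader}


def find_parent_cycles(parent_of):
    """Return each loop in the parent links once, as a list of taxon IDs.

    Walks upward from every taxon. Taxa already fully checked are skipped,
    so the whole scan is linear in the number of taxa.
    """
    done = set()      # taxa whose chain of parents has already been checked
    cycles = []
    for start in parent_of:
        path = []     # taxa visited on this walk, in order
        position = {} # taxon ID -> index in path
        node = start
        while node and node in parent_of and node not in done and node not in position:
            position[node] = len(path)
            path.append(node)
            node = parent_of[node]
        if node in position:
            # We came back to a taxon seen on this same walk: that is a loop.
            cycles.append(path[position[node]:])
        done.update(path)
    return cycles


# Tiny made-up example: taxa 3 and 4 point at each other, 5 hangs off the loop.
sample = "taxonID\tparentNameUsageID\n1\t\n2\t1\n3\t4\n4\t3\n5\t3\n"
cycles = find_parent_cycles(load_parents(io.StringIO(sample)))
print(cycles)  # [['3', '4']]
```

The IDs this reports would be the candidates to exclude via config before starting a long run.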
Planning to re-run the merge of NCBI with the major ALA sources, using nectar-arga-dev-2.ala.org.au, which has 32GB of memory. Will need to exclude the loop taxon first.
UPDATE 1: started run on nectar-arga-dev-2
- using the screen tool
UPDATE 2: Errored with the same message, so the config is not right. Checking with Doug on what I did wrong.
Failed again with error:
ERROR: [ScientificName] - Unable to find principal for SN[no code, TROCHIDAE, unranked]
ERROR: [TaxonomyBuilder] - Unable to combine taxa
java.lang.NullPointerException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
at au.org.ala.names.index.Taxonomy.resolvePrincipal(Taxonomy.java:728)
at au.org.ala.names.index.Taxonomy.resolve(Taxonomy.java:441)
at au.org.ala.names.index.TaxonomyBuilder.main(TaxonomyBuilder.java:151)
Caused by: java.lang.NullPointerException
at au.org.ala.names.index.TaxonConceptInstance.getResolvedAccepted(TaxonConceptInstance.java:1084)
at au.org.ala.names.index.TaxonConceptInstance.getResolvedAccepted(TaxonConceptInstance.java:1019)
at au.org.ala.names.index.ScientificName.findPrincipal(ScientificName.java:129)
at au.org.ala.names.index.ScientificName.findPrincipal(ScientificName.java:53)
at au.org.ala.names.index.Name.resolvePrincipal(Name.java:251)
at au.org.ala.names.index.Taxonomy.lambda$resolvePrincipal$42(Taxonomy.java:728)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1652)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
null
Checking with Doug on how to avoid this.
Doug provided the following hints:
Notes: LTC = large taxon collider, NMS = name matching service
Further comments: