geneontology / minerva

parallel model importing may not be safe #432

Open balhoff opened 2 years ago

balhoff commented 2 years ago

Currently, model import uses parallel streams. I'm not sure the minerva code is architected to support this safely, and there is little to gain from it anyway, since only one model can be inserted into the database at a time. I'm speculating that this may be related to a failure seen in the pipeline by @kltm:

01:18:30  skipping 568b0f9600000280.ttl
01:18:37  java.lang.ClassCastException
01:18:37    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
01:18:37    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
01:18:37    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
01:18:37    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
01:18:37    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
01:18:37    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
01:18:37    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
01:18:37    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
01:18:37    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
01:18:37    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
01:18:37    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
01:18:37    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
01:18:37    at org.geneontology.minerva.cli.CommandLineInterface.importOWLModels(CommandLineInterface.java:473)
01:18:37    at org.geneontology.minerva.cli.CommandLineInterface.main(CommandLineInterface.java:270)
01:18:37  Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
01:18:37    at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835)
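
For reference, the `HashMap$Node cannot be cast to HashMap$TreeNode` cause is consistent with a plain `java.util.HashMap` being modified from more than one thread at once, which `HashMap` does not support. Below is a minimal sketch of the two usual ways out, not the actual minerva code: `loadModel` and the `loaded` map are hypothetical stand-ins for whatever shared state the import touches, and which map is really involved is still an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ImportSketch {

    // Hypothetical stand-in for the per-file import work; not a minerva API.
    static int loadModel(String fileName) {
        return fileName.length();
    }

    // Option 1: drop the parallelism, so the plain HashMap is only ever
    // touched by a single thread.
    static Map<String, Integer> importSequential(List<String> files) {
        Map<String, Integer> loaded = new HashMap<>();
        files.stream().forEach(f -> loaded.put(f, loadModel(f)));
        return loaded;
    }

    // Option 2: keep parallelStream(), but make the shared map thread-safe.
    static Map<String, Integer> importParallel(List<String> files) {
        Map<String, Integer> loaded = new ConcurrentHashMap<>();
        files.parallelStream().forEach(f -> loaded.put(f, loadModel(f)));
        return loaded;
    }

    public static void main(String[] args) {
        List<String> files = IntStream.range(0, 10_000)
            .mapToObj(i -> "model-" + i + ".ttl")
            .collect(Collectors.toList());
        System.out.println(importSequential(files).size());
        System.out.println(importParallel(files).size());
    }
}
```

Since only one model can be inserted into the database at a time, option 1 is probably the lower-risk change here; option 2 only helps if every piece of shared state is made thread-safe, not just one map.
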

kltm commented 2 years ago

Noting that this section, while it failed, seems to be passing on a retry. Not sure of the expected frequency of failure, but this has likely been in this state for some time.

kltm commented 2 years ago

@balhoff Do you feel that this has gotten stale? Would it be fine to close this out for now, or do you still have an iron in the fire for this?

kltm commented 2 years ago

I'm going to vote "stale" for now.

kltm commented 1 year ago

@balhoff I may have hit this again today. Or something related:

skipping 5b91dbd100000065.ttl
skipping MGI_MGI_1917193.ttl
skipping 62900b6400000000.ttl
java.lang.ClassCastException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[...]
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

The ordering of the files processed does seem to be threaded and non-deterministic. Seems pretty intermittent given that this was last poked a year and a half ago.
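
On the ordering point: `forEach` on a parallel stream makes no guarantee about the order in which elements are processed, which would explain the shuffled "skipping" lines. A rough sketch of a deterministic alternative follows, assuming the import can run on a single thread; `importOne` and the `log` list are hypothetical placeholders, not minerva APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrderedImportSketch {

    // Hypothetical per-file step standing in for the real import.
    static void importOne(String fileName, List<String> log) {
        log.add("imported " + fileName);
    }

    // Processes files on the calling thread in sorted order, and records
    // which file a failure belongs to instead of losing that context.
    static List<String> importInOrder(List<String> files) {
        List<String> log = new ArrayList<>();
        files.stream()
             .sorted()
             .forEachOrdered(f -> {
                 try {
                     importOne(f, log);
                 } catch (RuntimeException e) {
                     log.add("failed " + f + ": " + e);
                 }
             });
        return log;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "5b91dbd100000065.ttl", "MGI_MGI_1917193.ttl", "62900b6400000000.ttl");
        importInOrder(files).forEach(System.out::println);
    }
}
```

With a fixed order and the failing file name attached to each error, an intermittent failure like this one would at least be easier to tie to a specific model on the next occurrence.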