Open jxtps opened 2 years ago
There might be a deadlock happening somewhere. Is it trying to load JavaCPP from multiple threads at the same time?
You may also want to try with "org.bytedeco.javacpp.cachelibraries" and "org.bytedeco.javacpp.findlibraries" set to "false" with the latest snapshots: http://bytedeco.org/builds/
The snapshot with those additional settings gets stuck on:
Debug: Loading library libopenblas
Running MyTest.main() standalone does not get stuck (but errors out since I don't have the correct 1.12 libs in my path, but that's arguably separate).
Deadlock: There's quite a few threads running, but from what I can tell only a single one is accessing the library at the time of the freeze.
It would help to see the stack trace of that thread, to see on which line it gets stuck.
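One way to capture the stack of every thread from inside the hung JVM is sketched below. It uses only the standard Thread.getAllStackTraces() call; the class name ThreadDump and the idea of calling it from a watchdog thread are assumptions for illustration, not something from the project in this thread. The JDK's jstack tool gives similar output from outside the process.

```java
import java.util.Map;

class ThreadDump {
    /** Renders the name, state, and stack frames of every live thread in this JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real application this would be called from a separate thread
        // once the library loading appears to be stuck.
        System.out.print(dumpAllThreads());
    }
}
```

The thread whose frames include the org.bytedeco loader classes is the one worth posting.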
Ok, this is interesting: while digging up the stack traces it all of a sudden started working! I had to revert to sbt 1.3.13 due to issues with incremental compilation loops in 1.7.1 on Windows, and that fixed it.
When using sbt 1.7.1 it hangs here:
I figured the line in bytedeco would be the most relevant.
The actual freeze happens in native void load(String name, boolean isBuiltin); which is on line 1719 in java.lang.ClassLoader in my version of Java (1.8.0_241).
That means something is interfering with OpenBLAS itself, that's not related to JavaCPP per se. Since MKL is usually faster than OpenBLAS anyway, try to load that instead by setting the "org.bytedeco.openblas.load" system property to "mkl_rt": https://github.com/bytedeco/javacpp-presets/tree/master/openblas#documentation
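A minimal sketch of setting that property from code follows. The property name and value are the ones quoted above; the class name OpenBlasSelection is hypothetical, and it is an assumption here that the property has to be set before the first org.bytedeco class is touched. Passing -Dorg.bytedeco.openblas.load=mkl_rt on the JVM command line is the equivalent without code changes.

```java
class OpenBlasSelection {
    /** Sets the property before any JavaCPP preset class is used (assumed requirement). */
    static void preferMkl() {
        System.setProperty("org.bytedeco.openblas.load", "mkl_rt");
    }

    public static void main(String[] args) {
        preferMkl();
        // Application code that uses org.bytedeco classes would follow here.
        System.out.println(System.getProperty("org.bytedeco.openblas.load"));
    }
}
```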
Well, when I do that it does try mkl_rt first, but since I don't actually have those libs it falls back on openblas, and then freezes...
What I find really confusing is how the sbt version could have an impact here at all. I guess one of its zillion dependencies does something funky.
I've upgraded the presets for OpenBLAS 0.3.21 (and PyTorch 1.12.1 for that matter), which might contain some fixes in there. Please give it a try with the snapshots: http://bytedeco.org/builds/
Ok, so I'm back to upgrading our stack, and (most unfortunately) back on this issue. I have created yet another minimally reproducing project, but things have changed:
Java: Amazon.com Inc. Java 17.0.9 (Corretto)
Scala: 2.13.12
Sbt: 1.9.8
Play: 2.9.1
"org.bytedeco" % "pytorch-platform" % "2.0.1-1.5.9"
It still works in Java, but now it works under SBT when org.bytedeco.javacpp.logger.debug=true, but fails when org.bytedeco.javacpp.logger.debug=false.
C:\Users\admin\.javacpp\cache\openblas-0.3.23-1.5.9-windows-x86_64.jar\org\bytedeco\openblas\windows-x86_64\libopenblas_nolapack.dll still appears to be the culprit.
The line numbers have changed, but the stack trace is very similar to before:
Hmm... debug=true doesn't reliably allow it to run, but it sometimes works. I haven't pinned down when/why.
Well, like I said before, if it works fine with MKL, just do that.
Ok, so this appears to be an issue where the initialization code isn't thread safe. I'm not sure why it's getting hit from multiple fronts, but it seems to be. When I add synchronized to a key piece of my external initialization code (MyTest.main in the sample project) it works regardless of whether debug is true or false.
(technically I haven't tested the minimally reproducing example, this is in my actual application)
Maybe we should add synchronized to some key function in the load chain hierarchy?
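The kind of guard described here can be sketched as below. This is not the poster's actual code: GuardedInit, ensureInitialized, and the Runnable standing in for the native-library initialization are all hypothetical names. The lock is a static field, so it only serializes callers that share the same loaded copy of the class.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class GuardedInit {
    private static final Object LOCK = new Object();
    private static boolean initialized = false;
    // Counts how often the initialization body actually ran (for the demo only).
    static final AtomicInteger runs = new AtomicInteger();

    /** Runs the given initialization at most once, with only one thread inside at a time. */
    static void ensureInitialized(Runnable init) {
        synchronized (LOCK) {
            if (!initialized) {
                init.run();
                initialized = true;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 8;
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                // Every thread asks for initialization; only the first one performs it.
                ensureInitialized(runs::incrementAndGet);
                done.countDown();
            }).start();
        }
        done.await();
        System.out.println("initialization ran " + runs.get() + " time(s)");
    }
}
```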
You can try, but I don't think it's going to help, because that doesn't synchronize between multiple instances of the same class. We would need to use file synchronization instead, and I've heard of issues with that not working well on Windows. Please do feel free to debug that, though.
See issue https://github.com/bytedeco/javacpp/issues/197, for example
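For reference, file-based synchronization with the standard java.nio API looks roughly like the sketch below. It is not how JavaCPP implements its loading, and FileLockedInit, runLocked, and the lock-file name are hypothetical. One caveat that matters for this discussion: a FileLock is held on behalf of the whole JVM, so a second thread in the same JVM that requests an overlapping lock gets an OverlappingFileLockException instead of blocking.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class FileLockedInit {
    /** Runs the action while holding an exclusive lock on the given file. */
    static void runLocked(Path lockFile, Runnable action) {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // Blocks while another process holds the lock; throws
             // OverlappingFileLockException if this JVM already holds it.
             FileLock lock = channel.lock()) {
            action.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical lock file location, chosen only for the demo.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "native-init-demo.lock");
        runLocked(lockFile, () -> System.out.println("initializing under file lock"));
    }
}
```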
This is a weird one. Last night in my dev branch it was seemingly working fine. Then I backported some of the changes to be able to do a hotfix release, and now it's not working anymore. Very confusing.
Switching to using MKL appears to have fixed it.
If I create a small sample project with the following class:
Then I can run that main function "standalone" in IntelliJ without problems, and the loading of a large number of libraries flashes by quickly.
However, if I instead run the Play project (by creating a Play 2 run profile in IntelliJ), then when the controller calls MyTest.main() some of the libraries flash by, but then the library loading hangs on:
If I then stop the server, the loading suddenly continues with the same javacpp debug printouts as when I ran MyTest.main() separately, but shortly thereafter the whole process exits.
This is really strange and I have no idea why it's happening. Right now I'm basically stuck with JavaCPP working great in isolation, but as soon as I try to use it within my sbt/play project it just freezes everything.
This is using Oracle Corporation Java 1.8.0_241 on Windows, sbt 1.7.1, Play 2.8.16, Scala 2.13.8 and "org.bytedeco" % "pytorch-platform" % s"1.10.2-1.5.7". I have the relevant libtorch DLLs all on the path (hence the org.bytedeco.javacpp.pathsFirst=true).