JuliaInterop / JavaCall.jl

Call Java from Julia
http://juliainterop.github.io/JavaCall.jl

Ways to mitigate overall jankiness (intermittent NoClassDefFoundError and IllegalMonitorStateException) #132

Open DrChainsaw opened 3 years ago

DrChainsaw commented 3 years ago

Sorry for the vague issue; I'm just hoping there is some best practice to follow or common pitfall to avoid.

The symptom is basically that interop attempts randomly fail, from what I can tell with either of the two exceptions in the subject line, and with no discernible pattern as to which operation causes it (e.g. the NoClass exception is for a different class each time).

Sometimes just re-running the command works (even for NoClass) and sometimes a REPL restart is required. In maybe 80% of cases the operation just works as expected, but it sometimes fails if called again with different input (e.g. trying the same operation on a different file). Searching for "julia javacall IllegalMonitorStateException" led me to this issue, which has quite similar symptoms (although it is not fully clear whether they are intermittent).

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

For obvious reasons it's hard to produce an MWE, but the basic structure of the program is something like this (with maybe 10-20 classes involved):

module MyModule
using JavaCall
const JSomeClass = @jimport some.pkg.SomeClass
const JSomeOtherClass = @jimport another.pkg.SomeOtherClass

function init()
    cp = "all;the;libs.jar"
    JavaCall.init("-Djava.class.path=$cp")
end

function doThis(args...)
    obj = JSomeClass()
    return jcall(obj, "someMethod", ....)
end

function doThat(args...)
    obj = JSomeOtherClass()
    return jcall(obj, "someOtherMethod", ....)
end

function doMore(o1, o2, ...)
    x = jcall(o1, "...", ..., o2)
    # more jcalls with o1, o2, x, etc.
end

function doMost(args...)
    o1 = doThis(a...)
    o2 = doThat(b...)
    doMore(o1, o2, c...)
end

end

It seems like errors are less common if functions are called directly in the REPL than if they are called through other functions. For instance, calling doThis and doThat in the REPL and then feeding the outputs to doMore seems more likely to work than calling doMost.

Are there any "canonical packages" which use JavaCall I can look at and try to copy patterns from? I have started to look at Spark.jl, but maybe there are simpler examples.

I think I have ensured that everything runs from the main task/thread but I'm not sure if there is anything more to it than just checking that Threads.threadid() == 1.

DrChainsaw commented 3 years ago

Managed to get a longer stacktrace from an IllegalMonitorStateException (some parts edited manually to fit to example structure above):

julia> doMost(...)
Exception in thread "main" java.lang.IllegalMonitorStateException
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
ERROR: JavaCall.JavaCallError("Error calling Java: java.lang.IllegalMonitorStateException")
Stacktrace:
 [1] geterror(::Bool) at …\.julia\packages\JavaCall\aVXyt\src\core.jl:371
 [2] geterror at …\.julia\packages\JavaCall\aVXyt\src\core.jl:356 [inlined]
 [3] _jcall(::JavaCall.JavaObject{Symbol("some.pkg.SomeClass")}, ::Ptr{Nothing}, ::Ptr{Nothing}, ::Type{T} where T, ::Tuple{DataType}, ::String) at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:328
 [4] jcall(::JavaCall.JavaObject{Symbol("some.pkg.SomeClass")}, ::String, ::Type{T} where T, ::Tuple{DataType}, ::String) at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:232
 [5] doThis(::JavaCall.JavaObject{Symbol("some.pkg.SomeClass")})
....
mkitti commented 3 years ago

You might want to check that you are running not only on the main thread, but also on the main Task.

julia> Base.current_task() === Base.roottask
true
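
A minimal sketch of a guard that combines the thread check and the task check, in plain Julia. The name `is_on` and the variable `main_task` are made up for this illustration and are not part of JavaCall. The sketch assumes the script is loaded from the root task, so that the task captured in `main_task` is the one `Base.roottask` refers to in the session above.

```julia
# Hypothetical helper, not part of JavaCall.
# True only when running on thread 1 *and* on the given task.
is_on(task::Task) = Threads.threadid() == 1 && current_task() === task

# Assumption: this line runs on the root task, so it captures the root task.
main_task = current_task()

println(is_on(main_task))                 # checked directly on the capturing task
println(fetch(@async is_on(main_task)))   # checked from inside a new Task
```

A guard like this could be called at the top of every function that touches Java, so that a call from an unexpected Task fails early instead of surfacing later as an unrelated-looking Java exception.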
mkitti commented 3 years ago

The other canonical package besides Spark.jl is https://github.com/aviks/Taro.jl

I usually test against both Taro.jl and Spark.jl when I make changes to JavaCall

mkitti commented 3 years ago

An MWE is really needed, unfortunately. It's basically impossible for me to look into this without some code that I can execute.

I have gleaned that you are doing something with multithreading, possibly on both the Julia and Java sides of the interop.

My intuition here is that the JavaCall._jmc_cache may be invalid somehow due to being populated from multiple threads. The idea is that JavaCall doesn't have to look up a class once you have used it in order to save overhead. The problem is that the Java references actually need to be different depending on which Thread and which stack you are using, so perhaps what was cached doesn't actually make sense.

Your approach is reasonable. Use Channels and RemoteChannels to call into a worker running on the main Task and Thread which will then call into the Java Native Interface.
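
The Channel idea can be sketched in plain Julia as follows. This is only an illustration under assumptions: `Job`, `submit`, and `serve` are hypothetical names rather than JavaCall API, and the task that calls `serve` is assumed to be the one that is allowed to call into Java. Other tasks hand it zero-argument functions and block until the result comes back.

```julia
# Hypothetical names throughout; none of this is JavaCall API.

# A job pairs a zero-argument function with a one-slot channel for its outcome.
struct Job
    f::Function
    reply::Channel{Any}
end

# Call from any task: queue `f`, then block until the serving task has run it.
function submit(jobs::Channel{Job}, f)
    job = Job(f, Channel{Any}(1))
    put!(jobs, job)
    ok, value = take!(job.reply)
    ok || throw(value)      # re-raise an error caught on the serving task
    return value
end

# Call on the task that owns the JVM: run queued jobs until `jobs` is closed.
function serve(jobs::Channel{Job})
    for job in jobs
        outcome = try
            (true, job.f())
        catch err
            (false, err)
        end
        put!(job.reply, outcome)
    end
end

# Usage: the current task serves while an @async task submits work.
jobs = Channel{Job}(32)
owner = current_task()
client = @async begin
    try
        submit(jobs, () -> current_task() === owner)
    finally
        close(jobs)         # lets `serve` return
    end
end
serve(jobs)
println(fetch(client))
```

In real use the submitted function would wrap the `jcall` or constructor call. Note that `serve` blocks the task it runs on, so in an interactive session the REPL itself would have to live on a different Task.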

Is Java calling back into Julia at some point?

DrChainsaw commented 3 years ago

Thanks for looking into this, and I fully understand it is impossible to do more on your side. I guess I was hoping that someone had seen this before. Thanks for the tip about the roottask. I'll use it just to make sure.

Multi-threading from my other issue was an experiment but not something I need or which the julia package I'm using this for does. The java code might do it but I think I have managed to short circuit those parts (which is part of what I'm trying to achieve here).

There is no calling back into julia from java.

Anyway, I might have discovered some rituals which mitigate the issue to a great extent, and hopefully this can be used to create an MWE. If it solves my problems but I can't make an MWE out of it, I'll post here what made the difference.

mkitti commented 3 years ago

This is basically a more general JNI issue. If you search JNI IllegalMonitorStateException you will find a lot more.

In the other issue #131 , I outlined how to tap the JNI threading interface. It's going to take more work to get it to work properly though.

DrChainsaw commented 3 years ago

Fwiw it seems like Base.current_task() !== Base.roottask is happening and issues seem to be more likely then. I'm using FileTrees which in turn uses Dagger. For now I'm not using any workers except the main process but it seems like there is a sneaky asyncmap in Dagger which happens regardless.

I will see if I can use this to cook up an MWE or if it is a red herring.

DrChainsaw commented 3 years ago

Ok, here is one MWE. Not certain this is the (only) issue I'm having though.

It should work with @async on Windows without further action, right? You mentioned something about Channels above. Would that be a possible workaround?

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "...\AppData\Local\Programs\Microsoft VS Code\Code.exe"
  JULIA_NUM_THREADS =

julia> download("https://archive.apache.org/dist/tika/tika-app-1.23.jar",  "tika-app-1.23.jar")
"tika-app-1.23.jar"

julia> using JavaCall

julia> JavaCall.init("-Djava.class.path=tika-app-1.23.jar")

julia> const WorkbookFactory = @jimport org.apache.poi.ss.usermodel.WorkbookFactory
JavaObject{Symbol("org.apache.poi.ss.usermodel.WorkbookFactory")}

julia> fetch(@async WorkbookFactory())
ERROR: TaskFailedException:
JavaCall.JavaCallError("Class Not Found org/apache/poi/ss/usermodel/WorkbookFactory")
Stacktrace:
 [1] _metaclass(::Symbol) at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:338
 [2] metaclass(::Symbol) at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:344
 [3] jnew(::Symbol, ::Tuple{}) at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:209
 [4] JavaObject at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:103 [inlined]
 [5] JavaObject at ...\.julia\packages\JavaCall\aVXyt\src\core.jl:108 [inlined]
 [6] (::var"#11#12")() at .\task.jl:356
Stacktrace:
 [1] wait at .\task.jl:267 [inlined]
 [2] fetch(::Task) at .\task.jl:282
 [3] top-level scope at task.jl:365

julia> WorkbookFactory()
JavaObject{Symbol("org.apache.poi.ss.usermodel.WorkbookFactory")}(JavaCall.JavaLocalRef(Ptr{Nothing} @0x000000002ef72548))

julia> fetch(@async WorkbookFactory()) # After creating it once in rootthread it works also in forks
JavaObject{Symbol("org.apache.poi.ss.usermodel.WorkbookFactory")}(JavaCall.JavaLocalRef(Ptr{Nothing} @0x000000002ef72550))

I could not manage to create the same issue with classes which are part of the stdlib.

mkitti commented 3 years ago

Here's the best workaround that I have at the moment. It at least means you don't have to create an Object that you don't need.

using JavaCall
JavaCall.init("-Djava.class.path=tika-app-1.23.jar")
const WorkbookFactory = @jimport org.apache.poi.ss.usermodel.WorkbookFactory
JavaCall.JNI.FindClass(  JavaCall.javaclassname("org.apache.poi.ss.usermodel.WorkbookFactory")  )
@async WorkbookFactory()

I should note that this is a Windows-specific issue. On Linux this works just fine with JULIA_COPY_STACKS=1:

$ JULIA_COPY_STACKS=1 julia --banner=no
julia> using JavaCall

julia> JavaCall.init("-Djava.class.path=tika-app-1.23.jar")

julia> const WorkbookFactory = @jimport org.apache.poi.ss.usermodel.WorkbookFactory
JavaObject{Symbol("org.apache.poi.ss.usermodel.WorkbookFactory")}

julia> @async WorkbookFactory()
Task (done) @0x00007f80b53ff340

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD FX(tm)-8350 Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, bdver1)
Environment:
  JULIA_COPY_STACKS = 1

On Windows, the issue is that the JNI call to FindClass returns NULL under @async initially.

julia> using JavaCall

julia> JavaCall.init("-Djava.class.path=tika-app-1.23.jar")

julia> const WorkbookFactory = @jimport org.apache.poi.ss.usermodel.WorkbookFactory
JavaObject{Symbol("org.apache.poi.ss.usermodel.WorkbookFactory")}

julia> task = @async JavaCall.JNI.FindClass(  JavaCall.javaclassname("org.apache.poi.ss.usermodel.WorkbookFactory")  )
Task (done) @0x00000000338dcb90

julia> fetch(task)
Ptr{Nothing} @0x0000000000000000

julia> JavaCall.JNI.FindClass(  JavaCall.javaclassname("org.apache.poi.ss.usermodel.WorkbookFactory")  )
Ptr{Nothing} @0x000000002f15de80

julia> task = @async JavaCall.JNI.FindClass(  JavaCall.javaclassname("org.apache.poi.ss.usermodel.WorkbookFactory")  )
Task (done) @0x0000000016373990

julia> fetch(task)
Ptr{Nothing} @0x000000002f15de88

My suspicion is that on Windows we really do need JULIA_COPY_STACKS=1 to work but it causes julia to crash.
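
When several classes are involved, the FindClass pre-loading workaround shown earlier in this comment could be wrapped in a small helper. The sketch below is an assumption-laden illustration: `preload_classes` is a hypothetical name, and the lookup function is passed in rather than hard-coded so that the helper itself does not depend on a running JVM. With JavaCall, the lookup would be the `JavaCall.JNI.FindClass(JavaCall.javaclassname(name))` call from above, made from the root Task.

```julia
# Hypothetical helper, not part of JavaCall. `find` is any function mapping a
# class name to a pointer, for example (taken from the workaround above):
#     name -> JavaCall.JNI.FindClass(JavaCall.javaclassname(name))
# Returns the names for which the lookup came back NULL.
function preload_classes(find, names)
    not_found = String[]
    for name in names
        find(name) == C_NULL && push!(not_found, name)
    end
    return not_found
end
```

Calling it once, right after `JavaCall.init`, with every class name that is passed to `@jimport` would make the workaround a single step, and the returned list shows which lookups failed.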

DrChainsaw commented 3 years ago

Thanks a lot! The workaround worked for some cases but not for all:

julia> using JavaCall

julia> JavaCall.init("-Djava.class.path=tika-app-1.23.jar")

julia> const Tika = @jimport org.apache.tika.Tika
JavaObject{Symbol("org.apache.tika.Tika")}

julia> JavaCall.JNI.FindClass(JavaCall.javaclassname("org.apache.tika.Tika"))
Ptr{Nothing} @0x0000000000f32540

julia> @async Tika()
Exception in thread "main" Task (failed) @0x00000000143b5fb0
JavaCall.JavaCallError("Error calling Java: java.lang.IllegalMonitorStateException")
geterror(::Bool) at …\.julia\packages\JavaCall\aVXyt\src\core.jl:371
geterror at …\.julia\packages\JavaCall\aVXyt\src\core.jl:356 [inlined]
_jcall(::JavaMetaClass{Symbol("org.apache.tika.Tika")}, ::Ptr{Nothing}, ::typeof(JavaCall.JNI.NewObjectA), ::Type{T} where T, ::Tuple{}) at …\.julia\packages\JavaCall\aVXyt\src\core.jl:328
jnew(::Symbol, ::Tuple{}) at …\.julia\packages\JavaCall\aVXyt\src\core.jl:213
JavaObject at …\.julia\packages\JavaCall\aVXyt\src\core.jl:103 [inlined]
JavaObject at …\.julia\packages\JavaCall\aVXyt\src\core.jl:108 [inlined]
(::var"#11#12")() at .\task.jl:356

julia> Tika()
okt 08, 2020 3:45:15 EM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

okt 08, 2020 3:45:15 EM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
JavaObject{Symbol("org.apache.tika.Tika")}(JavaCall.JavaLocalRef(Ptr{Nothing} @0x0000000000f32570))

julia> @async Tika()
Task (done) @0x00000000143eb3d0

It does not show in this MWE, but in other cases I get errors of imported classes not being imported and importing them in the same manner seems to resolve the issue. Is there a way to list the imports for a class in JavaCall so the workaround can be applied recursively?

mkitti commented 3 years ago

The IllegalMonitorStateException is a distinct error, but I'll see if I can work with this new MWE to see what is happening there.

For recursively checking, I would use the Java reflection API: https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html

DrChainsaw commented 3 years ago

A thousand thanks again for taking the time!

I'll try to work it out through the Java reflection API and see if there are any improvements on my side.

mkitti commented 3 years ago

It might go something like this:

julia> superclass = jcall( classforname("org.apache.tika.Tika"), "getSuperclass", JClass, ())
JavaObject{Symbol("java.lang.Class")}(JavaCall.JavaLocalRef(Ptr{Nothing} @0x0000000030b23978))

julia> jcall(superclass, "getTypeName", JString, ())
"java.lang.Object"

See also https://github.com/JuliaInterop/JavaCall.jl/blob/master/src/reflect.jl (master branch has more code than v0.7.6)
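
Applying a lookup recursively comes down to a graph traversal over class names. The sketch below shows that traversal in plain Julia. `transitive_classes` and `related` are hypothetical names: `related(name)` is assumed to return the class names that `name` refers to directly, and how to obtain them (for example from `getSuperclass`, as above, or from other `java.lang.Class` methods such as `getInterfaces`) is left open. Reflection may not reveal every class that gets loaded at run time, so treat the result as a starting point.

```julia
# Hypothetical helper. `related(name)` returns the class names that `name`
# refers to directly; this function visits everything reachable from `root`
# exactly once, breadth first, and tolerates cycles.
function transitive_classes(related, root::AbstractString)
    seen = Set{String}([root])
    queue = String[root]
    order = String[]
    while !isempty(queue)
        name = popfirst!(queue)
        push!(order, name)
        for dep in related(name)
            if !(dep in seen)
                push!(seen, dep)
                push!(queue, dep)
            end
        end
    end
    return order
end
```

The returned list could then be fed, one name at a time, to whatever pre-loading step is being used.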

mkitti commented 3 years ago

I should have mentioned this earlier. I created a package that helps you run code on a specific Task: https://github.com/mkitti/TaskWorkers.jl


(@v1.5) pkg> add https://github.com/mkitti/TaskWorkers.jl
[...]

julia> using JavaCall, TaskWorkers

julia> JavaCall.init("-Djava.class.path=tika-app-1.23.jar")

julia> TaskWorkers.startworker_and_repl()
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> Base.current_task() == Base.roottask
false

julia> taskrun() do
           Base.current_task() == Base.roottask
       end
true

julia> const Tika = @jimport org.apache.tika.Tika
JavaObject{Symbol("org.apache.tika.Tika")}

julia> @async taskrun() do
           Tika()
       end
Oct 09, 2020 2:38:29 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Oct 09, 2020 2:38:29 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Task (runnable) @0x0000000015fdb0f0

julia> taskrun() do
           Tika()
       end
JavaObject{Symbol("org.apache.tika.Tika")}(JavaCall.JavaLocalRef(Ptr{Nothing} @0x00000000075ab310))

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
mkitti commented 3 years ago

Julia/JavaCall works pretty well on WSL2. Just remember to set the environment variable JULIA_COPY_STACKS=1
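
A small sketch for checking the setting from inside a session. `copy_stacks_requested` is a hypothetical helper, and it assumes that "1" and "yes" are the values that enable stack copying. The variable has to be set before Julia starts, as in the `JULIA_COPY_STACKS=1 julia` invocation shown earlier, so the check can only report the setting, not change it.

```julia
# Hypothetical helper: true if the given environment asks Julia to copy stacks.
# Assumption: "1" and "yes" are the enabling values.
copy_stacks_requested(env = ENV) = get(env, "JULIA_COPY_STACKS", "") in ("1", "yes")

if !copy_stacks_requested()
    @warn "JULIA_COPY_STACKS is not set; keep Java calls on the root Task"
end
```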

DrChainsaw commented 3 years ago

Thank you so much again for all the help.

Your suggested workarounds with TaskWorkers and WSL2 are interesting and I will try to find some time to explore them. Since this little project I'm doing is something I hope I can use to market Julia at my workplace, I'm a bit hesitant to add things which require extra setup.

Running on Linux with JULIA_COPY_STACKS=1 works and this is good enough for me right now. I could devise a workaround by using a CLI wrapper for the Java code, as I don't need a lot of back-and-forth between Java and Julia. I have for now left the tight interop as a kind of advanced turbo-button.
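
A CLI wrapper of that kind might look roughly like the sketch below. It is an illustration under assumptions, not the actual project code: it uses the tika-app jar from the MWE as a stand-in for the Java code, assumes `java` is on the PATH, and the function names are hypothetical. The JVM runs as a separate process, so none of the Task or thread restrictions discussed in this thread apply, at the cost of a JVM start-up per call.

```julia
# Hypothetical helper: build the command line for one Tika text extraction.
# Assumptions: `java` is on the PATH and `jar` is the tika-app jar from the MWE.
tika_text_cmd(jar, file) = `java -jar $jar --text $file`

# Run the command and capture standard output as a String.
# `read` throws if the java process exits with a non-zero status.
extract_text(jar, file) = read(tika_text_cmd(jar, file), String)
```

Because the arguments are interpolated into a `Cmd` rather than a shell string, file names containing spaces are passed through as single arguments.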