Closed behrica closed 1 year ago
Just to not forget it.
I am bitten hardly by this now, as this type of error make the code hang, and no exception is printed. It can be easily reproduced y running in embed a script which tried to load a "non existing python module".
If I run it in "embedded mode", the below exception is not visible, and the python+JVM hangs for ever. I do see it, when running the same code in "non-embedded".
{:clojure.main/message
"Syntax error compiling at (src/crash.clj:12:1).\nModuleNotFoundError: No module named 'not existing'\n\n",
:clojure.main/triage
{:clojure.error/phase :compile-syntax-check,
:clojure.error/line 12,
:clojure.error/column 1,
:clojure.error/source "crash.clj",
:clojure.error/path "src/crash.clj",
:clojure.error/class java.lang.Exception,
:clojure.error/cause
"ModuleNotFoundError: No module named 'not existing'\n"},
:clojure.main/trace
{:via
[{:type clojure.lang.Compiler$CompilerException,
:message
"Syntax error compiling at (/home/carsten/Dropbox/sources/predict_disaster_tweet/src/crash.clj:12:1).",
:data
{:clojure.error/phase :compile-syntax-check,
:clojure.error/line 12,
:clojure.error/column 1,
:clojure.error/source
"/home/carsten/Dropbox/sources/predict_disaster_tweet/src/crash.clj"},
:at [clojure.lang.Compiler load "Compiler.java" 7652]}
{:type java.lang.Exception,
:message "ModuleNotFoundError: No module named 'not existing'\n",
:at
[libpython_clj2.python.ffi$check_error_throw
invokeStatic
"ffi.clj"
708]}],
:trace
[[libpython_clj2.python.ffi$check_error_throw
invokeStatic
"ffi.clj"
708]
[libpython_clj2.python.ffi$check_error_throw invoke "ffi.clj" 706]
[libpython_clj2.python$import_module invokeStatic "python.clj" 165]
[libpython_clj2.python$import_module invoke "python.clj" 158]
[train$eval14176 invokeStatic "crash.clj" 12]
[train$eval14176 invoke "crash.clj" 12]
[clojure.lang.Compiler eval "Compiler.java" 7181]
[clojure.lang.Compiler load "Compiler.java" 7640]
[clojure.lang.Compiler loadFile "Compiler.java" 7578]
[clojure.main$load_script invokeStatic "main.clj" 475]
[clojure.main$script_opt invokeStatic "main.clj" 535]
[clojure.main$script_opt invoke "main.clj" 530]
[clojure.main$main invokeStatic "main.clj" 664]
[clojure.main$main doInvoke "main.clj" 616]
[clojure.lang.RestFn applyTo "RestFn.java" 137]
[clojure.lang.Var applyTo "Var.java" 705]
[clojure.main main "main.java" 40]],
:cause "ModuleNotFoundError: No module named 'not existing'\n",
:phase :compile-syntax-check}}
I was not sure, if the fix would be here or in cljbridge, so I opened this before:
https://github.com/behrica/clojurebridge/issues/2 has some code snippets to reproduce it
@cnuernber Could you have a look at this ? I am not able to dig myself, too much native and macro magic for me.
Each time my code hangs, I am always wondering if there was a python exception, which I just don't see.
@behrica can you provide a little more context? "ModuleNotFoundError: No module named 'not existing'
is a Python error, but 'not existing'
is a very unusual module name. I feel like I would need more information to pinpoint why the error is not printing to the console, because I don't think I'd be able to reproduce this.
It happnes with "any exception" on python side, if using embedded mode and loading a script:
crash.clj;
(ns train
(:require [libpython-clj2.python :refer [py.- py.] :as py]
[libpython-clj2.python.ffi :as ffi]))
;; [libpython-clj2.require :refer [require-python]]
;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)
(def locked (ffi/lock-gil))
(println :gil-locked)
(println :before-crash)
(py/import-module "not existing") ;; whatever error on python side, like loading an non existing library
(println "crash fininished")
Load the file in embedded:
python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'
It "hangs" forever without printing any error
This appears to me to be an issue with clojurebridge, not libpython-clj. Specifically javabridge's static call pathway must swallow exceptions.
Or I guess their make_method pathway.
It does not happen with normal Clojure / JVM exceptions, that why I am wondering.
Hmm. In that case something needs to check the python exception state and rethrow.
Using py/with-gil
behaves "better" (and different depending on "manual gil or not")....
So this code:
(ns train
(:require [libpython-clj2.python :refer [py.- py.] :as py]))
(py/with-gil
(py/import-module "not existing"))
called like this:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'
gives
ARNING: Please consider reporting this to the maintainers of tech.v3.datatype.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f92b2d87d7a, pid=38488, tid=38488
#
# JRE version: OpenJDK Runtime Environment (11.0.16.1+1) (build 11.0.16.1+1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16.1+1, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libpython3.10.so.1.0+0x187d7a] PyImport_Import+0x3a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/carsten/Dropbox/sources/libpython-clj/core.38488)
#
# An error report file with more information is saved as:
# /home/carsten/Dropbox/sources/libpython-clj/hs_err_pid38488.log
and called like this
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repository")'
gives an Java exception (which I could probably catch):
ARNING: All illegal access operations will be denied in a future release
Exception in thread "Thread-0" Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash2.clj:5:1).
at clojure.lang.Compiler.load(Compiler.java:7665)
at clojure.lang.Compiler.loadFile(Compiler.java:7591)
at clojure.lang.RT$3.invoke(RT.java:327)
at clojure.lang.Var.invoke(Var.java:384)
Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'not existing'
at libpython_clj2.python.ffi$check_error_throw.invokeStatic(ffi.clj:708)
at libpython_clj2.python.ffi$check_error_throw.invoke(ffi.clj:706)
at libpython_clj2.python$import_module.invokeStatic(python.clj:165)
at libpython_clj2.python$import_module.invoke(python.clj:158)
at train$eval15837.invokeStatic(crash2.clj:6)
at train$eval15837.invoke(crash2.clj:5)
at clojure.lang.Compiler.eval(Compiler.java:7194)
at clojure.lang.Compiler.load(Compiler.java:7653)
... 3 more
Sep 30, 2022 10:30:39 AM clojure.tools.logging$eval3239$fn__3242 invoke
INFO: Reference thread starting
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 187, in load_clojure_file
resolve_call_fn("clojure.core/load-file",
File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 73, in resolve_call_fn
return resolve_fn(namespaced_fn_name)(*args)
File "/home/carsten/.local/lib/python3.10/site-packages/clojurebridge/cljbridge.py", line 50, in __call__
return invoker(*args)
File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 961, in method
result = call(self.o, name, sig, *args)
File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 892, in call
result = fn(*nice_args)
File "/home/carsten/.local/lib/python3.10/site-packages/javabridge/jutil.py", line 859, in fn
raise JavaException(x)
javabridge.jutil.JavaException: Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash2.clj:5:1).
We can probably agree, that this is the expected behavior.
A JVM exception and neither "jvm crash" nor "jvm hanging".
And we have the correct behaviour in the REPL, in "Non-embedded"
We have as well correct behaviour "in embedded mode from the REPL".
The "load_clojure_file" makes the issue:
python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash2.clj",mvn_local_repo="/home/carsten/.m2/repositor
This load_clojure_file
pathway was added by me to clojurebridge, and it was therefore never tested by you, I suppose.
It is documented here: https://clj-python.github.io/libpython-clj/embedded.html
chapter: Loading and running a Clojure file in embedded mode
The new entry point is here: https://github.com/behrica/clojurebridge/blob/2c3c6e6c6d3d74f61aa2d68d7bf1c770d76b09cc/src/clojurebridge/cljbridge.py#L173
I think I do the "same" as the other entry point: https://github.com/behrica/clojurebridge/blob/2c3c6e6c6d3d74f61aa2d68d7bf1c770d76b09cc/src/clojurebridge/cljbridge.py#L142
To make it more confusing, there is an other pathway I made as well, which loads a fixed "user.clj" file.
This prints as well teh exception.....
python -c 'from clojurebridge import cljbridge;cljbridge.init_jvm(load_user_clj=True,mvn_local_repo="/home/carsten/.m2/repository")'
(I propose to later remove this load_user_clj
pathway, as it is superseded by the load_clojure_file
)
So lets ignore this for the moment,
I created: https://github.com/behrica/clojurebridge/issues/5
Let's concentrated on the "hanging" one, which is worst from a user experience point of view:
crash.clj
(ns train
(:require [libpython-clj2.python :refer [py.- py.] :as py]
[libpython-clj2.python.ffi :as ffi]))
;; [libpython-clj2.require :refer [require-python]]
;; [libpython-clj2.python.gc]
(println "-------- manual-gil: " ffi/manual-gil)
(def locked (ffi/lock-gil))
(println :gil-locked)
(println :before-crash)
(py/import-module "not existing") ;; whatever error on python side, like loading an non existing library
(println "crash finished")
run as
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=false";python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'
results in hanging for ever
expected behaviour A nice JVM exception:
Exception in thread "Thread-0" Syntax error macroexpanding at (/home/carsten/Dropbox/sources/libpython-clj/crash.clj:5:1).
at clojure.lang.Compiler.load(Compiler.java:7665)
at clojure.lang.Compiler.loadFile(Compiler.java:7591)
at clojure.lang.RT$3.invoke(RT.java:327)
at clojure.lang.Var.invoke(Var.java:384)
Caused by: java.lang.Exception: ModuleNotFoundError: No module named 'not existing'
(same hanning happens without calling "lock/gil")
and independent of the "manual_gil") setting, so we get an even easier hanging szenario:
You might ask, why do I play with "manul_gil" and GIL locking.
Because this makes python library simpletransformers
work more reliable (proven by repeated executions and counting training sucecsses)
But by doing so, I experience now the issue we discuss. If I get an python exception in my code with "manuel GIL handling", it hangs.
As a "workaround" I now run it without manual GIL (to see if working on little data) and then switch to "with manual GIL" for train on full data, which is annoying to do.
Especially because the instability of simpletransformers
might result in "hanging" as well (or JVM crash)
Even simpler code t reproduce:
Any exception thronw by pythonresult in "hanging"
(ns train
(:require [libpython-clj2.python :refer [py.- py.] :as py]
[libpython-clj2.python.ffi :as ffi]))
(def gil (ffi/lock-gil))
(py/run-simple-string "invalid syntax !!!")
(ffi/unlock-gil gil)
clled like this:
export _JAVA_OPTIONS="-Dlibpython_clj.manual_gil=true"
python -c 'from clojurebridge import cljbridge;cljbridge.load_clojure_file(clj_file="crash.clj",mvn_local_repo="/home/carsten/.m2/repository")'
Well, I stand corrected :-). That definitely appears to be a bug in libpython-clj.
Perhaps the unlock-gil code needs check-error-throw. Hmm - that needs to happen nearly after every python call - it was handled by with-gil and it should still be happening.
I found a way to at least print the error, but it still hang
diff --git a/src/libpython_clj2/python/ffi.clj b/src/libpython_clj2/python/ffi.clj
index 773d1f9..ec46e55 100644
--- a/src/libpython_clj2/python/ffi.clj
+++ b/src/libpython_clj2/python/ffi.clj
@@ -706,7 +706,8 @@ Each call must be matched with PyGILState_Release"}
(defn check-error-throw
[]
(when-let [error-str (check-error-str)]
- (throw (Exception. ^String error-str))))
+ (do (println :error error-str)
+ (throw (Exception. ^String error-str)))))
It gets as well solved by unlocking the gil in a finally
(try
(py/run-simple-string "1/0")
(finally
(ffi/unlock-gil gil)))
I noticed already before, that in embedded mode and loading a clj file,
the JVM does not terminate without the unlock
So the "hanging" we see is maybe the following:
The exception:
so maybe it is simpler to find out and fix that in the following the VM does not terminate:
(ns train
(:require [libpython-clj2.python :refer [py.- py.] :as py]
[libpython-clj2.python.ffi :as ffi]))
(def gil (ffi/lock-gil))
(py/run-simple-string "print('ok')")
I saw that in the hanging state some thread seems to wait forever:
Name: tech.resource.gc ref thread
State: TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@28f1bb74
Total blocked: 1 Total waited: 87,568
Stack trace:
java.base@11.0.16.1/java.lang.Object.wait(Native Method)
java.base@11.0.16.1/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
tech.v3.resource.gc$watch_reference_queue$fn__10795.invoke(gc.clj:27)
tech.v3.resource.gc$watch_reference_queue.invokeStatic(gc.clj:23)
tech.v3.resource.gc$watch_reference_queue.invoke(gc.clj:21)
tech.v3.resource.gc$start_reference_thread$fn__10807.invoke(gc.clj:47)
app//clojure.lang.AFn.run(AFn.java:22)
java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829)
not sure, if relevant
The java api has a try-with-resources pattern -
(with-open [glock (java_api/GILLocker)]
...)
That will automatically do the try-finally. Anything auto-closeable will be closed in a with-open macro.
There is a new API in latest (2.020) - with-manual-gil that will automatically lock/unlock the gil.
user> ffi/manual-gil
true
user> (py/with-manual-gil
(py/run-simple-string "!! syntax error"))
Execution error at libpython-clj2.python.ffi/check-error-throw (ffi.clj:708).
File "<string>", line 1
!! syntax error
^
SyntaxError: invalid syntax
user>
Outputs:
Originally posted by @cnuernber in https://github.com/clj-python/libpython-clj/issues/219#issuecomment-1257295769