bytedeco / javacpp-presets

The missing Java distribution of native C++ libraries

Running two NumPy applications in separate threads in the same JVM crashes everything #817

Open penetest opened 4 years ago

penetest commented 4 years ago

Running two NumPy applications in separate threads in the same JVM crashes the whole JVM!

It shows this error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe4d41b997f, pid=12746, tid=0x00007fe3a7a7d700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libpython3.7m.so.1.0+0xe097f]
#
# Core dump written. Default location: /home/macal/deploy/udh-dataflow-1.0.5/streamsets-datacollector-3.8.1/core or core.12746
#
# An error report file with more information is saved as:
# /home/macal/deploy/udh-dataflow-1.0.5/streamsets-datacollector-3.8.1/hs_err_pid12746.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
saudet commented 4 years ago

CPython isn't thread-safe by default. Their documentation has information about this (https://docs.python.org/3/c-api/init.html). It's really complicated, that's for sure, so a higher-level API on top of that would be nice, yes.
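
One way to keep concurrent requests from touching the interpreter at the same time is to route every Python call through a single dedicated thread. The following is a minimal sketch using only the JDK. `PythonGateway` is a hypothetical name, not part of javacpp-presets, and the `Callable` stands in for whatever code actually calls into CPython. It assumes the interpreter is also initialized on that same worker thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: funnels every call into embedded Python through one thread. */
public final class PythonGateway implements AutoCloseable {
    // A single worker thread, so at most one task touches the interpreter at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "python-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on the worker thread and blocks until it finishes. */
    public <T> T call(Callable<T> pythonTask) throws Exception {
        Future<T> result = worker.submit(pythonTask);
        return result.get(); // rethrows task failures as ExecutionException
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Callers on any thread would use `gateway.call(() -> ...)`. This serializes the Python work, so it trades parallelism for safety.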

CristianPi commented 1 year ago

I have the same problem, and absolutely no idea how CPython works, but can't we spawn more threads, each with its own Python interpreter? SubInterpreter does not work with NumPy... and multiple requests to my Spring Boot API crash everything :-)

@penetest did you find a solution to this problem?

agibsonccc commented 1 year ago

@CristianPi we built some wrappers on top of this that might be useful for you to give a shot: https://deeplearning4j.konduit.ai/python4j/tutorials/quickstart

Please do let me know if you find it useful.

CristianPi commented 1 year ago

@agibsonccc Thanks! It's certainly nicer, but it still does not solve the problem, or am I wrong? Performance-wise, isn't that like a synchronized method? I mean, if I execute two pieces of code concurrently, they will be executed sequentially, 1 then 2, right?

On the Spring API side I just added synchronized for now and it works; I can spawn more small (256 MB RAM) containers if I want parallelism.

agibsonccc commented 1 year ago

@CristianPi No, we actually do the context management Sam mentioned above. You can spawn multiple interpreters if you want; this was built with that use case in mind. The quickstart is just the quickest hello-world way to run Python code. We introduce a GIL abstraction to guard any code that runs Python. No matter what, you'd need to do that. Feel free to ask more questions on the forums if you want to pursue this. Underneath, it's still the same bindings anyway :)

saudet commented 1 year ago

@CristianPi For that purpose you'll need to look into doing "multiprocessing": https://docs.python.org/3/library/multiprocessing.html
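
From the JVM side, the same process-level isolation can be approximated by launching each Python job as its own OS process, so a native crash ends only that child and not the JVM. The following is a rough sketch using only the JDK. `IsolatedWorker` and the command passed to it are hypothetical, and it assumes the job can exchange its input and output through files or standard streams.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: runs one job in a separate OS process. */
public final class IsolatedWorker {
    /** Runs the command in its own process and returns its combined output. */
    public static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        // Read all output before waiting, so a full pipe buffer cannot stall the child.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            // A crash in the child surfaces here as an exception instead of killing the JVM.
            throw new IOException("worker exited with code " + exitCode + ": " + output);
        }
        return output;
    }
}
```

A call might look like `IsolatedWorker.run(List.of("python3", "worker.py"))`, where `worker.py` is a placeholder script name. Several such calls can run in parallel from different Java threads, because each child has its own interpreter and its own GIL.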

agibsonccc commented 1 year ago

What @saudet mentioned is definitely the safe way to do it. We've had success with different interpreters in different threads. Usually we just manage the variables in Java and translate them as needed (this is usually a zero-copy process), with Python used mainly for compute.

If you're trying to work with NumPy arrays without an additional layer, it's very doable but requires some setup.