apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] We use ThreadPool to read an HDFS file; when the std::thread finishes, it deadlocks! #36432

Open wzh241215 opened 1 year ago

wzh241215 commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

We use ThreadPool to read an HDFS file. When the std::thread finishes its task, it is destroyed and hdfsThreadDestructor is called. In hdfsThreadDestructor, it calls (*env)->GetJavaVM(env, &vm), but it cannot get the VM, which is blocked in a condition wait, so the program never finishes.

Arrow version: apache-arrow-11.0.0
Hadoop version: branch-3.2.0

Component(s)

C++

raulcd commented 1 year ago

cc @westonpace maybe?

mapleFU commented 1 year ago

HDFS File uses libhdfs, which wraps the JNI. It might lock when a user issues a read; is this related?

wzh241215 commented 1 year ago

HDFS File uses libhdfs, which wraps the JNI. It might lock when a user issues a read; is this related?

That seems right; we captured the stack when the deadlock occurred. When we read HDFS files, the JNI wrapper may be inefficient, and it causes a deadlock with multiple threads. The ORC project uses libhdfspp for all its file access. Is there any way in Arrow to avoid the JNI wrapper, e.g. through compile options or otherwise?

westonpace commented 1 year ago

What is hdfsThreadDestructor?

The ORC project uses libhdfspp for all its file access. Is there any way in Arrow to avoid the JNI wrapper, e.g. through compile options or otherwise?

Are you asking if we can use https://github.com/haohui/libhdfspp instead of our current hdfs implementation?

wzh241215 commented 1 year ago

What is hdfsThreadDestructor?

The ORC project uses libhdfspp for all its file access. Is there any way in Arrow to avoid the JNI wrapper, e.g. through compile options or otherwise?

Are you asking if we can use https://github.com/haohui/libhdfspp instead of our current hdfs implementation?

hdfsThreadDestructor is called when an HDFS thread is destroyed; it releases some JVM resources. Yes, we think the Java-based HDFS implementation is inefficient, so we want to use libhdfspp.

mapleFU commented 1 year ago

libhdfspp doesn't use a lock; however, it doesn't have a pread, so that might be a problem when you pread.

westonpace commented 1 year ago

Yes, we think the Java-based HDFS implementation is inefficient, so we want to use libhdfspp.

It should be possible to create a new filesystem. Instead of changing HdfsFileSystem you can create HdfsppFileSystem.

libhdfspp doesn't use a lock; however, it doesn't have a pread, so that might be a problem when you pread.

Yes, if the library does not have pread then we have to do a seek followed by a read for ReadAt.