Open wzh241215 opened 1 year ago
cc @westonpace maybe?
The HDFS file implementation uses libhdfs, which wraps the JNI. It might take a lock when the user issues a read. Is this related?
That seems right; we captured the stack trace when the deadlock occurred. When we read HDFS files, the JNI wrapper may be inefficient, and it causes the deadlock under multiple threads. ORC uses libhdfspp for all files in the ORC project. Is there any way in Arrow to avoid the JNI wrapper, for example via compile options or something similar?
What is hdfsThreadDestructor?
Are you asking if we can use https://github.com/haohui/libhdfspp instead of our current hdfs implementation?
hdfsThreadDestructor is called when the HDFS thread is destroyed; it releases some JVM resources. The code looks like that.

Yes, we think the Java HDFS implementation is inefficient, so we want to use libhdfspp.
libhdfspp doesn't use a lock; however, it doesn't have pread, so that might be a problem when you pread.
It should be possible to create a new filesystem. Instead of changing HdfsFileSystem you can create HdfsppFileSystem.
Yes, if the library does not have pread then we have to do a seek followed by a read for ReadAt.
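A minimal sketch of that fallback, with hypothetical names (this is not Arrow's implementation): when only seek + read is available, the pair must be serialized with a mutex, because concurrent callers share one file offset.

```cpp
#include <cstdio>
#include <cstring>
#include <mutex>
#include <string>

// Hypothetical sketch: emulate a pread-style ReadAt on top of a handle that
// only offers seek + read. The mutex makes each seek+read pair atomic so
// concurrent readers cannot interleave and corrupt the shared file offset.
class SeekReadFile {
 public:
  explicit SeekReadFile(const std::string& path)
      : file_(std::fopen(path.c_str(), "rb")) {}
  ~SeekReadFile() { if (file_) std::fclose(file_); }

  // ReadAt: positional read emulated as seek followed by read.
  // Returns the number of bytes actually read, or -1 on error.
  long ReadAt(long offset, void* out, std::size_t nbytes) {
    std::lock_guard<std::mutex> lock(mutex_);  // seek+read must not interleave
    if (!file_ || std::fseek(file_, offset, SEEK_SET) != 0) return -1;
    return static_cast<long>(std::fread(out, 1, nbytes, file_));
  }

 private:
  std::FILE* file_;
  std::mutex mutex_;  // protects the shared file offset
};
```

Note the cost: a true pread allows fully parallel positional reads, while this emulation serializes them, which is exactly the kind of contention being discussed in this thread.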
Describe the bug, including details regarding any error messages, version, and platform.
We use a ThreadPool to read HDFS files. When a std::thread finishes its task, it is destroyed and hdfsThreadDestructor is called. In hdfsThreadDestructor, it calls (*env)->GetJavaVM(env, &vm) but cannot get the VM, which is stuck in a condition wait, so the program never finishes.
arrow version: apache-arrow-11.0.0
hadoop version: branch-3.2.0
Component(s)
C++