apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.49k stars 3.53k forks

[Python] OSError: Unable to load libjvm when used on Windows #12743

Open pchaoda opened 2 years ago

pchaoda commented 2 years ago

Hi all, I am using pyarrow==7.0.0 to connect to HDFS. It runs well on Linux, but unfortunately fails on Windows. I have set JAVA_HOME, HADOOP_HOME, and ARROW_LIBHDFS_DIR:

JAVA_HOME=C:\Users\think\Desktop\python-SDK-green\jdk18
HADOOP_HOME=C:\Users\think\Downloads\hadoop-2.10.1.tar\hadoop-2.10.1\hadoop-2.10.1
ARROW_LIBHDFS_DIR=C:\Users\think\Desktop\python-SDK-green\hadoop_client\lib\native;C:\Users\think\Desktop\python-SDK-green\jdk18\jre\bin\server;

When I call pyarrow.hdfs.connect(), I get this error:

Traceback (most recent call last):
  File "C:\Users\think\Desktop\python-SDK-green\python\test.py", line 7, in <module>
    data_provider = DataProvider()
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\nescqdata\MarketData\dataProvider.py", line 15, in __init__
    super(DataProvider, self).__init__(dfs)
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\nescqdata\baseDataProvider.py", line 53, in __init__
    self.dfs = pa.hdfs.connect() if dfs is None else dfs
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\pyarrow\hdfs.py", line 227, in connect
    return _connect(
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\pyarrow\hdfs.py", line 237, in _connect
    fs = HadoopFileSystem(host=host, port=port, user=user,
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\pyarrow\hdfs.py", line 49, in __init__
    self._connect(host, port, user, kerb_ticket, extra_conf)
  File "pyarrow\_hdfsio.pyx", line 85, in pyarrow._hdfsio.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 114, in pyarrow.lib.check_status
OSError: Unable to load libjvm: �Ҳ���ָ����ģ�顣
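A quick sanity check of the three variables listed above could look like the sketch below. The helper name `check_hdfs_env` is hypothetical and not part of pyarrow. It assumes each variable is meant to hold a single existing directory, so it flags any value containing `;`. That assumption may not match how Arrow actually parses `ARROW_LIBHDFS_DIR`.

```python
import os

# Variables named in the report above.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "ARROW_LIBHDFS_DIR")


def check_hdfs_env(env, isdir=os.path.isdir):
    """Return a list of human-readable problems found in `env`.

    Assumption (not confirmed by the thread): each variable should be
    one existing directory, so a ';' in the value is reported.
    """
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif ";" in value:
            problems.append(f"{name} contains ';' (more than one path?): {value}")
        elif not isdir(value):
            problems.append(f"{name} is not an existing directory: {value}")
    return problems


if __name__ == "__main__":
    for problem in check_hdfs_env(os.environ) or ["no obvious problems found"]:
        print(problem)
```

With the values from the report, this would flag `ARROW_LIBHDFS_DIR`, because it holds two directories joined by `;`.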

By the way, before I got this error, I had already modified hdfs.py by adding shell=True to avoid another problem:

  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\pyarrow\hdfs.py", line 145, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\Users\think\Desktop\python-SDK-green\python\lib\site-packages\pyarrow\hdfs.py", line 172, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid Win32 application.

Thank you !!!
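WinError 193 from CreateProcess usually means the file being launched is not something Windows can execute directly. One possible explanation, which is an assumption and not confirmed in this thread, is that `hadoop_bin` pointed at the extensionless `bin/hadoop` shell script. The sketch below builds the `classpath --glob` command against `bin\hadoop.cmd` on Windows instead. The helper name `hadoop_classpath_args` is hypothetical, and it assumes the Hadoop distribution ships a `hadoop.cmd` launcher.

```python
import ntpath
import posixpath


def hadoop_classpath_args(hadoop_home: str, windows: bool) -> list:
    """Build the argument list for `hadoop classpath --glob`.

    Assumption: on Windows the launcher is bin\\hadoop.cmd; elsewhere it
    is the bin/hadoop shell script.
    """
    if windows:
        launcher = ntpath.join(hadoop_home, "bin", "hadoop.cmd")
    else:
        launcher = posixpath.join(hadoop_home, "bin", "hadoop")
    return [launcher, "classpath", "--glob"]


if __name__ == "__main__":
    # Example value only; substitute your own HADOOP_HOME.
    print(hadoop_classpath_args(r"C:\hadoop", windows=True))
```

The resulting list could be passed to `subprocess.check_output` in place of editing hdfs.py, though whether that works without `shell=True` on a given Windows setup is untested here.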

pchaoda commented 2 years ago

#ifdef _WIN32
  ARROW_ASSIGN_OR_RAISE(search_prefixes, MakeFilenameVector({""}));
  ARROW_ASSIGN_OR_RAISE(search_suffixes,
                        MakeFilenameVector({"/jre/bin/server", "/bin/server"}));
  file_name = "jvm.dll";

jvm.dll does exist in the /jre/bin/server directory, but I still get the same "OSError: Unable to load libjvm:" error, and I cannot tell what the garbled text after "Unable to load libjvm:" means.
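To see which files the quoted excerpt would look for, the search can be mirrored in Python. This sketch covers only the suffixes and file name shown in the excerpt. It assumes the empty prefix is combined with JAVA_HOME, which the excerpt does not show. The helper name `candidate_jvm_paths` is hypothetical.

```python
from pathlib import PureWindowsPath

# Values copied from the quoted C++ excerpt.
SEARCH_SUFFIXES = ("jre/bin/server", "bin/server")
FILE_NAME = "jvm.dll"


def candidate_jvm_paths(java_home: str) -> list:
    """List the jvm.dll locations implied by the excerpt for a JAVA_HOME.

    Assumption: JAVA_HOME is the base directory for every suffix.
    """
    root = PureWindowsPath(java_home)
    return [root / suffix / FILE_NAME for suffix in SEARCH_SUFFIXES]


if __name__ == "__main__":
    java_home = r"C:\Users\think\Desktop\python-SDK-green\jdk18"
    for path in candidate_jvm_paths(java_home):
        print(path)
```

If one of the printed paths exists and loading still fails, the problem probably lies somewhere other than the file simply being absent.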

pitrou commented 2 years ago

Sorry for the delay. I've filed https://issues.apache.org/jira/browse/ARROW-16617 for the unreadable error message.
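The unreadable text is consistent with a GBK-encoded Windows message being decoded as UTF-8. The sketch below reproduces that effect. It assumes the original message was the Chinese text for "The specified module could not be found." and that GBK was the system code page. Neither is confirmed in this thread. The helper name `misdecode` is hypothetical.

```python
def misdecode(message: str, actual: str = "gbk", assumed: str = "utf-8") -> str:
    """Encode `message` with its real encoding, then decode it with the
    wrong one, replacing undecodable bytes with U+FFFD."""
    raw = message.encode(actual)
    return raw.decode(assumed, errors="replace")


# Assumed original message: "The specified module could not be found."
suspected = "找不到指定的模块。"
garbled = misdecode(suspected)

if __name__ == "__main__":
    print(garbled)
```

Because undecodable bytes are replaced, the garbled string cannot be reliably converted back to the original message.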