crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

hdfs.copy() - FileNotFoundError: [Errno 2] No such file or directory #373

Open husseljoo opened 3 months ago

husseljoo commented 3 months ago

This is the script I'm running:

import os
import pydoop.hdfs as hdfs

local_path = "/home/husseljo/sample-files/a.csv"
hdfs_path = "/some-dir/"
hdfs_handle = hdfs.hdfs(host="default", port=0, user=os.environ.get("HADOOP_USER_NAME", "some-username"))
hdfs_handle.copy(local_path, hdfs_handle, hdfs_path)

the error it produces:

2024-06-04 18:26:29,878 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hdfsCopyImpl(src=/home/husseljo/sample-files/a.csv, dst=/some-dir/, deleteSource=0): FileUtil#copy error:
(unable to get stack trace for java.io.FileNotFoundException exception: ExceptionUtils::getStackTrace error.)
Traceback (most recent call last):
  File "/home/husseljo/sample-files/pydoop_test.py", line 8, in <module>
    hdfs_handle.copy(local_path, hdfs_handle, hdfs_path)
  File "/home/husseljo/.local/lib/python3.10/site-packages/pydoop/hdfs/fs.py", line 312, in copy
    return self.fs.copy(from_path, to_hdfs, to_path)
FileNotFoundError: [Errno 2] No such file or directory

for reference:

 ❯ hdfs dfs -ls /
2024-06-04 18:30:12,003 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - root     supergroup          0 2024-03-12 11:22 /rmstate
drwxr-xr-x   - husseljo supergroup          0 2024-06-04 18:25 /some-dir

The script and error above pretty much describe the issue I'm facing. The file /home/husseljo/sample-files/a.csv exists on my local filesystem, but Pydoop does not seem to find it. Any idea how to tackle this?
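One possible cause worth checking: `fs.copy(from_path, to_hdfs, to_path)` resolves `from_path` against the filesystem instance it is called on, and here it is called on the HDFS handle, so `/home/husseljo/sample-files/a.csv` is looked up in HDFS rather than on the local disk. A minimal sketch of a workaround, calling `copy()` on a local-filesystem handle instead (in Pydoop, an empty host string connects to the local FS) — untested against this exact setup:

```python
import os
import pydoop.hdfs as hdfs

local_path = "/home/husseljo/sample-files/a.csv"
hdfs_dir = "/some-dir/"

# Handle to the *local* filesystem: an empty host string tells Pydoop
# to connect to the local FS instead of HDFS.
local_fs = hdfs.hdfs(host="", port=0)

# Handle to the remote HDFS, as in the original script.
hdfs_handle = hdfs.hdfs(
    host="default", port=0,
    user=os.environ.get("HADOOP_USER_NAME", "some-username"),
)

# copy() resolves the source path against the filesystem it is called
# on, so call it on the local handle and pass the HDFS handle as the
# destination filesystem.
local_fs.copy(local_path, hdfs_handle, hdfs_dir)
```

Alternatively, the module-level convenience function `hdfs.put(local_path, hdfs_dir)` copies a local file into HDFS in a single call and may sidestep the handle question entirely.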