Closed: DAYceng closed this issue 2 years ago
I found a solution.

Containers that need to interact with Hadoop can be connected to the Docker network that Hadoop is on:

```bash
docker network connect docker-hadoop_default {your_containerName}
```

Then run

```bash
docker network inspect docker-hadoop_default
```

to check whether {your_containerName} has been added to the Hadoop network.

Done.
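Once the container is attached, the Hadoop services should be reachable by their Compose service names. A quick check from inside the connected container, as a sketch (the service name `namenode` and WebHDFS port `9870` are assumptions based on a typical docker-hadoop compose file, not taken from this setup):

```python
import socket

import pyhdfs

# Assumed values: docker-hadoop's compose file usually names the namenode
# service "namenode" and serves WebHDFS on port 9870 (Hadoop 3.x).
NAMENODE_HOST = "namenode"
WEBHDFS_PORT = 9870

# 1. The service name should now resolve via Docker's embedded DNS.
print(socket.gethostbyname(NAMENODE_HOST))

# 2. A metadata call through WebHDFS should succeed as well.
client = pyhdfs.HdfsClient(hosts=f"{NAMENODE_HOST}:{WEBHDFS_PORT}", user_name="root")
print(client.get_home_directory())
```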
I modified docker-compose.yml as suggested in issue #98 and built a Hadoop cluster. From my server I can use Python to perform CRUD operations on HDFS. The test code is as follows:
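A minimal sketch of this kind of pyhdfs test (the server address, WebHDFS port, and user name below are placeholders, not the original values):

```python
import pyhdfs

# Placeholders: use the server's IP and the namenode's WebHDFS port
# (9870 on the Hadoop 3.x docker-hadoop images).
client = pyhdfs.HdfsClient(hosts="<server-ip>:9870", user_name="root")

# Metadata operations: only the namenode is involved.
print(client.get_home_directory())
client.mkdirs("/data")
print(client.listdir("/"))

# Data operations: the client is redirected to a datanode.
client.copy_from_local("./local_test.txt", "/data/test.txt")
client.copy_to_local("/data/test.txt", "./downloaded_test.txt")
```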
However, running the same code in a Jupyter Notebook container built on the same server gives an error. I also used a docker-compose.yml to build the Jupyter container, so Jupyter and Hadoop sit in two different Docker networks (I don't know whether this is the cause of the error). The specific error is as follows:
The Jupyter Notebook container is built as follows:
In short: I deploy docker-hadoop on the server, and from the server itself I can perform CRUD operations on HDFS through pyhdfs, but I cannot upload/download files from other Docker containers on the server or from my local computer.
Combined with the error message, my guess is that reading an HDFS path with pyhdfs only requires access to the namenode, and the namenode is reachable, so operations such as `.get_home_directory()` and `.mkdirs("/data")` do not report errors. However, reading and writing file data requires access to a datanode, and the datanodes cannot be reached from outside the Hadoop network, so other containers and the local computer cannot perform upload/download operations.
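This is visible at the WebHDFS level: the namenode answers a write request with a 307 redirect whose Location header names a datanode, and the transfer fails if that datanode address cannot be reached from the client. A sketch of that check (the address and port are assumptions, not from the original setup):

```python
import requests

# Assumed values: the server's IP and the Hadoop 3.x WebHDFS port of the namenode.
NAMENODE = "http://<server-ip>:9870"

# Ask the namenode to create a file, but do not follow the redirect.
resp = requests.put(
    f"{NAMENODE}/webhdfs/v1/data/test.txt",
    params={"op": "CREATE", "user.name": "root"},
    allow_redirects=False,
)

# The Location header points at a datanode, e.g.
# http://<datanode-hostname>:9864/webhdfs/v1/... . If that hostname is not
# resolvable/reachable from the client container or the local machine, the
# actual upload fails even though the namenode request succeeded.
print(resp.status_code)
print(resp.headers.get("Location"))
```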
Because the hdfs-site.xml configuration item `dfs.client.use.datanode.hostname=true` is already set in the hadoop.env file, I tried modifying the datanode's Dockerfile and adding the following statement:

```dockerfile
ENV HDFS_CONF_dfs_client_use_datanode_hostname=true
```

but it didn't work. Does anyone have the same problem? Please give me some ideas and help, thank you very much.