big-data-europe / docker-hdfs-filebrowser

A docker image for HDFS FileBrowser. Cloudera Hue with FileBrowser only.

Access HDFS from Hue #8

Open zar3bski opened 5 years ago

zar3bski commented 5 years ago

I'm building a stack by adding bde2020/hdfs-filebrowser:3.11 to this docker compose. The stack starts fine: I put all the services on a common network and pointed Hue to the namenode following this documentation, but Hue does not seem able to access the user's folder in HDFS. Here is the error I get in Hue:

HTTPConnectionPool(host='namenode', port=50070): Max retries exceeded with url: /webhdfs/v1/user/dav?op=GETFILESTATUS&user.name=hue&doas=dav (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f5e2ac80810>: Failed to establish a new connection: [Errno 111] Connection refused',))

The folder does exist in HDFS with this owner. Here is how I created the user in the namenode:

useradd dav
passwd dav
hdfs dfs -mkdir /user/dav
hdfs dfs -chown dav:dav /user/dav

The user's credentials are the same in filebrowser and namenode. Is my problem related to i) a network issue, ii) credentials, or iii) something I completely missed? I tried to add the user to the hadoop group, but that group does not seem to exist:

root@namenode:/# usermod -a -G hadoop dav            
usermod: group 'hadoop' does not exist

(creating the group does not change much)
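One way to separate the network hypothesis from the credentials one: `[Errno 111] Connection refused` usually means nothing is listening on that host and port, before any user check happens. Below is a minimal sketch of a TCP probe that could be run from inside the Hue container. The helper name `port_open` is hypothetical, and the sketch assumes a Python 3 interpreter is available there.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the hostname and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `port_open("namenode", 50070)` and `port_open("namenode", 9870)` would show which of the two ports (the one in the error message, and the one published in the compose file) actually accepts connections.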

  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.1-java8
    container_name: namenode
    hostname: namenode
    ports:
      - 9870:9870
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
    env_file:
      - ./hadoop.env
    networks:
      - hadoop_net

  filebrowser: 
    container_name: hue
    image: bde2020/hdfs-filebrowser:3.11
    ports:
      - "8088:8088"
    environment:
      - NAMENODE_HOST=namenode
    networks:
      - hadoop_net

networks:
  hadoop_net:

nag9s commented 4 years ago

@zar3bski, did you happen to find a solution? I seem to be stuck with the same problem.

zar3bski commented 4 years ago

I do not recall all the details, but I finally came up with a functional design here: https://github.com/zar3bski/hadoop-sandbox. You can fork it if you want.

The problem came from the entrypoint: it uses the NAMENODE_HOST variable and assumes that HDFS is exposed on port 50070, which is not the case with newer Hadoop clusters (the namenode in my compose file publishes 9870). I finally mounted an .ini to point explicitly to the namenode (note that both containers need to be on the same network for DNS resolution to work):

  filebrowser: 
    container_name: hue
    image: gethue/hue:4.4.0
    ports:
      - "8000:8888"
    env_file:
      - ./hadoop.env
    volumes: 
      - ./overrides/hue/hue-overrides.ini:/usr/share/hue/desktop/conf/hue-overrides.ini
    depends_on:
      - namenode
      - resourcemanager
    networks:
      - hadoop

The mounted hue-overrides.ini:

[desktop]
  http_host=0.0.0.0
  http_port=8888
  time_zone=France
  dev=true
  app_blacklist=impala,zookeeper,oozie,hbase,security,search
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://namenode:8020
      webhdfs_url=http://namenode:9870/webhdfs/v1
      security_enabled=false

  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_api_url=http://resourcemanager:8088
      history_server_api_url=http://historyserver:19888

[beeswax]
  hive_server_host=hiveserver
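To make the port mismatch described above concrete, here is a small sketch that rebuilds the WebHDFS request from the error message for either Hadoop generation. It relies on the default NameNode HTTP ports (50070 in Hadoop 2.x, 9870 in Hadoop 3.x). The names `webhdfs_status_url` and `DEFAULT_HTTP_PORT` are made up for illustration and are not part of Hue or Hadoop.

```python
from urllib.parse import urlencode

# Default NameNode HTTP ports: 50070 in Hadoop 2.x, 9870 in Hadoop 3.x.
DEFAULT_HTTP_PORT = {2: 50070, 3: 9870}


def webhdfs_status_url(host, path, user, hadoop_major=3, doas=None):
    """Build a WebHDFS GETFILESTATUS URL for the given Hadoop major version."""
    port = DEFAULT_HTTP_PORT[hadoop_major]
    params = {"op": "GETFILESTATUS", "user.name": user}
    if doas:
        # Impersonated user, as sent by Hue in the error message.
        params["doas"] = doas
    return f"http://{host}:{port}/webhdfs/v1{path}?{urlencode(params)}"


# Hadoop 2.x default: the URL the old image tried to reach.
print(webhdfs_status_url("namenode", "/user/dav", "hue", hadoop_major=2, doas="dav"))
# Hadoop 3.x default: matches webhdfs_url in the .ini.
print(webhdfs_status_url("namenode", "/user/dav", "hue", hadoop_major=3, doas="dav"))
```

The first URL is the one from the original error; the second uses the same host and port as `webhdfs_url=http://namenode:9870/webhdfs/v1`.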