awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Having Trouble with amazon/aws-glue-libs:glue_libs_3.0.0_image_01 #151

Open ffitcht opened 1 year ago

ffitcht commented 1 year ago

A colleague and I cannot get the amazon/aws-glue-libs:glue_libs_3.0.0_image_01 image to run in Docker. We set DISABLE_SSL=true in the docker-compose file. The container exits with the following output:

SSL Disabled
/home/glue_user/jupyter/jupyter_start.sh: line 4: livy-server: command not found
/home/glue_user/jupyter/jupyter_start.sh: line 10: jupyter: command not found

Can anyone help resolve this issue please?

moomindani commented 1 year ago

Can you share details so that we can understand the issue?

jurgler commented 1 year ago

Env: Windows 10 + WSL2 (Ubuntu 20.04 LTS) + Docker Desktop(4.12.0) + VSCode

I am using a customized image for development and testing, as follows:

[Dockerfile]

FROM amazon/aws-glue-libs:glue_libs_3.0.0_image_01

# Sync the host user's uid:gid with glue_user to prevent access-denied errors
ARG HOST_UID=10000
ARG HOST_GID=0
USER root
RUN usermod -u ${HOST_UID} glue_user
RUN usermod -g ${HOST_GID} glue_user
RUN chown -R glue_user /home/glue_user/.jupyter

USER glue_user

# Set AWS credentials
RUN mkdir -p /home/glue_user/.aws
COPY ./aws_credentials /home/glue_user/.aws/credentials
COPY ./aws_config /home/glue_user/.aws/config

# Fix jupyter-notebook xsrf error
RUN /home/glue_user/.local/bin/jupyter lab --generate-config
RUN echo $' \n\
c.ServerApp.disable_check_xsrf = True \n\
' >> /home/glue_user/.jupyter/jupyter_lab_config.py

[docker-compose.yml]

version: "3.8"

services:
  app:
    container_name: glue-etl-jupyter-notebook
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - HOST_UID=$HOST_UID
      - HOST_GID=$HOST_GID
      - DISABLE_SSL=true
      - JUPYTERLAB_WORKSPACES_DIR=/home/glue_user/workspace/jupyter_workspace/
    networks:
      - internal
    deploy:
      resources:
        limits:
          cpus: 2
          memory: 4G
        reservations:
          memory: 100M
    volumes:
      - ./host_script_folder:/home/glue_user/workspace/jupyter_workspace/:rw
    privileged: true
    command: /home/glue_user/jupyter/jupyter_start.sh
    ports:
      - 4040:4040
      - 8888:8888
      - 18080:18080
    user: glue_user
networks:
  internal:
    name: glue-etl-dev
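
A note on HOST_UID and HOST_GID: the Dockerfile above declares them with ARG, and ARG values are read only at build time. Entries under environment are applied to the running container, so as written the build would fall back to the Dockerfile defaults (10000 and 0). A minimal sketch of passing them as build arguments instead, assuming HOST_UID and HOST_GID are exported in the shell or set in a .env file next to docker-compose.yml (the fallback values simply mirror the Dockerfile defaults):

```yaml
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        # Build-time values consumed by the ARG lines in the Dockerfile.
        # The ":-" fallbacks are an assumption that mirrors the Dockerfile
        # defaults, so an unset variable does not become an empty string.
        HOST_UID: ${HOST_UID:-10000}
        HOST_GID: ${HOST_GID:-0}
```

After changing build arguments, the image has to be rebuilt (for example with docker compose build) before the new uid and gid take effect.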

[.devcontainer/devcontainer.json] for VSCode Remote-Container

{
    "name": "tdm-etl-dev",
    "dockerComposeFile": [
        "../docker-compose.yml"
    ],
    "service": "app",
    "workspaceFolder": "/home/glue_user/workspace/jupyter_workspace/",
    "settings":  {
        "jupyter.jupyterServerType": "remote",
        "python.defaultInterpreterPath": "/usr/bin/python3",
        "python.analysis.extraPaths": [
            "/home/glue_user/aws-glue-libs/PyGlue.zip",
            "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip",
            "/home/glue_user/spark/python/"
        ]
    },
    "extensions": [
        "ms-python.python"
    ],
    "forwardPorts": [
        // Jupyter notebook
        8888,
        // Spark UI
        4040,
        // Spark History Server
        18080
    ],
    "remoteUser": "glue_user",
    "updateRemoteUserUID": false
}

I hope this helps~

RaulAlexandreMatos commented 1 year ago

Thanks, jurgler, for the complete answer. I had the same problem as ffitcht, and the fix for me was simply changing entrypoint to command in my docker-compose file. I don't know what changed, because yesterday I didn't get any errors.

My old docker-compose file:

version: '3.7'
services:
  jupyter:
    container_name: glue_jupyter
    entrypoint: /home/glue_user/jupyter/jupyter_start.sh
    environment:
      - DISABLE_SSL=true
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    ports:
      - '4040:4040'
      - '18080:18080'
      - '8998:8998'
      - '8888:8888'
    restart: always
    volumes:
      - <path-to-.aws-folder>/.aws:/home/glue_user/.aws
      - <path-to-jupyter-workspace-folder>/glue_jupyter_workspace:/home/glue_user/workspace/jupyter_workspace/

My new docker-compose file:

version: '3.7'
services:
  jupyter:
    container_name: glue_jupyter
    command: /home/glue_user/jupyter/jupyter_start.sh
    environment:
      - DISABLE_SSL=true
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    ports:
      - '4040:4040'
      - '18080:18080'
      - '8998:8998'
      - '8888:8888'
    restart: always
    volumes:
      - <path-to-.aws-folder>/.aws:/home/glue_user/.aws
      - <path-to-jupyter-workspace-folder>/glue_jupyter_workspace:/home/glue_user/workspace/jupyter_workspace/
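
A possible explanation for why this change helps, not confirmed anywhere in this thread: in a Compose file, command overrides only the image's default CMD and is handed to the image's ENTRYPOINT as arguments, while entrypoint replaces the ENTRYPOINT itself. If the image's ENTRYPOINT is what prepares the environment (for example, putting livy-server and jupyter on the PATH), overriding it would produce exactly the "command not found" errors reported above. The annotated fragment below sketches the difference; the comment about environment setup is an assumption about this image, not something verified here.

```yaml
services:
  jupyter:
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    # command replaces only the image's default CMD. The image's own
    # ENTRYPOINT still runs and receives this script as its argument.
    command: /home/glue_user/jupyter/jupyter_start.sh
    # entrypoint would instead replace the image's ENTRYPOINT entirely,
    # so any environment setup it performs (assumed, not verified)
    # would be skipped:
    # entrypoint: /home/glue_user/jupyter/jupyter_start.sh
```

To check what the image actually defines, docker image inspect amazon/aws-glue-libs:glue_libs_3.0.0_image_01 prints its configured Entrypoint and Cmd.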