awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Official latest aws-glue-libs Docker image with arm64 does not seem to work on MacBook Pro M2 #205

Open awongCM opened 8 months ago

awongCM commented 8 months ago

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html#develop-local-docker-image

I followed the instructions above to set up the Docker image container from the Docker Hub link. I ran docker pull amazon/aws-glue-libs:glue_libs_4.0.0_image_01 and tried to start it up in Docker Desktop. It abruptly exited.

At that point, I suspected the image might not include an arm64 image layer and therefore might not be compatible with my MacBook Pro M2 machine.

So I tried docker pull amazon/aws-glue-libs:glue_libs_4.0.0_image_01-arm64 and started it in Docker Desktop. It also exited abruptly.
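
(For what it's worth, a quick way to check which architecture a pulled tag actually resolved to is docker image inspect; this is a generic Docker check, not something from the Glue docs:)

# prints e.g. linux/arm64 or linux/amd64 for the locally pulled image
docker image inspect --format '{{.Os}}/{{.Architecture}}' amazon/aws-glue-libs:glue_libs_4.0.0_image_01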

Now I'm confused.

Have any of these Docker images ever worked on Apple M-series machines since they were introduced less than 4 years ago?

Can anyone help to shed light on this?

FYI - I followed the existing request here - https://github.com/awslabs/aws-glue-libs/issues/83#issue-837963715 - which is what led me to raise this issue.

svajiraya commented 1 month ago

@awongCM what is the docker run command you are using? Are you seeing any errors? Can you please post the Docker logs to this thread so I can investigate further?

docker run -it --rm -p 8888:8888 --name glue_pyspark public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01

If the above command exits as you described, run docker logs glue_pyspark in another shell.
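
One caveat worth noting (my assumption about why logs might come back empty, not something reported above): because the suggested command uses --rm, the container is removed as soon as it exits, and its logs disappear with it. Dropping --rm keeps the stopped container around so the logs can still be read after a crash:

# same image, but without --rm so the exited container (and its logs) survive
docker run -it -p 8888:8888 --name glue_pyspark public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01
# then, after it exits:
docker logs glue_pyspark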

The images are built with multi-arch support for amd64 and arm64.
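
As a cross-check on the multi-arch claim, docker manifest inspect against the public ECR tag should list both architectures (this is standard Docker CLI, not Glue-specific tooling):

# shows the manifest list; look for "architecture": "amd64" and "arm64" entries
docker manifest inspect public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01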

I tested the image on an EC2 Graviton m6g instance (which uses the arm64 CPU architecture) and it seems to be working fine. It would be great if I could get some more details to investigate:

[glue_user@9c05327ab205 workspace]$ uname -r
5.10.223-212.873.amzn2.aarch64
[glue_user@9c05327ab205 workspace]$ pyspark
Python 3.10.2 (main, Oct  8 2024, 04:02:18) [GCC 7.3.1 20180712 (Red Hat 7.3.1-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/glue_user/spark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/glue_user/spark/jars/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/glue_user/aws-glue-libs/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/glue_user/aws-glue-libs/jars/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/10/10 14:47:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0-amzn-1
      /_/

Using Python version 3.10.2 (main, Oct  8 2024 04:02:18)
Spark context Web UI available at http://9c05327ab205:4041
Spark context available as 'sc' (master = local[*], app id = local-1728571643790).
SparkSession available as 'spark'.
>>> df = spark.createDataFrame([('X', )], "dummy STRING")
>>> df.printSchema()
root
 |-- dummy: string (nullable = true)

>>>
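
As a further sanity check that the Glue libraries themselves load inside the container (a minimal sketch assuming the image's standard awsglue packaging; output omitted), the same pyspark session can create a GlueContext:

>>> from awsglue.context import GlueContext
>>> glueContext = GlueContext(sc)   # wraps the existing SparkContext
>>> glueContext.spark_session       # should return the active SparkSession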