Botched dependencies? Hmm..
@ash211 @mccheah seen this?
garylucas@gary-lucas:~/dev/proj/infra/spark-2.2.0-k8s-0.4.0-SNAPSHOT-bin-2.7.3/jars|DAT-spark-on-k8s⚡
⇒ ls -alh | grep snappy
-rw-r--r-- 1 garylucas staff 48K Sep 16 20:54 snappy-0.2.jar
-rw-r--r-- 1 garylucas staff 1.0M Sep 16 20:54 snappy-java-1.1.2.6.jar
I think the snappy deps are there. I'll take a peek at the Dockerfiles -- edit: yeah, they copy the whole jars directory.
Can you post the result of kubectl describe pod <driver> and kubectl describe pod <executor>?
Driver output:
Name: kmderived-1506122359203-driver
Namespace: default
Node: ip-10-253-50-77.ec2.internal/10.253.50.77
Start Time: Fri, 22 Sep 2017 16:19:25 -0700
Labels: role=kissmetrics
spark-app-selector=spark-e65d6a8c895f47eead96caad62edfc7d
spark-role=driver
Annotations: podpreset.admission.kubernetes.io/podpreset-kissmetrics-env=9039918
spark-app-name=kmderived
Status: Running
IP: 10.2.106.14
Containers:
spark-kubernetes-driver:
Container ID: docker://872c49fbee70f53dc3a8163e89d0a80abe0a0fce6360223641359c3c7749b403
Image: ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest
Image ID: docker-pullable://ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics@sha256:41d872998dbbb28882f176e164add56fd22d40a76d27381bb5f243b5c9bf1aa3
Port: <none>
State: Running
Started: Fri, 22 Sep 2017 16:19:37 -0700
Ready: True
Restart Count: 0
Limits:
memory: 15769Mi
Requests:
cpu: 8
memory: 14Gi
Environment:
SPARK_DRIVER_MEMORY: 14G
SPARK_DRIVER_CLASS: org.apache.spark.deploy.PythonRunner
SPARK_DRIVER_ARGS:
SPARK_MOUNTED_CLASSPATH: /opt/jars/hadoop-aws-2.7.3.jar:/opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/joda-time-2.9.3.jar:/opt/jars/mysql-connector-java-5.1.35.jar:/opt/jars/snappy-java-1.1.4.jar
PYSPARK_PRIMARY: /opt/ub-data-kissmetrics/jobs/kissmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py
PYSPARK_FILES: /opt/ub-data-kissmetrics/jobs/kissmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py
SPARK_JAVA_OPT_0: -Dspark.kubernetes.docker.image.pullPolicy=Always
SPARK_JAVA_OPT_1: -Dspark.kubernetes.namespace=default
SPARK_JAVA_OPT_2: -Dspark.app.id=spark-e65d6a8c895f47eead96caad62edfc7d
SPARK_JAVA_OPT_3: -Dspark.kubernetes.driver.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest
SPARK_JAVA_OPT_4: -Dspark.submit.deployMode=cluster
SPARK_JAVA_OPT_5: -Dspark.driver.blockManager.port=7079
SPARK_JAVA_OPT_6: -Dspark.app.name=kmderived
SPARK_JAVA_OPT_7: -Dspark.driver.bindAddress=kmderived-1506122359203-driver-svc.default.svc.cluster.local
SPARK_JAVA_OPT_8: -Dspark.driver.host=kmderived-1506122359203-driver-svc.default.svc.cluster.local
SPARK_JAVA_OPT_9: -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account
SPARK_JAVA_OPT_10: -Dspark.driver.memory=14G
SPARK_JAVA_OPT_11: -Dspark.driver.port=7078
SPARK_JAVA_OPT_12: -Dspark.driver.cores=8
SPARK_JAVA_OPT_13: -Dspark.kubernetes.executor.labels=role=kissmetrics
SPARK_JAVA_OPT_14: -Dspark.executor.memory=20G
SPARK_JAVA_OPT_15: -Dspark.kubernetes.executor.podNamePrefix=kmderived-1506122359203
SPARK_JAVA_OPT_16: -Dspark.executor.instances=10
SPARK_JAVA_OPT_17: -Dspark.executor.cores=6
SPARK_JAVA_OPT_18: -Dspark.kubernetes.driver.labels=role=kissmetrics
SPARK_JAVA_OPT_19: -Dspark.files=/opt/ub-data-kissmetrics/jobs/kissmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py,/opt/ub-data-kissmetrics/jobs/kissmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py
SPARK_JAVA_OPT_20: -Dspark.master=k8s://http://127.0.0.1:8001
SPARK_JAVA_OPT_21: -Dspark.kubernetes.initcontainer.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-init-kissmetrics:latest
SPARK_JAVA_OPT_22: -Dspark.kubernetes.executor.docker.image=ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest
SPARK_JAVA_OPT_23: -Dspark.jars=/opt/jars/hadoop-aws-2.7.3.jar,/opt/jars/aws-java-sdk-1.7.4.jar,/opt/jars/joda-time-2.9.3.jar,/opt/jars/mysql-connector-java-5.1.35.jar,/opt/jars/snappy-java-1.1.4.jar
SPARK_JAVA_OPT_24: -Dspark.kubernetes.driver.pod.name=kmderived-1506122359203-driver
REPORTSDB_HOSTNAME: reports-db.unbounce.net
REPORTSDB_PORT: 3306
REPORTSDB_KISSMETRICS_INBOUND_WAREHOUSE_NAME: kissmetrics_inbound
KM_MANIFEST_FILE: s3a://manifest/index.csv
REPORTSDB_KISSMETRICS_WAREHOUSE_NAME: kissmetrics_warehouse
MAX_FILES_TO_CONSUME: 250
OUTPUT_LOCATION: s3a://output
AWS_DATA_S3_ACCESS_KEY: <set to the key 'AWS_DATA_S3_ACCESS_KEY' in secret 'kissmetrics-k8s'> Optional: false
AWS_DATA_S3_SECRET_ACCESS_KEY: <set to the key 'AWS_DATA_S3_SECRET_ACCESS_KEY' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_JDBCURL: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_READ_JDBCURL: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_READ_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_USERNAME: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_USERNAME' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_PASSWORD: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_PASSWORD' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_USERNAME: <set to the key 'REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_USERNAME' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_PASSWORD: <set to the key 'REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_PASSWORD' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_READ_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_PROD_APPLICATIONDBUSER_READ_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_READ_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from spark-k8s-service-account-token-fn4qg (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
spark-k8s-service-account-token-fn4qg:
Type: Secret (a volume populated by a Secret)
SecretName: spark-k8s-service-account-token-fn4qg
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
47s 47s 1 default-scheduler Normal Scheduled Successfully assigned kmderived-1506122359203-driver to ip-10-253-50-77.ec2.internal
47s 47s 1 kubelet, ip-10-253-50-77.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "spark-k8s-service-account-token-fn4qg"
46s 46s 1 kubelet, ip-10-253-50-77.ec2.internal spec.containers{spark-kubernetes-driver} Normal Pulling pulling image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest"
35s 35s 1 kubelet, ip-10-253-50-77.ec2.internal spec.containers{spark-kubernetes-driver} Normal Pulled Successfully pulled image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-driver-py-kissmetrics:latest"
35s 35s 1 kubelet, ip-10-253-50-77.ec2.internal spec.containers{spark-kubernetes-driver} Normal Created Created container
35s 35s 1 kubelet, ip-10-253-50-77.ec2.internal spec.containers{spark-kubernetes-driver} Normal Started Started container
Executor output:
Name: kmderived-1506122359203-exec-1
Namespace: default
Node: ip-10-253-46-101.ec2.internal/10.253.46.101
Start Time: Fri, 22 Sep 2017 16:19:41 -0700
Labels: role=kissmetrics
spark-app-selector=spark-e65d6a8c895f47eead96caad62edfc7d
spark-exec-id=1
spark-role=executor
Annotations: podpreset.admission.kubernetes.io/podpreset-kissmetrics-env=9039918
Status: Running
IP: 10.2.36.12
Controlled By: Pod/kmderived-1506122359203-driver
Containers:
executor:
Container ID: docker://67035573284ccc7cb11808afbc66df0ea28923222e8d1a77326d40669a410f4c
Image: ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest
Image ID: docker-pullable://ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics@sha256:9c1515b8c5150717835d5405e5f052163ec453ed56d865dae354e11dd83b542f
Ports: 10000/TCP, 7079/TCP
State: Running
Started: Fri, 22 Sep 2017 16:19:42 -0700
Ready: True
Restart Count: 0
Limits:
memory: 22Gi
Requests:
cpu: 6
memory: 20Gi
Environment:
SPARK_EXECUTOR_PORT: 10000
SPARK_DRIVER_URL: spark://CoarseGrainedScheduler@kmderived-1506122359203-driver-svc.default.svc.cluster.local:7078
SPARK_EXECUTOR_CORES: 6
SPARK_EXECUTOR_MEMORY: 20G
SPARK_APPLICATION_ID: spark-e65d6a8c895f47eead96caad62edfc7d
SPARK_EXECUTOR_ID: 1
SPARK_MOUNTED_CLASSPATH: /var/spark-data/spark-jars/*
SPARK_EXECUTOR_POD_IP: (v1:status.podIP)
REPORTSDB_HOSTNAME: reports-db.unbounce.net
REPORTSDB_PORT: 3306
REPORTSDB_KISSMETRICS_INBOUND_WAREHOUSE_NAME: kissmetrics_inbound
KM_MANIFEST_FILE: s3a://manifest/index.csv
REPORTSDB_KISSMETRICS_WAREHOUSE_NAME: kissmetrics_warehouse
MAX_FILES_TO_CONSUME: 250
OUTPUT_LOCATION: s3a://output
AWS_DATA_S3_ACCESS_KEY: <set to the key 'AWS_DATA_S3_ACCESS_KEY' in secret 'kissmetrics-k8s'> Optional: false
AWS_DATA_S3_SECRET_ACCESS_KEY: <set to the key 'AWS_DATA_S3_SECRET_ACCESS_KEY' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_JDBCURL: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_READ_JDBCURL: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_READ_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_USERNAME: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_USERNAME' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_INBOUND_WRITE_PASSWORD: <set to the key 'REPORTSDB_KISSMETRICS_INBOUND_WRITE_PASSWORD' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_USERNAME: <set to the key 'REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_USERNAME' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_PASSWORD: <set to the key 'REPORTSDB_KISSMETRICS_WAREHOUSE_WRITE_PASSWORD' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_KISSMETRICS_WAREHOUSE_READ_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_WRITE_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
REPORTSDB_PROD_APPLICATIONDBUSER_READ_JDBCURL: <set to the key 'REPORTSDB_PROD_APPLICATIONDBUSER_READ_JDBCURL' in secret 'kissmetrics-k8s'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-00b84 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-00b84:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-00b84
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
50s 50s 1 default-scheduler Normal Scheduled Successfully assigned kmderived-1506122359203-exec-1 to ip-10-253-46-101.ec2.internal
50s 50s 1 kubelet, ip-10-253-46-101.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-00b84"
49s 49s 1 kubelet, ip-10-253-46-101.ec2.internal spec.containers{executor} Normal Pulling pulling image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest"
49s 49s 1 kubelet, ip-10-253-46-101.ec2.internal spec.containers{executor} Normal Pulled Successfully pulled image "ecr_id.dkr.ecr.us-east-1.amazonaws.com/spark-executor-py-kissmetrics:latest"
49s 49s 1 kubelet, ip-10-253-46-101.ec2.internal spec.containers{executor} Normal Created Created container
49s 49s 1 kubelet, ip-10-253-46-101.ec2.internal spec.containers{executor} Normal Started Started container
Hmm, I haven't seen this, and we've been reading/writing Parquet with Spark on k8s. Have you confirmed that the class org.xerial.snappy.Snappy is in that snappy-java jar?
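For example, a quick way to check (the jar path is illustrative; jar tf works equally well):
unzip -l snappy-java-1.1.2.6.jar | grep 'org/xerial/snappy/Snappy.class'
If that prints nothing, the class isn't in the jar.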
@ash211 you're right it's not there.
META-INF/MANIFEST.MF META-INF/maven/org.iq80.snappy/snappy/pom.properties META-INF/maven/org.iq80.snappy/snappy/pom.xml org/iq80/snappy/BufferRecycler.class org/iq80/snappy/CorruptionException.class org/iq80/snappy/Crc32C.class org/iq80/snappy/HadoopSnappyCodec$SnappyCompressionInputStream.class org/iq80/snappy/HadoopSnappyCodec$SnappyCompressionOutputStream.class org/iq80/snappy/HadoopSnappyCodec.class org/iq80/snappy/Main.class org/iq80/snappy/Memory.class org/iq80/snappy/SlowMemory.class org/iq80/snappy/Snappy.class org/iq80/snappy/SnappyCompressor.class org/iq80/snappy/SnappyDecompressor.class org/iq80/snappy/SnappyInputStream.class org/iq80/snappy/SnappyInternalUtils.class org/iq80/snappy/SnappyOutputStream.class org/iq80/snappy/UnsafeMemory.class
Missing / broken snappy jar.
I built this off of the tip of the 2.2 branch:
garylucas@gary-lucas:~/dev/spark|branch-2.2-kubernetes⚡
⇒ git rev-parse HEAD
8a0f485d83f84a9b556f7634fed8cb1b65e12159
So either my build command was incorrect or there's some issue in the branch.
Here was my build command:
garylucas@gary-lucas:~/dev/spark|branch-2.2-kubernetes⚡
⇒ git rev-parse HEAD
8a0f485d83f84a9b556f7634fed8cb1b65e12159
garylucas@gary-lucas:~/dev/spark|branch-2.2-kubernetes⚡
⇒ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3
I got that from: https://github.com/apache-spark-on-k8s/spark/issues/438
The reason I'm building my own is listed here: https://github.com/apache-spark-on-k8s/spark/issues/494
The other issue I'm experiencing on this build is https://github.com/apache-spark-on-k8s/spark/issues/505, which I'm currently trying to debug.
Sorry, I didn't realize there were two snappy jars in there and inspected the wrong one (i.e. snappy-0.2.jar).
This appears to be correct: snappy-java-1.1.2.6.jar
META-INF/MANIFEST.MF org/xerial/snappy/OSInfo.class org/xerial/snappy/PureJavaCrc32C.class org/xerial/snappy/Snappy.class org/xerial/snappy/SnappyBundleActivator.class org/xerial/snappy/SnappyCodec.class org/xerial/snappy/SnappyError.class org/xerial/snappy/SnappyErrorCode.class org/xerial/snappy/SnappyException.class org/xerial/snappy/SnappyFramed.class org/xerial/snappy/SnappyFramedInputStream$FrameAction.class org/xerial/snappy/SnappyFramedInputStream$FrameData.class org/xerial/snappy/SnappyFramedInputStream$FrameMetaData.class org/xerial/snappy/SnappyFramedInputStream.class org/xerial/snappy/SnappyFramedOutputStream.class org/xerial/snappy/SnappyIOException.class org/xerial/snappy/SnappyInputStream.class org/xerial/snappy/SnappyLoader.class org/xerial/snappy/SnappyNative.class org/xerial/snappy/SnappyOutputStream.class org/xerial/snappy/VERSION org/xerial/snappy/buffer/BufferAllocator.class org/xerial/snappy/buffer/BufferAllocatorFactory.class org/xerial/snappy/buffer/CachedBufferAllocator$1.class org/xerial/snappy/buffer/CachedBufferAllocator.class org/xerial/snappy/buffer/DefaultBufferAllocator$1.class org/xerial/snappy/buffer/DefaultBufferAllocator.class org/xerial/snappy/native/AIX/ppc/libsnappyjava.a org/xerial/snappy/native/AIX/ppc64/libsnappyjava.a org/xerial/snappy/native/Linux/aarch64/libsnappyjava.so org/xerial/snappy/native/Linux/arm/libsnappyjava.so org/xerial/snappy/native/Linux/armhf/libsnappyjava.so org/xerial/snappy/native/Linux/ppc64/libsnappyjava.so org/xerial/snappy/native/Linux/ppc64le/libsnappyjava.so org/xerial/snappy/native/Linux/s390x/libsnappyjava.so org/xerial/snappy/native/Linux/x86/libsnappyjava.so org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so org/xerial/snappy/native/Mac/x86/libsnappyjava.jnilib org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib org/xerial/snappy/native/README org/xerial/snappy/native/SunOS/sparc/libsnappyjava.so org/xerial/snappy/native/SunOS/x86/libsnappyjava.so org/xerial/snappy/native/SunOS/x86_64/libsnappyjava.so org/xerial/snappy/native/Windows/x86/snappyjava.dll org/xerial/snappy/native/Windows/x86_64/snappyjava.dll
Looks like everything is present on the executors:
garylucas@gary-lucas:~/dev/ub-data-kissmetrics|DAT-spark-on-k8s⚡
⇒ kubectl exec -it kmderived-1506306351957-exec-3 /bin/bash
bash-4.3# ls
Compute__Derived_NonAppPageViewed.py  hadoop-aws-2.7.3.jar  mysql-connector-java-5.1.35.jar
aws-java-sdk-1.7.4.jar  joda-time-2.9.3.jar  snappy-java-1.1.2.6.jar
bash-4.3# cd /opt/spark/
RELEASE/  bin/  conf/  examples/  jars/  python/  sbin/  work-dir/
bash-4.3# cd /opt/spark/jars/
bash-4.3# ls -alh | grep snappy
-rw-r--r-- 18 root root 47.6K Sep 17 03:54 snappy-0.2.jar
-rw-r--r-- 18 root root 1.0M Sep 17 03:54 snappy-java-1.1.2.6.jar
bash-4.3#
Huh, also when I expose the Spark UI I see this on the 'Environment' page:
/opt/spark/jars/snappy-0.2.jar | System Classpath
/opt/spark/jars/snappy-java-1.1.2.6.jar | System Classpath
I upgraded to the recent release build (https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.2.0-kubernetes-0.4.0) in the hope that my personal build was somehow broken.
Same results:
3.535 s due to Job aborted due to stage failure: Task 28 in stage 3.0 failed 4 times, most recent failure: Lost task 28.3 in stage 3.0 (TID 363, 10.2.88.8, executor 13): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
at org.apache.parquet.column.impl.ColumnWriterV1.accountForValueWritten(ColumnWriterV1.java:109)
at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:203)
at
I'm going through the executor logs now, but that will take a while. Wondering if anyone else has the same issue.
What's the nature of the other libraries on the classpath? Are there any jars with shaded dependencies or large jars that contain library classes like Snappy?
Can you try running a Java or Scala job instead and see if there's similar behavior?
Good thoughts; here are my jars:
--jars
local:/opt/jars/hadoop-aws-2.7.3.jar,
local:/opt/jars/aws-java-sdk-1.7.4.jar,
local:/opt/jars/joda-time-2.9.3.jar,
local:/opt/jars/mysql-connector-java-5.1.35.jar \
I'll introspect them later this eve and also test with a simple raw scala job.
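A quick way to scan them for any bundled Snappy classes (assuming local copies of the same jars under a jars/ directory) is something like:
for j in hadoop-aws-2.7.3.jar aws-java-sdk-1.7.4.jar joda-time-2.9.3.jar mysql-connector-java-5.1.35.jar; do
  echo "== $j"; unzip -l "jars/$j" | grep -i snappy || echo "  no snappy classes"
done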
Ok, so no mention of snappy in those jars.
I submitted a simple Scala-based Spark job that consumes a Parquet file from S3 that I know is valid, samples it down to 10% of the original, and writes it back as a new Parquet file.
I included all the jars I've been using, mentioned in the comment above (however I'm not running any queries against MySQL or using joda-time).
No sign of the problems I'm seeing above in PySpark; read/write works.
Next experiment: port the sample job over to PySpark and repeat.
So this just gets weirder and weirder.
Both test jobs, Scala and PySpark, work, but my actual original job still fails. At this stage my theories are:
Here's my spark-submit script:
#!/usr/bin/env bash
PATH_TO_DRIVER="$PWD/jobs/kissmetrics_warehouse/Derived_NonAppPageViewed/Compute__Derived_NonAppPageViewed.py"
./infra/spark-2.2.0-k8s-0.4.0-bin-2.7.3/bin/spark-submit \
--verbose \
--deploy-mode cluster \
--master k8s://http://127.0.0.1:8001 \
--jars file://"$PWD/jars/hadoop-aws-2.7.3.jar",file://"$PWD/jars/aws-java-sdk-1.7.4.jar",file://"$PWD/jars/joda-time-2.9.3.jar",file://"$PWD/jars/mysql-connector-java-5.1.35.jar"\
--kubernetes-namespace default \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-service-account \
--conf spark.kubernetes.docker.image.pullPolicy=IfNotPresent \
--conf spark.driver.memory=8G \
--conf spark.driver.cores=4 \
--conf spark.executor.instances=10 \
--conf spark.executor.cores=4 \
--conf spark.executor.memory=8G \
--conf spark.app.name=pytest-parquet \
--conf spark.kubernetes.driver.label.role=kissmetrics \
--conf spark.kubernetes.executor.label.role=kissmetrics \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
--conf spark.kubernetes.resourceStagingServer.internal.uri=http://spark-resource-staging-service:10000 \
--conf spark.kubernetes.resourceStagingServer.uri=http://rss-url-numbers.us-east-1.elb.amazonaws.com:10000 \
file://$PATH_TO_DRIVER
The test scripts consumed a smallish Parquet file, sampled it down, and produced a smaller file. This job looks like it's trying to write about 10GB, i.e. not tiny but hardly huge.
Next I'm probably going to distill it down to a minimal problem and run that; right now it takes a while to iterate. I'll likely replace the current script with: 1) hit the MySQL db and get a bunch of rows, 2) do some simple aggregation and filtering, 3) write to Parquet.
That's essentially what I'm doing now, but a 5-10 minute turnaround to failure is a bit slow. On the bright side, I'm now using the base images and don't have to build and push to ECR every darn time.
The error still seems to boil down to:
Suppressed: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
Here's a subset of the logs I'm seeing (the whole trace is huge):
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-8_default_executor: 2017-10-02 05:32:47 INFO InternalParquetRecordWriter:160 - Flushing mem columnStore to file. allocated memory: 2172814
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-7_default_executor: } ]
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-7_default_executor: }
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-7_default_executor: and corresponding Parquet message type:
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-7_default_executor: message spark_schema {
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-7_default_executor: required binary Fact_PageViewedEventId (UTF8);
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-8_default_executor: 2017-10-02 05:32:47 ERROR Utils:91 - Aborting task
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-8_default_executor: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-8_default_executor: at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-8_default_executor: at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO SQLHadoopMapReduceCommitProtocol:54 - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO SQLHadoopMapReduceCommitProtocol:54 - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO CodecConfig:95 - Compression: SNAPPY
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO CodecConfig:95 - Compression: SNAPPY
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:329 - Parquet block size to 134217728
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:330 - Parquet page size to 1048576
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:331 - Parquet dictionary page size to 1048576
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:332 - Dictionary is on
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:333 - Validation is off
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:334 - Writer version is: PARQUET_1_0
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:335 - Maximum row group padding size is 0 bytes
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:336 - Page size checking is: estimated
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:337 - Min row count for page size check is: 100
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetOutputFormat:338 - Max row count for page size check is: 10000
Oct 01 22:32:47 data-cluster-slow-test k8s.pytest-parquet-1506921643054-exec-5_default_executor: 2017-10-02 05:32:47 INFO ParquetWriteSupport:54 - Initialized Parquet WriteSupport with Catalyst schema:
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.accountForValueWritten(ColumnWriterV1.java:109)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:203)
Oct 01 22:32:47 : at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:467)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter$8.apply(ParquetWriteSupport.scala:163)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter$8.apply(ParquetWriteSupport.scala:162)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$writeFields$1.apply$mcV$sp(ParquetWriteSupport.scala:124)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$consumeField(ParquetWriteSupport.scala:437)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$writeFields(ParquetWriteSupport.scala:123)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$write$1.apply$mcV$sp(ParquetWriteSupport.scala:114)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.consumeMessage(ParquetWriteSupport.scala:425)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.write(ParquetWriteSupport.scala:113)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.write(ParquetWriteSupport.scala:51)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.write(ParquetOutputWriter.scala:40)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
Oct 01 22:32:47 : at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
Oct 01 22:32:47 : ... 8 more
Oct 01 22:32:47 : Suppressed: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
Oct 01 22:32:47 : at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
Oct 01 22:32:47 : at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
Oct 01 22:32:47 : at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:238)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:121)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:167)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:163)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:337)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$1.apply$mcV$sp(FileFormatWriter.scala:264)
Oct 01 22:32:47 : at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1384)
Oct 01 22:32:47 : ... 9 more
Oct 01 22:32:47 : 2017-10-02 05:32:47 WARN FileOutputCommitter:569 - Could not delete s3a://bucketname/kissmetrics_warehouse/dev/gary/out/Derived_NonAppPageViewed.parquet/_temporary/0/_temporary/attempt_20171002053247_0003_m_000025_1
Oct 01 22:32:47 : 2017-10-02 05:32:47 ERROR FileFormatWriter:70 - Job job_20171002053247_0003 aborted.
Oct 01 22:32:47 : 2017-10-02 05:32:47 WARN Utils:87 - Suppressing exception in catch: Could not initialize class org.xerial.snappy.Snappy
Oct 01 22:32:47 : java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
Oct 01 22:32:47 : at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
Oct 01 22:32:47 : at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
Oct 01 22:32:47 : at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:238)
Oct 01 22:32:47 : at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:121)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:167)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
Oct 01 22:32:47 : at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:163)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:337)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$1.apply$mcV$sp(FileFormatWriter.scala:264)
Oct 01 22:32:47 : at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1384)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
Oct 01 22:32:47 : at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
Oct 01 22:32:47 : at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
Oct 01 22:32:47 : at org.apache.spark.scheduler.Task.run(Task.scala:108)
Oct 01 22:32:47 : at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
Oct 01 22:32:47 : at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Oct 01 22:32:47 : at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Oct 01 22:32:47 : at java.lang.Thread.run(Thread.java:748)
Does anyone know if Snappy is used for all Parquet writes to S3? The test writes were quite a bit smaller.
One should be able to inspect the written files to know if they're compressed with Snappy or not. My recollection is that they are always compressed.
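For instance, a rough way to check (assumes the parquet-tools CLI from parquet-mr is available; the bucket and file names below are placeholders):
aws s3 cp s3://output/pytest-parquet-out/part-00000.snappy.parquet .
parquet-tools meta part-00000.snappy.parquet | grep -i snappy
Spark also encodes the codec in the output file names (part-*.snappy.parquet), which is a quicker first check.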
Just tested a write that works; I got rid of the sample(true, 0.1).
Looked in the executor logs:
Oct 02 23:01:43 data-cluster-slow-test k8s.pytest-parquet-1507010416342-exec-1_default_executor: 2017-10-03 06:01:05 INFO CodecConfig:95 - Compression: SNAPPY
So this is just weird.
@luck02 I ran into this as well and found this in the stack trace:
java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-6e299ad6-46dd-4ff3-aaac-112b4383ccf8-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.2-6e299ad6-46dd-4ff3-aaac-112b4383ccf8-libsnappyjava.so)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:174)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:152)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
... 48 elided
It seems like this is happening because the Spark Docker images are based on Alpine, which uses musl libc? I tried adding the libc6-compat apk package in custom images for the driver/executor, and Snappy seems to work fine now.
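For reference, a minimal sketch of that kind of custom image (the base tag is the one used earlier in this thread; the same change applies to the driver and init-container images):
FROM kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0
# Alpine ships musl libc; libc6-compat provides the glibc loader (ld-linux-x86-64.so.2)
# that snappy-java's bundled native library is looking for
RUN apk add --no-cache libc6-compat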
Thank you for this investigation @akakitani !!!!!
That's awesome @akakitani, where did you find that? In the executor logs? I'll reproduce. The logs are super verbose, so it's hard to find useful information.
This was in the driver logs actually, and it was when trying to read some Parquet files. The fix has worked so far for me on writes too 👍
@mccheah thoughts on modifying the Docker images to include libc6-compat, if this really is the fix? Is this one of the extra packages you guys are using in your customized Docker images?
We install https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.25-r0/glibc-2.25-r0.apk which might be similar?
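For completeness, installing that package in an Alpine-based image looks roughly like this (a sketch, not the exact Dockerfile; --allow-untrusted skips verifying the package signing key):
FROM kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0
# Fetch and install the prebuilt glibc compatibility package
RUN wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.25-r0/glibc-2.25-r0.apk && \
    apk add --allow-untrusted glibc-2.25-r0.apk && \
    rm glibc-2.25-r0.apk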
Found it in our trace as well:
Oct 04 21:38:00 data-cluster-slow-test k8s.pytest-parquet-1507176181001-exec-2_default_executor: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-2506cd9c-0dc3-49eb-958a-a032e8b7b784-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.2-2506cd9c-0dc3-49eb-958a-a032e8b7b784-libsnappyjava.so)
Oct 04 21:38:00 data-cluster-slow-test k8s.pytest-parquet-1507176181001-exec-2_default_executor: at java.lang.ClassLoader$NativeLibrary.load(Native Method)
Ugh, it only appears once in about 20k lines of logs. Wow, needle in a haystack. I don't have time to complete the experiment this evening by adding the package to the images.
Great catch @akakitani! 🎉
A fix is proposed in #550
I built off the 2.2 branch. It works great until it tries to write Parquet files; then it can't seem to initialize / find org.xerial.snappy.Snappy. I see snappy in the jars directory of the Spark distro. My endpoint is s3a://bucket/etc and I've included the following jars. I'm not sure if this is something odd in the Spark distro or if I'm doing something odd.
Thanks!
I can provide the full error log output if that'll help.