Closed webmakaka closed 2 years ago
It seems it was a local problem with my computer's resources.
I scaled grafana and prometheus-operator down to 0 and ran the Elyra Notebook Image with Spark with a small container size.
And the following steps
RUN -> Chapter09/explore_data.ipynb
RUN -> Chapter09/merge_data.ipynb
RUN -> Chapter09/clean_data.ipynb
are OK for now.
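For reference, scaling the monitoring components down to zero replicas can be done with kubectl. This is a sketch only: the namespace and workload names below are assumptions and may differ in your cluster.

```shell
# Free CPU/memory by scaling monitoring workloads to 0 replicas.
# Names are guesses -- check yours with: kubectl get deploy,sts -A
kubectl -n monitoring scale deployment grafana --replicas=0
kubectl -n monitoring scale statefulset prometheus-prometheus-operator-prometheus --replicas=0
```

Scaling back up later is the same command with `--replicas=1`.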
=========================
My working config is:
$ export \
PROFILE=marley-minikube \
CPUS=8 \
MEMORY=30G \
HDD=80G \
DRIVER=docker \
KUBERNETES_VERSION=v1.24.4
It seems there were not enough resources before; the config above could be a solution.
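With those variables exported, the corresponding minikube invocation would look something like this (a sketch using the standard `minikube start` flags):

```shell
# Start a minikube profile sized from the exported variables
minikube start \
  --profile="${PROFILE}" \
  --cpus="${CPUS}" \
  --memory="${MEMORY}" \
  --disk-size="${HDD}" \
  --driver="${DRIVER}" \
  --kubernetes-version="${KUBERNETES_VERSION}"
```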
import os
from pyspark.sql import SparkSession

# Create a Spark session configured against the in-cluster MinIO S3 endpoint
spark = SparkSession \
    .builder \
    .appName("Python Spark S3 example") \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio-ml-workshop:9000") \
    .config("spark.hadoop.fs.s3a.access.key", "minio") \
    .config("spark.hadoop.fs.s3a.secret.key", "minio123") \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.multipart.size", "104857600") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Note: the option key is "delimiter" (the original had a typo, "delimeter",
# which Spark silently ignores, falling back to the default comma)
dfAirlines = spark.read \
    .options(delimiter=',', inferSchema='True', header='True') \
    .csv("s3a://airport-data/airlines.csv")
dfAirlines.printSchema()

dfAirports = spark.read \
    .options(delimiter=',', inferSchema='True', header='True') \
    .csv("s3a://airport-data/airports.csv")
dfAirports.printSchema()

dfAirports.show(truncate=False)
dfAirlines.show(truncate=False)

print(dfAirports.count())
print(dfAirlines.count())

spark.stop()
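As a side note, `fs.s3a.multipart.size` above is specified in plain bytes. A quick sanity check in plain Python (no Spark needed) confirms what that value amounts to:

```python
# fs.s3a.multipart.size is given in bytes
multipart_bytes = 104857600

# Convert to MiB: 104857600 bytes = 100 MiB
mib = multipart_bytes / (1024 * 1024)
print(mib)  # 100.0
```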
Hello!
When I run the block
I am getting the following error:
How to fix it? Thanks!
The previous steps in this script finished correctly.