exacaster / lighter

REST API for Apache Spark on K8S or YARN
MIT License

Marking application failed because of error: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc.cluster.local not verified: #311

Closed NiklasRosenstein closed 1 year ago

NiklasRosenstein commented 1 year ago

When I try to start an application via SparkMagic through Lighter, the application fails to start and the Lighter logs show the following error:

23:58:33.532 [launcher-proc-2] WARN  c.e.l.backend.ClusterSparkListener - State changed with error: Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc.cluster.local not verified:
23:58:33.532 [launcher-proc-2] WARN  c.e.l.a.ApplicationStatusHandler - Marking application b3ca2493-5b90-4c49-b23b-44db39313644 failed because of error Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc.cluster.local not verified:

I'm unsure where exactly that certificate error occurs. Is it in the Spark driver pod that Lighter presumably starts in the Kubernetes cluster where I run both Lighter and Spark? (Although I have not seen a pod spawn between clicking "Create Session" in SparkMagic and the error showing up in the Lighter logs.) Or is it in the Lighter configuration, which I need to make aware of my cluster's CA?

pdambrauskas commented 1 year ago

It looks like a configuration issue. Lighter tries to create the Spark driver pod, but fails with PeerUnverifiedException: Hostname kubernetes.default.svc.cluster.local not verified. kubernetes.default.svc.cluster.local is the DNS name for accessing the Kubernetes API service; it should be available on all Kubernetes installations.

Maybe something is wrong with the Service Account or Role binding (I'd imagine the errors would be different in that case, but maybe...). Have you followed the documentation at https://github.com/exacaster/lighter/blob/master/docs/kubernetes.md?
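If you want to verify what certificate the API server is actually presenting, something like this should reproduce (or rule out) the hostname mismatch from inside the Lighter pod. Just a sketch, assuming a Python interpreter is available in the image:

```python
# Hedged sketch: check whether the API server certificate, validated against
# the in-pod service-account CA, covers the cluster-internal DNS name.
import socket
import ssl

HOST = "kubernetes.default.svc.cluster.local"
CA = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

ctx = ssl.create_default_context(cafile=CA)
with socket.create_connection((HOST, 443)) as sock:
    # wrap_socket performs hostname verification against HOST; a mismatch
    # raises ssl.SSLCertVerificationError, mirroring the Lighter error.
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.getpeercert()["subjectAltName"])
```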

Did you change something in that documentation to fit your needs? If you followed it exactly, these Kubernetes resources should be present:

➜  ~ kubectl get pods -n spark
NAME                                                                                     READY   STATUS      RESTARTS      AGE
lighter-5965d6ffb8-qf457                                                                 1/1     Running     0             2d21h
➜  ~ kubectl get sa -n spark
NAME                   SECRETS   AGE
default                1         619d
spark                  1         583d
➜  ~ kubectl get rolebinding -n spark
NAME            ROLE                 AGE
lighter-spark   Role/lighter-spark   583d
➜  ~

You can also try changing LIGHTER_KUBERNETES_MASTER to k8s://kubernetes.default.svc:443, or to the corresponding IP address.

NiklasRosenstein commented 1 year ago

Hi @pdambrauskas,

Gotcha! I'm just surprised, because if Lighter simply inherits the Kubernetes service account under /var/run/secrets/kubernetes.io, the CA certificate is present there:

root@lighter-79bfc75b4b-q25rs:/var/run/secrets/kubernetes.io/serviceaccount# ls
ca.crt  namespace  token
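(If the auto-configuration really isn't picking that CA up, I suppose I could also point the submission client at it explicitly via Spark conf when creating the session. A rough sketch against Lighter's Livy-compatible API; the endpoint path and payload here are my assumptions, not verified:)

```python
# Hypothetical sketch: create a session while explicitly telling the
# spark-submit Kubernetes client which CA bundle to trust.
import requests

payload = {
    "kind": "pyspark",
    "conf": {
        # Standard Spark-on-K8s property for the submission-side client.
        "spark.kubernetes.authenticate.submission.caCertFile":
            "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
    },
}
resp = requests.post("http://lighter.spark:8080/lighter/api/sessions", json=payload)
print(resp.status_code, resp.json())
```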

I did follow the Kubernetes guide you reference, and I seem to have all the required resources:

coder@coder-niklasrosenstein-workspace:~/git/cluster-configuration  > kubectl get pods -n spark
NAME                       READY   STATUS    RESTARTS   AGE
lighter-79bfc75b4b-q25rs   1/1     Running   0          14h
lighter-db-postgresql-0    1/1     Running   0          37h
spark-master-0             1/1     Running   0          37h
spark-worker-0             1/1     Running   0          37h
spark-worker-1             1/1     Running   0          34h
coder@coder-niklasrosenstein-workspace:~/git/cluster-configuration  > kubectl get sa -n spark
NAME      SECRETS   AGE
default   1         14d
spark     1         14d
coder@coder-niklasrosenstein-workspace:~/git/cluster-configuration  > kubectl get rolebinding -n spark
NAME            ROLE                 AGE
lighter-spark   Role/lighter-spark   14d

You don't seem to have Spark installed separately; does spark-master just live in a different namespace in your case?

NiklasRosenstein commented 1 year ago

I've tried setting LIGHTER_KUBERNETES_MASTER to k8s://IP:6443, where IP is the address of the first network interface on the node where the kube-apiserver is running (I currently run a k0s distribution on a single node).

That seems to have changed the results a bit.

14:09:00.125 [scheduled-executor-thread-23] INFO  c.e.l.a.sessions.SessionHandler - Start provisioning permanent sessions.
14:09:00.125 [scheduled-executor-thread-23] INFO  c.e.l.a.sessions.SessionHandler - End provisioning permanent sessions.
14:09:00.127 [scheduled-executor-thread-11] INFO  c.e.l.a.sessions.SessionHandler - Launching Application[id='f4788106-d005-49e9-be74-30b9c1216baf', type=SESSION, state=NOT_STARTED, appId='null', appInfo='null', submitParams=SubmitParams[name='session_d3d8a4aa-2de3-47e2-bf20-5a4fc0f3a648', file='http://lighter.spark:8080/lighter/jobs/shell_wrapper.py', master='null', mainClass='null', numExecutors=1, executorCores=1, executorMemory='1000M', driverCores=1, driverMemory='1000M', args=[], pyFiles=[], files=[], jars=[], archives=[], conf={}], createdAt=2023-01-14T14:08:57.274658, contactedAt=null]
Jan 14, 2023 2:09:00 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: /home/app/spark//bin/load-spark-env.sh: line 68: ps: command not found
14:09:00.559 [scheduled-executor-thread-12] INFO  c.e.l.application.batch.BatchHandler - Completed 0 jobs
14:09:00.562 [scheduled-executor-thread-8] INFO  c.e.l.application.batch.BatchHandler - Processing scheduled batches, found empty slots: 15, using 10
14:09:00.563 [scheduled-executor-thread-8] INFO  c.e.l.application.batch.BatchHandler - Waiting launches to complete
Jan 14, 2023 2:09:01 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Jan 14, 2023 2:09:01 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:01 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 WARN DriverServiceFeatureStep: Driver's hostname would preferably be session-d3d8a4aa-2de3-47e2-bf20-5a4fc0f3a648-5809d285b09cc25d-driver-svc, but this is too long (must be <= 63 characters). Falling back to use spark-6613ee85b09cc4a8-driver-svc as the driver service's name.
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 INFO ShutdownHookManager: Shutdown hook called
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-354cb523-92fb-4a13-ac8e-5fc65ebb8273
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-f669604b-f230-4fe6-b4ed-270ab5ef5d2b
Jan 14, 2023 2:09:02 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 23/01/14 14:09:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-9425cac7-d3a7-409d-b090-ec1e46df7ca1
14:09:02.596 [launcher-proc-1] INFO  c.e.l.backend.ClusterSparkListener - State change. AppId: null, State: LOST
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by py4j.reflection.ReflectionShim (file:/home/app/libs/py4j-0.10.9.7.jar) to method java.util.ArrayList$Itr.next()
WARNING: Please consider reporting this to the maintainers of py4j.reflection.ReflectionShim
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
14:09:30.569 [scheduled-executor-thread-12] INFO  c.e.l.application.batch.BatchHandler - Completed 0 jobs
14:09:30.571 [scheduled-executor-thread-16] INFO  c.e.l.application.batch.BatchHandler - Processing scheduled batches, found empty slots: 15, using 10
14:09:30.571 [scheduled-executor-thread-16] INFO  c.e.l.application.batch.BatchHandler - Waiting launches to complete
14:10:00.125 [scheduled-executor-thread-1] INFO  c.e.l.a.sessions.SessionHandler - Start provisioning permanent sessions.
14:10:00.125 [scheduled-executor-thread-1] INFO  c.e.l.a.sessions.SessionHandler - End provisioning permanent sessions.
14:10:00.457 [scheduled-executor-thread-9] INFO  c.e.l.a.ApplicationStatusHandler - Tracking Application[id='f4788106-d005-49e9-be74-30b9c1216baf', type=SESSION, state=STARTING, appId='null', appInfo='null', submitParams=SubmitParams[name='session_d3d8a4aa-2de3-47e2-bf20-5a4fc0f3a648', file='http://lighter.spark:8080/lighter/jobs/shell_wrapper.py', master='null', mainClass='null', numExecutors=1, executorCores=1, executorMemory='1000M', driverCores=1, driverMemory='1000M', args=[], pyFiles=[], files=[], jars=[], archives=[], conf={}], createdAt=2023-01-14T14:08:57.274658, contactedAt=2023-01-14T14:09:00.131426], info: ApplicationInfo[state=IDLE, applicationId='spark-bf370e5a0e784d49a86d927ff4873144']
14:10:00.578 [scheduled-executor-thread-6] INFO  c.e.l.application.batch.BatchHandler - Completed 0 jobs
14:10:00.579 [scheduled-executor-thread-10] INFO  c.e.l.application.batch.BatchHandler - Processing scheduled batches, found empty slots: 15, using 10
14:10:00.579 [scheduled-executor-thread-10] INFO  c.e.l.application.batch.BatchHandler - Waiting launches to complete
14:10:00.866 [Thread-3] INFO  c.e.l.a.s.p.p.PythonSessionIntegration - Waiting: [Statement[id='9d86b013-dfac-411e-95a7-1b8e4fe75dc8', code='spark', output=null, state='waiting', createdAt='2023-01-14T14:10:00.860492']]
14:10:00.877 [Thread-3] WARN  c.e.l.a.s.p.p.PythonSessionIntegration - Handling response for f4788106-d005-49e9-be74-30b9c1216baf : 9d86b013-dfac-411e-95a7-1b8e4fe75dc8 --- {content={text/plain=<pyspark.sql.session.SparkSession object at 0x7f6e8474e5b0>}}
14:10:01.182 [Thread-3] INFO  c.e.l.a.s.p.p.PythonSessionIntegration - Waiting: [Statement[id='23fb2f98-5221-40ba-b151-a24b24da46e1', code='', output=null, state='waiting', createdAt='2023-01-14T14:10:01.171992']]
14:10:01.192 [Thread-3] WARN  c.e.l.a.s.p.p.PythonSessionIntegration - Handling response for f4788106-d005-49e9-be74-30b9c1216baf : 23fb2f98-5221-40ba-b151-a24b24da46e1 --- {error=IndexError, message=pop from empty list, traceback=[Traceback (most recent call last):
,   File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
,   File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 103, in _exec_then_eval
    last = ast.Interactive([block.body.pop()])
, IndexError: pop from empty list
]}
14:10:30.584 [scheduled-executor-thread-6] INFO  c.e.l.application.batch.BatchHandler - Completed 0 jobs
14:10:30.585 [scheduled-executor-thread-20] INFO  c.e.l.application.batch.BatchHandler - Processing scheduled batches, found empty slots: 15, using 10
14:10:30.585 [scheduled-executor-thread-20] INFO  c.e.l.application.batch.BatchHandler - Waiting launches to complete
pdambrauskas commented 1 year ago

You don't seem to have Spark installed separately; does spark-master just live in a different namespace in your case?

We do not run Spark in standalone mode, so no Spark master is needed. More details here: https://spark.apache.org/docs/latest/running-on-kubernetes.html
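With Spark on Kubernetes, the "master" URL points at the Kubernetes API server itself, and Spark asks Kubernetes to create driver and executor pods on demand; there is no long-running Spark master process. Conceptually it looks like this (an illustrative sketch, not what Lighter runs verbatim):

```python
# Sketch: on Kubernetes the master URL is the API server, not a Spark
# standalone master. Executors are scheduled as pods when needed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")  # API server URL
    .appName("spark-on-k8s-example")
    .getOrCreate()
)
```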

That seems to have changed the results a bit.

It looks like the session launched successfully this time, and the Spark session was created:

14:10:00.866 [Thread-3] INFO  c.e.l.a.s.p.p.PythonSessionIntegration - Waiting: [Statement[id='9d86b013-dfac-411e-95a7-1b8e4fe75dc8', code='spark', output=null, state='waiting', createdAt='2023-01-14T14:10:00.860492']]
14:10:00.877 [Thread-3] WARN  c.e.l.a.s.p.p.PythonSessionIntegration - Handling response for f4788106-d005-49e9-be74-30b9c1216baf : 9d86b013-dfac-411e-95a7-1b8e4fe75dc8 --- {content={text/plain=<pyspark.sql.session.SparkSession object at 0x7f6e8474e5b0>}}

But your next statement failed:

14:10:01.182 [Thread-3] INFO  c.e.l.a.s.p.p.PythonSessionIntegration - Waiting: [Statement[id='23fb2f98-5221-40ba-b151-a24b24da46e1', code='', output=null, state='waiting', createdAt='2023-01-14T14:10:01.171992']]
14:10:01.192 [Thread-3] WARN  c.e.l.a.s.p.p.PythonSessionIntegration - Handling response for f4788106-d005-49e9-be74-30b9c1216baf : 23fb2f98-5221-40ba-b151-a24b24da46e1 --- {error=IndexError, message=pop from empty list, traceback=[Traceback (most recent call last):

I do not understand how you managed to send empty code with this statement (code=''); our Jupyter notebook just skips statements like that. Did the session eventually fail? Have you tried executing more statements? Can you check the Lighter UI to see the status of your session, and the logs of the session driver pod? It looks like the session did not fail; only one of your statements got an error response.
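For context, judging from the traceback, the shell wrapper does roughly the following (a reconstruction from the stack trace, not the actual shell_wrapper.py source), which is why an empty cell ends in that IndexError:

```python
import ast

def _exec_then_eval(code: str, globals_: dict) -> None:
    # Parse the whole cell, then split off the last statement so that its
    # value can be echoed back, REPL-style.
    block = ast.parse(code, mode='exec')
    # For an empty cell, block.body is [] and pop() raises
    # "IndexError: pop from empty list" -- the error in your logs.
    last = ast.Interactive([block.body.pop()])
    exec(compile(block, '<cell>', 'exec'), globals_)
    exec(compile(last, '<cell>', 'single'), globals_)
```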

NiklasRosenstein commented 1 year ago

I think that was caused by running a cell with just %spark in it.

[screenshot]

Executing this example from SparkMagic:

%%spark
numbers = sc.parallelize([1, 2, 3, 4])
print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))

gives me:

15:24:37.727 [Thread-3] INFO  c.e.l.a.s.p.p.PythonSessionIntegration - Waiting: [Statement[id='72d32f7f-abd9-497a-b494-538bc974a5de', code='numbers = sc.parallelize([1, 2, 3, 4])
print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))\
', output=null, state='waiting', createdAt='2023-01-14T15:24:37.716379']]
15:24:37.735 [Thread-3] WARN  c.e.l.a.s.p.p.PythonSessionIntegration - Handling response for f4788106-d005-49e9-be74-30b9c1216baf : 72d32f7f-abd9-497a-b494-538bc974a5de --- {error=SyntaxError, message=unexpected EOF while parsing (<unknown>, line 2), traceback=[Traceback (most recent call last):
,   File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
,   File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 100, in _exec_then_eval
    block = ast.parse(code, mode='exec')
,   File "/usr/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
,   File "<unknown>", line 2
,     print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))\
,                                                                                                                          ^
, SyntaxError: unexpected EOF while parsing
]}

(and basically the same error is shown in the notebook)
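Looking at the code the session actually received (as logged above), my cell arrives with a trailing backslash before the final newline, which Python treats as an unfinished line continuation. That reproduces outside Lighter too; a minimal check:

```python
# The received code ends in backslash-newline, i.e. a line continuation
# with nothing after it, so the parser hits EOF mid-statement.
import ast

good = "print('hi')\n"
bad = "print('hi')\\\n"  # trailing backslash before the newline

ast.parse(good)  # parses fine
ast.parse(bad)   # SyntaxError: unexpected EOF while parsing
```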

The Lighter UI shows no Batches and one Session:

[screenshot]

NiklasRosenstein commented 1 year ago
Driver pod logs:

```
kubectl logs -n spark session-d3d8a4aa-2de3-47e2-bf20-5a4fc0f3a648-5809d285b09cc25d-driver
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ '[' -z /usr/local/openjdk-11 ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.61.5.84 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://lighter.spark:8080/lighter/jobs/shell_wrapper.py
23/01/14 14:09:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO:session:Initiating session f4788106-d005-49e9-be74-30b9c1216baf
23/01/14 14:09:19 INFO SparkContext: Running Spark version 3.3.1
23/01/14 14:09:19 INFO ResourceUtils: ==============================================================
23/01/14 14:09:19 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/14 14:09:19 INFO ResourceUtils: ==============================================================
23/01/14 14:09:19 INFO SparkContext: Submitted application: f4788106-d005-49e9-be74-30b9c1216baf
23/01/14 14:09:19 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1000, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/14 14:09:19 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
23/01/14 14:09:19 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/14 14:09:19 INFO SecurityManager: Changing view acls to: 185,root
23/01/14 14:09:19 INFO SecurityManager: Changing modify acls to: 185,root
23/01/14 14:09:19 INFO SecurityManager: Changing view acls groups to:
23/01/14 14:09:19 INFO SecurityManager: Changing modify acls groups to:
23/01/14 14:09:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(185, root); groups with view permissions: Set(); users with modify permissions: Set(185, root); groups with modify permissions: Set()
23/01/14 14:09:19 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/01/14 14:09:19 INFO SparkEnv: Registering MapOutputTracker
23/01/14 14:09:19 INFO SparkEnv: Registering BlockManagerMaster
23/01/14 14:09:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/14 14:09:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/14 14:09:19 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/14 14:09:19 INFO DiskBlockManager: Created local directory at /var/data/spark-497b7a02-ade1-40d0-9772-2a6a3d51d1e8/blockmgr-70535959-0fac-45e7-884f-834ec3189417
23/01/14 14:09:19 INFO MemoryStore: MemoryStore started with capacity 400.0 MiB
23/01/14 14:09:19 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/14 14:09:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/14 14:09:19 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/01/14 14:09:20 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/01/14 14:09:20 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/01/14 14:09:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/01/14 14:09:20 INFO NettyBlockTransferService: Server created on spark-6613ee85b09cc4a8-driver-svc.spark.svc:7079
23/01/14 14:09:20 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/01/14 14:09:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-6613ee85b09cc4a8-driver-svc.spark.svc, 7079, None)
23/01/14 14:09:20 INFO BlockManagerMasterEndpoint: Registering block manager spark-6613ee85b09cc4a8-driver-svc.spark.svc:7079 with 400.0 MiB RAM, BlockManagerId(driver, spark-6613ee85b09cc4a8-driver-svc.spark.svc, 7079, None)
23/01/14 14:09:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-6613ee85b09cc4a8-driver-svc.spark.svc, 7079, None)
23/01/14 14:09:20 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-6613ee85b09cc4a8-driver-svc.spark.svc, 7079, None)
23/01/14 14:09:23 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.61.5.85:57484) with ID 1, ResourceProfileId 0
23/01/14 14:09:23 INFO BlockManagerMasterEndpoint: Registering block manager 10.61.5.85:35849 with 400.0 MiB RAM, BlockManagerId(1, 10.61.5.85, 35849, None)
23/01/14 14:09:23 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
INFO:session:Starting session loop
INFO:session:Processing command {'id': '9d86b013-dfac-411e-95a7-1b8e4fe75dc8', 'code': 'spark'}
INFO:session:Response sent
INFO:session:Processing command {'id': '23fb2f98-5221-40ba-b151-a24b24da46e1', 'code': ''}
ERROR:session:pop from empty list
Traceback (most recent call last):
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 103, in _exec_then_eval
    last = ast.Interactive([block.body.pop()])
IndexError: pop from empty list
INFO:session:Response sent
INFO:session:Processing command {'id': '33e73b7b-43ce-4d74-9b8d-1c39273d1fe4', 'code': ''}
ERROR:session:pop from empty list
Traceback (most recent call last):
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 103, in _exec_then_eval
    last = ast.Interactive([block.body.pop()])
IndexError: pop from empty list
INFO:session:Response sent
INFO:session:Processing command {'id': '8c9bb20b-dbaa-4197-8b6d-2f4197465078', 'code': "numbers = sc.parallelize([1, 2, 3, 4])\nprint('First element of numbers is {} and its description is:\\n{}'.format(numbers.first(), numbers.toDebugString()))\n\n"}
ERROR:session:name 'sc' is not defined
Traceback (most recent call last):
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 105, in _exec_then_eval
    exec(compile(block, '', 'exec'), self.globals)
  File "", line 1, in <module>
NameError: name 'sc' is not defined
INFO:session:Response sent
INFO:session:Processing command {'id': 'a39b742a-1912-4829-98ae-91c018a17de6', 'code': "numbers = sc.parallelize([1, 2, 3, 4])\nprint('First element of numbers is {} and its description is:\\n{}'.format(numbers.first(), numbers.toDebugString()))\\\n"}
ERROR:session:unexpected EOF while parsing (<unknown>, line 2)
Traceback (most recent call last):
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 100, in _exec_then_eval
    block = ast.parse(code, mode='exec')
  File "/usr/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 2
    print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))\
                                                                                                                         ^
SyntaxError: unexpected EOF while parsing
INFO:session:Response sent
INFO:session:Processing command {'id': '72d32f7f-abd9-497a-b494-538bc974a5de', 'code': "numbers = sc.parallelize([1, 2, 3, 4])\nprint('First element of numbers is {} and its description is:\\n{}'.format(numbers.first(), numbers.toDebugString()))\\\n"}
ERROR:session:unexpected EOF while parsing (<unknown>, line 2)
Traceback (most recent call last):
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 112, in exec
    self._exec_then_eval(code.rstrip())
  File "/tmp/spark-daeb26e9-9c22-4361-8210-b9992b4fdf49/shell_wrapper.py", line 100, in _exec_then_eval
    block = ast.parse(code, mode='exec')
  File "/usr/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 2
    print('First element of numbers is {} and its description is:\n{}'.format(numbers.first(), numbers.toDebugString()))\
                                                                                                                         ^
SyntaxError: unexpected EOF while parsing
INFO:session:Response sent
```
pdambrauskas commented 1 year ago

Just use spark without the %%.

Also, I see that you've selected Scala as your session language. Not sure if you've noticed, but Lighter sessions only support Python, so it makes no difference which language you choose in the UI; Lighter will always start a PySpark session.

I've tried your example on my notebook:

[screenshot]
pdambrauskas commented 1 year ago

I've also managed to run the spark cell in a Python session (like you tried):

[screenshot]