GoogleCloudDataproc / dataproc-jupyter-plugin

Apache License 2.0

incorrect persistent history server uri for dataproc serverless notebook #70

Open nyoungstudios opened 1 year ago

nyoungstudios commented 1 year ago

Error message

```
Error from Gateway: [Bad Request] failure creating a backend resource: failure starting the kernel creation: failure starting the kernel creation: failure creating session: [400 Bad Request] generic::invalid_argument: com.google.cloud.hadoop.services.common.error.DataprocException: Cluster name 'projects/my-project-id/locations/my-location/clusters/my-phs-cluster-name' must conform to ^(?:/?/?dataproc\.googleapis\.com/)?projects/([^/]+)/regions/([^/]+)/clusters/([^/]+) pattern (INVALID_ARGUMENT)
. Ensure gateway url is valid and the Gateway instance is running.
```

The regex in the error message validates for `regions`, but the string being passed contains `locations`.
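To illustrate the mismatch, here is a minimal sketch: the validation pattern is copied from the error message above, while the `normalize_phs_uri` helper is purely hypothetical (it is not part of the plugin) and just shows the rewrite that would make the resource name pass validation.

```python
import re

# Validation pattern copied from the error message: the backend expects
# "regions", but the plugin passed a resource name containing "locations".
CLUSTER_PATTERN = re.compile(
    r"^(?:/?/?dataproc\.googleapis\.com/)?"
    r"projects/([^/]+)/regions/([^/]+)/clusters/([^/]+)"
)

def normalize_phs_uri(uri: str) -> str:
    """Hypothetical helper: rewrite a 'locations'-style cluster resource
    name into the 'regions'-style form the Dataproc API expects."""
    return re.sub(r"/locations/([^/]+)/", r"/regions/\1/", uri, count=1)

bad = "projects/my-project-id/locations/my-location/clusters/my-phs-cluster-name"
good = normalize_phs_uri(bad)

print(bool(CLUSTER_PATTERN.match(bad)))   # the passed string fails validation
print(bool(CLUSTER_PATTERN.match(good)))  # the rewritten form passes
```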

Steps

Environment

```
# OS
Ubuntu 20.04 LTS x86_64

# Python version
Python 3.10.13

# Relevant Python dependencies
jupyterlab==4.0.6
dataproc_jupyter_plugin==0.1.9

# output of gcloud version
Google Cloud SDK 448.0.0
beta 2023.09.22
bq 2.0.98
bundled-python3-unix 3.9.16
core 2023.09.22
gsutil 5.25
```
ywskycn commented 1 year ago

@nyoungstudios, thanks for reporting the issue. I think this has been fixed now. Could you try the latest version? cc @ptwng @outflyer

nyoungstudios commented 1 year ago

@ywskycn I installed the latest package, dataproc-jupyter-plugin==0.1.51, and while it no longer throws an error when creating the template, the Dataproc serverless notebook hangs when starting.

The link on the Interactive Session Details page shows the SPARK HISTORY SERVER, which is the correct persistent history server. However, on the Logs tab, I see these:

```
Failed to connect to master...
Failed to send RPC RPC ... to gdpic-srvls-session-...-m/...: io.netty.channel.StacklessClosedChannelException
Failed to send ExecutorStateChanged(app-...,0,EXITED,Some(Command exited with code 143),Some(143)) to Master NettyRpcEndpointRef(spark://Master@gdpic-srvls-session-...-m.c.....internal:...), will retry (1/5)."
Failed to send RPC RPC ... to gdpic-srvls-session-...-m/...:...: io.netty.channel.StacklessClosedChannelException
Connection to master failed! Waiting for master to reconnect...
Connection to master failed! Waiting for master to reconnect...
RECEIVED SIGNAL TERM
```
ywskycn commented 1 year ago

@nyoungstudios this looks like a network connectivity issue. Could you verify your network/firewall configs and make sure they follow https://cloud.google.com/dataproc-serverless/docs/concepts/network?
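For anyone checking this, the core requirement in that network guide is open intra-subnet connectivity on all ports. A rough sketch of how to inspect and (if missing) add such a rule with gcloud; the network name, rule name, and source range below are placeholders, not values from this issue:

```shell
# List existing firewall rules on the network the session's subnet belongs to
gcloud compute firewall-rules list --filter="network:my-network"

# Allow all intra-subnet ingress traffic (placeholder names/ranges;
# substitute your own network and subnet CIDR)
gcloud compute firewall-rules create allow-internal-ingress \
    --network=my-network \
    --source-ranges=10.0.0.0/24 \
    --direction=ingress \
    --action=allow \
    --rules=all
```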

nyoungstudios commented 1 year ago

@ywskycn sorry for the delay; we are using a subnet. I believe it is set up correctly, since I was able to launch a serverless notebook without the persistent history server on the same subnet, as well as a serverless batch job with the same persistent history server and the same subnet.
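For reference, the working batch setup described above can be sketched roughly like this (project, region, subnet, and cluster names are placeholders, not the actual values from this issue):

```shell
# Serverless batch with the same PHS cluster and subnet (placeholder values)
gcloud dataproc batches submit pyspark my_job.py \
    --region=us-central1 \
    --subnet=my-subnet \
    --history-server-cluster=projects/my-project-id/regions/us-central1/clusters/my-phs-cluster-name
```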

Is there anything else I should be checking?

ywskycn commented 1 year ago

@nyoungstudios to confirm: it works for a serverless interactive session without PHS, but doesn't work for an interactive session with PHS, right? If so, could you compare the detailed configurations of those two sessions to rule out any other differences? To view session details, you can click through from the Jupyter launcher page: "Dataproc Jobs and Sessions" -> "Serverless" -> "SESSIONS".
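One way to do that comparison outside the UI is with the gcloud sessions commands; session IDs and location below are placeholders:

```shell
# Dump both session configs and diff them (placeholder IDs/location)
gcloud beta dataproc sessions describe session-without-phs --location=us-central1 > no-phs.yaml
gcloud beta dataproc sessions describe session-with-phs --location=us-central1 > with-phs.yaml
diff no-phs.yaml with-phs.yaml
```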