PKUHPC / OpenSCOW

Super Computing On Web
https://www.pkuscow.com/
Mulan Permissive Software License, Version 2

[Bug/Help] Cannot connect after creating a JupyterLab job #1449

Closed: netcat-fan closed this issue 1 month ago

netcat-fan commented 1 month ago

Is there an existing issue / discussion for this?

What happened

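After the JupyterLab job is created, the Connect button is greyed out and cannot be clicked, with a message saying the application is not ready yet. (Screenshot not included.)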

What did you expect to happen

No response

Did this work before?

On OpenSCOW version 1.6.2 this used to work: clicking Connect opened the JupyterLab page in the browser.

Steps To Reproduce

No response

Environment

- OS: CentOS Linux release 7.9.2009
- Scheduler:
- Docker:
- Docker-compose:
- SCOW cli: 1.6.2
- SCOW:
- Adapter: v1.6.0

Anything else?

Configuration file config/apps/jupyterlab.yaml:

# ID of this app
id: jupyterlab

# Name of this app
name: jupyterlab

# logo
logoPath: /apps/jupyter.png

# Set the app type to web
type: web

# Web app configuration
web:
  # Reverse proxy type
  proxyType: absolute
  # Preparation script
  beforeScript: |
    export PORT=$(get_port)
    export PASSWORD=$(get_password 12)
    export SALT=xxx
    export PASSWORD_SHA1="$(echo -n "${PASSWORD}${SALT}" | openssl dgst -sha1 | awk '{print $NF}')"
    export CONFIG_FILE="${PWD}/config.py"
    export SLURM_COMPUTE_NODE_HOSTNAME=$(hostname)
    export SHELL_NAME=$(echo ${SHELL} | awk -F'/' '{print $NF}')
    export CONDA_VERSION="anaconda/3-2024.06"

  # Script that runs the job; it can use variables defined in the preparation script
  script: |

    # Load the required environment modules
    for m in ${textModuleName}; do module switch ${m}; done

    conda -V &> /dev/null
    if [ $? -ne 0 ]; then
      module switch ${CONDA_VERSION}
    fi
    # init conda
    eval "$($(which conda) shell.${SHELL_NAME} hook)"

    if [[ "" == "${textCondaName}" ]]; then
      textCondaName="base"
    fi
    conda activate ${textCondaName}
    if [ $? -ne 0 ]; then
      exit 1
    fi

    (
    umask 077
    cat > "${CONFIG_FILE}" << EOL
    c.NotebookApp.ip = '0.0.0.0'
    c.NotebookApp.port = ${PORT}
    c.NotebookApp.port_retries = 0
    c.NotebookApp.password = u'sha1:${SALT}:${PASSWORD_SHA1}'
    c.NotebookApp.open_browser = False
    c.NotebookApp.base_url = "${PROXY_BASE_PATH}/${SLURM_COMPUTE_NODE_HOSTNAME}/${PORT}/"
    c.NotebookApp.allow_origin = '*'
    c.NotebookApp.disable_check_xsrf = True
    EOL
    )
    cd ~
    jupyter-lab --config=${CONFIG_FILE} --notebook-dir=${HOME}

  # How to connect to the app
  connect:
    method: POST
    path: /login
    formData:
      password: "{{ PASSWORD }}"

# HTML form configuration
attributes:
  - type: text
    name: textModuleName
    label: Modules
    required: false  # List of additional environment modules to load
    placeholder: Enter additional environment modules to load, separated by spaces (e.g. python/2.7.5 code-server/4.9.1)  # Hint text
  - type: text
    name: textCondaName
    label: conda environment
    required: false  # conda environment to run Jupyter in; defaults to base
    placeholder: Enter the name of the conda virtual environment  # Hint text
  - type: text
    name: sbatchOptions
    label: Additional sbatch options
    required: false
    placeholder: "e.g. --gpus gres:2 --time 10"
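For reference, the beforeScript builds a legacy Jupyter password hash: PASSWORD_SHA1 is the SHA-1 hex digest of PASSWORD followed by SALT, and the config stores it as sha1:<salt>:<digest>. A minimal Python sketch of the same computation and the matching check, assuming an ASCII salt without colons; the function names are hypothetical and this is not SCOW or Jupyter source code:

```python
import hashlib
import hmac


def hash_password(password: str, salt: str) -> str:
    """Build 'sha1:<salt>:<hexdigest>' the way the beforeScript does."""
    # Same byte order as: echo -n "${PASSWORD}${SALT}" | openssl dgst -sha1
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_password(hashed: str, password: str) -> bool:
    """Recompute the digest from 'algorithm:salt:digest' and compare."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update((password + salt).encode("utf-8"))
    # Constant-time comparison of the two hex strings
    return hmac.compare_digest(h.hexdigest(), digest)
```

For example, check_password(hash_password("s3cret", "a1b2c3"), "s3cret") returns True, and any other password returns False.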

The job output file reports the error "No web browser found: Error('could not locate runnable browser')":

[W 2024-10-22 16:05:25.584 ServerApp] A `_jupyter_server_extension_points` function was not found in jupyter_lsp. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[W 2024-10-22 16:05:25.597 ServerApp] A `_jupyter_server_extension_points` function was not found in notebook_shim. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-10-22 16:05:26.383 ServerApp] Extension package panel.io.jupyter_server_extension took 0.7853s to import
[I 2024-10-22 16:05:26.384 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-10-22 16:05:26.387 ServerApp] jupyter_server_terminals | extension was successfully linked.
[W 2024-10-22 16:05:26.388 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'port_retries' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'base_url' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'allow_origin' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.388 LabApp] 'disable_check_xsrf' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-10-22 16:05:26.390 ServerApp] ServerApp.password config is deprecated in 2.0. Use PasswordIdentityProvider.hashed_password.
[I 2024-10-22 16:05:26.390 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-10-22 16:05:26.393 ServerApp] notebook | extension was successfully linked.
[I 2024-10-22 16:05:26.713 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-10-22 16:05:26.713 ServerApp] panel.io.jupyter_server_extension | extension was successfully linked.
[I 2024-10-22 16:05:26.736 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-10-22 16:05:26.738 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-10-22 16:05:26.739 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-10-22 16:05:26.743 LabApp] JupyterLab extension loaded from /data/software/anaconda/3-2024.06/lib/python3.12/site-packages/jupyterlab
[I 2024-10-22 16:05:26.743 LabApp] JupyterLab application directory is /data/software/anaconda/3-2024.06/share/jupyter/lab
[I 2024-10-22 16:05:26.743 LabApp] Extension Manager is 'pypi'.
[I 2024-10-22 16:05:26.745 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-10-22 16:05:26.749 ServerApp] notebook | extension was successfully loaded.
[I 2024-10-22 16:05:26.750 ServerApp] panel.io.jupyter_server_extension | extension was successfully loaded.
[I 2024-10-22 16:05:26.750 ServerApp] Serving notebooks from local directory: /data/home/gpu00009336
[I 2024-10-22 16:05:26.750 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-10-22 16:05:26.750 ServerApp] http://slurm_compute01:36857/api/proxy/neu_gpu/absolute/slurm_compute01/36857/lab
[I 2024-10-22 16:05:26.750 ServerApp]     http://127.0.0.1:36857/api/proxy/neu_gpu/absolute/slurm_compute01/36857/lab
[I 2024-10-22 16:05:26.750 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-10-22 16:05:26.759 ServerApp] No web browser found: Error('could not locate runnable browser').
[I 2024-10-22 16:05:26.809 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
slurmstepd: error: *** JOB 77 ON slurm_compute01 CANCELLED AT 2024-10-22T16:15:46 DUE TO TIME LIMIT ***
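The LabApp warnings above say each NotebookApp option is forwarded to ServerApp, and that ServerApp.password is deprecated in favour of PasswordIdentityProvider.hashed_password. They are warnings, not errors: the server still started. Note that open_browser is not among the options the log reports as forwarded, which may be why the server still tried to open a browser. A sketch of a config writer using the newer names, assuming config.py is generated from Python instead of the heredoc; the function names are hypothetical:

```python
import os
from pathlib import Path


def render_server_config(port: int, hashed_password: str, base_url: str) -> str:
    """Return config.py text using Jupyter Server 2.x option names."""
    lines = [
        "c.ServerApp.ip = '0.0.0.0'",
        f"c.ServerApp.port = {port}",
        "c.ServerApp.port_retries = 0",
        # Replaces the deprecated NotebookApp.password / ServerApp.password
        f"c.PasswordIdentityProvider.hashed_password = {hashed_password!r}",
        "c.ServerApp.open_browser = False",
        f"c.ServerApp.base_url = {base_url!r}",
        "c.ServerApp.allow_origin = '*'",
        "c.ServerApp.disable_check_xsrf = True",
    ]
    return "\n".join(lines) + "\n"


def write_server_config(path: Path, text: str) -> None:
    """Write the config; a newly created file gets mode 0600 (minus umask)."""
    # Like `umask 077` in the job script: keep the password hash private.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(text)
```

Judging from the URL in the log, base_url for this job would be /api/proxy/neu_gpu/absolute/slurm_compute01/36857/.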

piccaSun commented 1 month ago

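Hello. When it used to open directly, were you using exactly the same configuration file, without any changes?

From your description, the [Connect] button is probably not clickable because the system cannot determine whether the port the application service listens on is open. Please click [Enter Directory] and check whether server_session_info.json was generated and contains the correct HOST, PORT, and other information.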
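The same check can be run by hand from the login node. A minimal sketch, assuming only the HOST and PORT fields of server_session_info.json shown in this thread; it is not SCOW's own implementation, and the function names are hypothetical:

```python
import json
import socket
from pathlib import Path
from typing import Tuple


def read_session_info(path: Path) -> Tuple[str, int]:
    """Return (HOST, PORT) from a server_session_info.json file."""
    info = json.loads(path.read_text())
    # A KeyError here means the job never wrote where the app is listening
    return str(info["HOST"]), int(info["PORT"])


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established."""
    try:
        # A completed TCP handshake means something is listening on the port
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name lookup failure, or timeout
        return False
```

For example: port_is_open(*read_session_info(Path("server_session_info.json"))).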

netcat-fan commented 1 month ago

I haven't changed the configuration file, and server_session_info.json does contain the information:

{"HOST":"slurm_compute01","PORT":36857,"PASSWORD":"*******"}

Testing from the login node, the corresponding port on the compute node is also reachable.

piccaSun commented 1 month ago

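Hello, thanks for the feedback. The "No web browser found" message does not affect JupyterLab's startup; a normally started JupyterLab looks like this (screenshot not included). Could you keep waiting on the interactive app so the system runs its connection check a few more times, and see whether it connects? JupyterLab has many redirect rules of its own, so connecting may be slow. If it still cannot connect, please watch for more detailed log output later.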
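Because JupyterLab redirects requests (an unauthenticated request, for example, is typically sent on to the login page), an open TCP port alone does not show that the web UI answers. A sketch of a retrying HTTP probe that follows redirects, assuming plain HTTP and treating any final 2xx response as ready; the function name and defaults are hypothetical:

```python
import time
import urllib.request
from typing import Optional, Tuple


def wait_until_ready(url: str, attempts: int = 10, interval: float = 5.0,
                     timeout: float = 5.0) -> Optional[Tuple[int, str]]:
    """Return (status, final_url) on the first 2xx answer, else None."""
    for attempt in range(attempts):
        try:
            # urlopen follows HTTP redirects, so the final URL can differ
            # from the requested one (e.g. a base URL ending up at /lab).
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return resp.status, resp.geturl()
        except OSError:
            # URLError/HTTPError (incl. 4xx/5xx), refused or timed-out
            # connections are all OSError subclasses: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(interval)
    return None
```

For this job, the URL to probe would be the one printed in the log, http://slurm_compute01:36857/api/proxy/neu_gpu/absolute/slurm_compute01/36857/.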

netcat-fan commented 1 month ago

Got it, thanks. I restarted the SCOW containers and it works now.