elyra-ai / elyra

Elyra extends JupyterLab with an AI-centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0
1.84k stars 344 forks

nbconvert requires lxml_html_clean #3228

Closed sjkim2322 closed 1 month ago

sjkim2322 commented 5 months ago

Describe the issue

If you run the notebook component on a runtime image that has neither an lxml version below 5.2.0 nor lxml_html_clean installed in advance, the following error occurs:

        "lxml.html.clean module is now a separate project lxml_html_clean.\n"
        "Install lxml[html_clean] or lxml_html_clean directly." 

To Reproduce

Steps to reproduce the behavior:

  1. Prepare a Python runtime image of a clean environment (in my case it is “docker.io/python:3.10.12”).
  2. Run the elyra notebook component with the above image set as the runtime image.
  3. The above error occurs while initializing the notebook environment.
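
The condition in step 3 can be probed directly inside a candidate runtime image before running a pipeline. The following is a minimal sketch, not part of Elyra or nbconvert: `probe_import` is a hypothetical helper name, and it only reports whether `lxml.html.clean` can be imported and what message the failed import carried.

```python
import importlib


def probe_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module and return (ok, detail).

    Hypothetical helper for illustration; it performs a real import
    because a module file can exist and still raise ImportError.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Covers ModuleNotFoundError as well as ImportError raised by a stub module.
        return False, str(exc)
    return True, "import succeeded"


if __name__ == "__main__":
    ok, detail = probe_import("lxml.html.clean")
    print(f"lxml.html.clean importable: {ok} ({detail})")
```

If the probe reports a failure whose detail matches the message quoted above, the image is affected.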

Cause I'm guessing

Expected behavior

Deployment information Describe what you've deployed and how:

lresende commented 5 months ago

Should this be in Elyra? Or in nbconvert?

sjkim2322 commented 5 months ago

@lresende Thank you for the answer. I will also raise the issue on nbconvert.

However, I don't know whether the dependencies of a specific version that has already been released can be modified.

In Elyra, when the Jupyter notebook component is executed, it appears to download and install the requirements here. Since nbconvert is installed at that point, I think it would be good to add lxml_html_clean there as well.

Or, if there is a spec that can pre-initialize the notebook component's cell before execution, it may be possible to use that.

lresende commented 5 months ago

As you can see, we already removed a lot of transitive dependencies from the requirements file so that we don't have to keep syncing them to new versions. If they don't enable that in nbconvert itself, then we can continue this conversation.

sjkim2322 commented 5 months ago

Yes, I left an issue on nbconvert. https://github.com/jupyter/nbconvert/issues/2148

lresende commented 5 months ago

@sjkim2322 Based on your nbconvert issue, can't we update the version of nbconvert? Does that still work with JupyterLab < 4?

shalberd commented 4 months ago

@lresende @sjkim2322

I can confirm that we can just update the version of nbconvert without any issues regarding runtime behavior with the runtime image.

I can specifically confirm that a higher version of nbconvert works fine with JupyterLab below 4. See this code from the Red Hat Open Data Hub folks, who still use JupyterLab below 4 as well:

https://github.com/opendatahub-io/notebooks/blob/main/runtimes/minimal/ubi9-python-3.9/utils/requirements-elyra.txt

https://github.com/opendatahub-io/notebooks/blob/main/jupyter/datascience/ubi9-python-3.9/Pipfile

We have this running successfully at our org. So yes, you can change to the following, most likely along with other version updates:

nbconvert==7.1.0

There are no compatibility issues with JupyterLab 3.6.7.

See the list of packages in my runtime image / Jupyter image below. (I bake the packages into a combined Jupyter and runtime image so that these Elyra runtime requirements are available air-gapped, no problem.)

I think if you sync or update requirements-elyra in this project in line with what is listed in the opendatahub-io requirements-elyra link above, you won't have any issues anymore, be it with nbconvert or the other runtime requirements packages.

[1001050000@s-testjupyter-0 ~]$ pip list | grep nbconvert
nbconvert                                7.16.4
[1001050000@s-testjupyter-0 ~]$ pip list | grep jupyter
jupyter-bokeh                            3.0.7
jupyter_client                           7.4.9
jupyter_core                             5.7.2
jupyter-events                           0.10.0
jupyter-lsp                              2.2.5
jupyter_packaging                        0.12.3
jupyter-resource-usage                   0.7.2
jupyter_server                           2.14.0
jupyter_server_fileid                    0.9.2
jupyter-server-mathjax                   0.2.6
jupyter_server_proxy                     4.0.0
jupyter_server_terminals                 0.5.3
jupyter_server_ydoc                      0.8.0
jupyter-ydoc                             0.2.5
jupyterlab                               3.6.7
jupyterlab_git                           0.44.0
jupyterlab-lsp                           4.2.0
jupyterlab_pygments                      0.3.0
jupyterlab_server                        2.27.1
jupyterlab-streamlit-menu                0.1.0
jupyterlab_widgets                       3.0.10

# This is a comprehensive list of python dependencies that Elyra requires to execute Jupyter notebooks.
ipykernel = "==6.13.0"
ipython = "==8.10.0"
ipython-genutils = "==0.2.0"
jinja2 = "==3.0.3"
jupyter-client = "==7.3.1"
jupyter-core = "==4.11.2"
MarkupSafe = "==2.1.1"
minio = "==7.1.15"
nbclient = "==0.6.3"
nbconvert = "==7.1.0"
nbformat = "==5.4.0"
papermill = "==2.3.4"
pyzmq = "==24.0.1"
prompt-toolkit = "==3.0.30"
requests = "==2.31.0"
tornado = "==6.3.3"
traitlets = "==5.10.0"
urllib3 = "==1.26.18"
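
A list of pins like the one above can be compared against what an image actually has installed. The sketch below is an illustration under stated assumptions, not an Elyra tool: `parse_pins`, `installed_version` and `find_mismatches` are hypothetical names, the parser only understands lines of the form `name = "==version"`, and the sample text reuses two pins from the list above.

```python
import re
from importlib import metadata

# Matches lines such as: nbconvert = "==7.1.0" (assumed format, exact pins only).
PIN_LINE = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"==([^"]+)"\s*$')

SAMPLE = '''# comment lines are ignored
nbconvert = "==7.1.0"
papermill = "==2.3.4"
'''


def parse_pins(text):
    """Return {package: pinned_version} for every exact pin found in text."""
    pins = {}
    for line in text.splitlines():
        match = PIN_LINE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} where the two differ."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(SAMPLE)).items():
        print(f"{name}: pinned {wanted}, installed {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.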

@lresende Agreed, the runtime and transitive dependencies issue is a bit of a pain, but as mentioned, lifting the library version should be fine, even with JupyterLab below 4. I have tested this in conjunction with Airflow as the runtime engine, executing notebooks in tasks.

shalberd commented 1 month ago

This can be easily fixed by setting nbconvert to 7.1.0. See https://github.com/jupyter/nbconvert/issues/2148
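
Whether an image's nbconvert is at or above that version can be checked numerically rather than by string comparison. This is a rough sketch that treats 7.1.0 as a floor, which is an assumption drawn from this thread rather than a documented guarantee. `version_tuple` and `meets_minimum` are hypothetical helpers; they ignore pre-release and local suffixes and are not a full PEP 440 implementation.

```python
def version_tuple(version):
    """Turn '7.16.4' into (7, 16, 4); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="7.1.0"):
    """Compare versions numerically, so 7.10.0 counts as newer than 7.9.0."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # 7.16.4 is the version shown in the pip list output earlier in this thread.
    print(meets_minimum("7.16.4"))
```

The installed version string itself can be obtained with `importlib.metadata.version("nbconvert")` from the standard library.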