databrickslabs / jupyterlab-integration

DEPRECATED: Integrating Jupyter with Databricks via SSH

dbutils.library support #13

Open vkrot-exos opened 4 years ago

vkrot-exos commented 4 years ago

Hi, are there any plans to add support for the dbutils.library module? Right now a simple dbutils.library.help("install") produces an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-28186010f9f4> in <module>
----> 1 dbutils.library.help("install")

AttributeError: 'DbjlUtils' object has no attribute 'library'

The ML runtime also has a great %pip magic (https://docs.databricks.com/notebooks/notebooks-python-libraries.html#enable-pip-and-conda-magic-commands). It installs libraries on both the driver and the executor nodes. In contrast, running %pip install inside a JupyterLab notebook connected to a Databricks cluster installs libraries only on the driver node. That makes it unusable with UDFs, because the executors need the same libraries. Could you suggest a workaround? Or are there plans to bring such support to jupyterlab-integration?
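To make the driver/executor gap concrete, here is a minimal sketch of a probe that reports whether a module is importable in the Python process that runs it. The helper names `probe_module` and `probe_cluster` are hypothetical, and the second one assumes a live SparkContext is available in the notebook; neither is part of jupyterlab-integration.

```python
import importlib.util
import socket


def probe_module(module_name):
    """Return (hostname, available) for the process executing this call."""
    try:
        available = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent package
        available = False
    return socket.gethostname(), available


def probe_cluster(spark_context, module_name, num_tasks=8):
    """Run probe_module inside Spark tasks and collect the distinct results.

    Assumption: spark_context is a working pyspark SparkContext. Tasks are
    not guaranteed to land on every executor, so this is only a spot check.
    """
    rdd = spark_context.parallelize(range(num_tasks), num_tasks)
    return sorted(set(rdd.map(lambda _: probe_module(module_name)).collect()))
```

Calling `probe_module("some_package")` in a notebook cell checks the driver, while `probe_cluster(sc, "some_package")` (with `sc` assumed to be the notebook's SparkContext) checks the worker side. If the first reports the package as available and the second does not, a UDF importing it would be expected to fail on the executors.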

Is there any way to install notebook-scoped libraries interactively, without init scripts?

Thanks in advance

bernhard-42 commented 4 years ago

Hi @vkrot-exos, agreed: notebook-scoped libraries (dbutils, %pip, %conda) would be amazing, and yes, they are on my roadmap. It just turned out not to be that simple, and I need to do more research to find a way to support them ...

At least I know now that there is another person who would love to see them, too :-)

vkrot-exos commented 4 years ago

@bernhard-42, great to hear! Do you have a very rough estimate of when you're going to start the research?

Right now I'm investigating whether to use Databricks notebooks or a private JupyterHub with this integration library for our data scientists. Databricks notebooks are cool, but all notebooks are stored in the Databricks control plane and are not that well integrated with GitHub. A private JupyterHub gives more flexibility in terms of building workflows, fine-grained permissions, etc. But the %pip and dbutils.library feature is really missing.

Do you know what else is missing in jupyterlab-integration compared to the Databricks notebooks environment?