apache / gravitino-playground

A playground to experience Gravitino
Apache License 2.0

[Enhancement] Install trino and pandas python lib in docker image #63

Open shaofengshi opened 3 months ago

shaofengshi commented 3 months ago

Today, the first step in the Jupyter notebook for Trino is to install the trino and pandas libraries; see https://github.com/apache/gravitino-playground/blob/main/init/jupyter/gravitino-trino-example.ipynb. This step requires internet access, so users with network problems may get stuck here and be unable to continue their evaluation. Besides, after executing this step, Jupyter reminds you to restart the kernel ("Note: you may need to restart the kernel to use updated packages."), which is confusing.

To improve the user experience, we can install these dependencies while building the Docker image.

shaofengshi commented 3 months ago

Another dependency, used in "gravitino-fileset-example.ipynb", is the "hdfs" Python library:

"pip install hdfs"

That notebook also needs gravitino: "pip install gravitino"

shaofengshi commented 3 months ago

I see that the Jupyter notebook uses the "jupyter/minimal-notebook" image, which we do not build ourselves, so we cannot pre-install our dependencies unless we build a new image.
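
Whichever way the image is built, it may help to verify that the four libraries named in this thread (trino, pandas, hdfs, gravitino) are already importable, either as a smoke test during the image build or as a first notebook cell that replaces the unconditional "pip install". The following is only a minimal sketch, not part of the playground: the helper name `missing_packages` is hypothetical, and it assumes each library's import name matches the pip name quoted above.

```python
import importlib.util

# Package names as quoted in this issue.
# Assumption: the import name of each library equals its pip name.
REQUIRED = ["trino", "pandas", "hdfs", "gravitino"]


def missing_packages(names):
    """Return the subset of names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        # Only these packages would still need a network install.
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are pre-installed.")
```

If the check reports nothing missing, the notebook can skip the install step entirely, which avoids both the network access and the "restart the kernel" note.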