2i2c-org / utoronto-image

User image for the UToronto Hub
BSD 3-Clause "New" or "Revised" License

Request to install Tensorflow and Keras for Python and R #6

Closed damianavila closed 2 years ago

damianavila commented 2 years ago

This request comes from a Freshdesk support ticket: https://2i2c.freshdesk.com/a/tickets/65. We discussed it in Slack and decided that we will implement the update ourselves (rather than leaving it to the user) because of the inherent complexity of installing these libraries. The timeline and specific versions are not clear yet, but I have already requested that information.

damianavila commented 2 years ago

The latest versions are fine according to the requester.

damianavila commented 2 years ago

"if it could be installed by the end of this week that would really help these instructors kick off the term"

That is the time expectation I got from them.

damianavila commented 2 years ago

Copy-pasting what I have written on Slack:

If we install TensorFlow with pip or conda, it should also bring in Keras, according to https://keras.io/getting_started/

Then it seems to be just a matter of using the Python installation from the R side: https://tensorflow.rstudio.com/installation/custom/#locating-tensorflow

We could eventually ship a .Rprofile file

You could also add the RETICULATE_PYTHON environment variable to your .RProfile.

Or ask them to reference the python installation with use_python() or use_condaenv()
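
Whichever of these options is used, the interpreter that R ends up with must actually have both packages installed. A minimal sketch of that check, meant to be run with the interpreter that RETICULATE_PYTHON would point at. The helper name `importable` is made up for illustration and is not part of reticulate or TensorFlow:

```python
import importlib.util
import sys


def importable(names):
    """Map each top-level module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # sys.executable is the path one would hand to RETICULATE_PYTHON or use_python().
    print("interpreter:", sys.executable)
    for name, found in importable(["tensorflow", "keras"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

This only confirms that the modules can be located; it does not import them, so it says nothing about whether they load correctly.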

damianavila commented 2 years ago

Opened an exploratory PR at https://github.com/2i2c-org/utoronto-image/pull/7 to build things in several steps.

damianavila commented 2 years ago

Update: I have tested the image I built in #7 in the staging hub under the 2i2c cluster (because I cannot easily log in to the UoT staging hub) and it seems to work as expected on the Python side: I built and ran a notebook with some basic TF and Keras examples without issue (although with several warnings... it seems both TF and Keras are pretty verbose because you can install them in different configurations, i.e. CPU vs GPU). I also tested installing the R counterparts on the fly and they seem to work OK as well (so you can also use them from RStudio). I still need to add them to the R requirements in #7, though.

damianavila commented 2 years ago

OK, I tried installing the R requirements (the R versions of tensorflow and keras... you actually need those counterparts, which use the Python stack under the hood) but now I am hitting a failure pattern: each attempt requires bumping the version of yet another R package (see my latest commits in https://github.com/2i2c-org/utoronto-image/pull/7). After several iterations this looks "endless", so I am wondering whether others faced the same problem when working on this image. @GeorgianaElena @yuvipanda, did you see the same pattern? If so, did you keep bumping versions until no errors were left? Did you try something else? I am quite surprised that you cannot target older versions of packages coming from CRAN. Is this something specific to the R package management space? Or am I missing something big because I am not knowledgeable enough in that space?

damianavila commented 2 years ago

Doing some research I found this article: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages, which seems to indicate we would need to pass a URL to hit the old packages:

$ git diff
diff --git a/install.R b/install.R
index 78bf356..77d5b8d 100755
--- a/install.R
+++ b/install.R
@@ -116,7 +116,8 @@ github_packages <- c(
 for (i in seq(1, length(cran_packages), 2)) {
   devtools::install_version(
     cran_packages[i],
-    version = cran_packages[i + 1]
+    version = cran_packages[i + 1],
+    repos = "http://cran.us.r-project.org"
   )
 }

But in that case, I would be worried about future incompatibilities (as the article indicated at the end):

Potential issues

There are a few potential issues that may arise with installing older versions of packages:

- You may be losing functionality or bug fixes that are only present in the newer versions of the packages.
- The older package version needed may not be compatible with the version of R you have installed. In this case, you will either need to downgrade R to a compatible version or update your R code to work with a newer version of the package.

GeorgianaElena commented 2 years ago

@damianavila, yes, I believe I kept bumping versions until it worked. Although I don't remember it taking this many steps :(

version = cran_packages[i + 1], repos = "http://cran.us.r-project.org"

So the place where the R pkgs get installed from is the Rstudio pkg manager:

https://github.com/2i2c-org/utoronto-image/blob/fa7eef0a6812814f7e488c9019b8e43b1ed80e0b/rsession.conf#L1-L2

And I believe that's because it can provide binary pkgs rather than just from source files.

damianavila commented 2 years ago

Thanks for the additional context, @GeorgianaElena!

It seems the RStudio package manager actually offers a way to "freeze" the package set you can fetch from, but that would not play well with new libraries (when we need to install new ones).

I will try to sync the packages we currently have in the file with the latest versions from the RStudio package manager and see how that goes...

damianavila commented 2 years ago

Regardless of how it goes, I think we need to rethink how we create and maintain the R environment. Otherwise, we are going to have these version jumps across multiple packages any time we need to add new dependencies, and that could be unstable territory from the user's perspective, IMHO.

damianavila commented 2 years ago

Btw, for future readers, these are the related PRs that work around the problems I hit along the way, namely testing in Binder (not possible) and the repo2docker timeout (I have already merged both of them):

damianavila commented 2 years ago

The last try (syncing all the versions) seemed to work. I was able to build a test image, ran some basic TF and Keras commands, and they seem to work. I have asked @GeorgianaElena to deploy the test image in UoT staging, and I will ask the requester to test it there and provide feedback before promoting it to production through the UoT hub config file (once the PR is merged and successfully built).

yuvipanda commented 2 years ago

Done! Thanks a lot, @damianavila!

damianavila commented 2 years ago

Thanks for the merge, @yuvipanda. Additional context: the requesters provided positive feedback about the image on staging (https://2i2c.freshdesk.com/a/tickets/57).

yuvipanda commented 2 years ago

Unfortunately actually attempting to use these libraries doesn't work:

image

If you set Sys.setenv(RETICULATE_PYTHON="/opt/conda/bin/python") temporarily, you now get a different error:

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which makes me suspect and fear that the issue is that R and python are using different openssl libraries...

damianavila commented 2 years ago

"I was able to build a test image and tested some basic TF and Keras commands and it seems to work."

I did not need to add the RETICULATE_PYTHON before... not sure why this is not working now, although I suspect the image is actually a little bit different now since the last time I tested.

damianavila commented 2 years ago

Update of the whole situation (and also draft to be posted to the Jupyter discourse forum):

Openssl mismatch between RStudio and conda environments

tl;dr

RStudio comes bundled with its own system version of OpenSSL. Conda also installs OpenSSL via some packages. If you use RStudio to run a conda-installed package that calls OpenSSL, there is a good chance that it won't work due to an OpenSSL version mismatch. This is because RStudio forces the use of a system version of OpenSSL, while conda expects its own version of OpenSSL. To fix it, either call the function that requires OpenSSL from a Jupyter interface, or separate your conda and RStudio environments entirely.

Introduction

Recently, 2i2c received a request to install TensorFlow and Keras in an image containing conda environments along with several R packages, including RStudio: https://github.com/2i2c-org/utoronto-image.

We were able to install the python TensorFlow package and the R counterparts as instructed by the corresponding documentation.

We also needed to set up the RETICULATE_PYTHON environment variable so the R packages could properly find the python ones: https://github.com/2i2c-org/utoronto-image/blob/main/Rprofile.site#L11

Problem

Our users began reporting issues when trying to download example datasets from within RStudio. For example:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which suggested some underlying openssl-related issues.
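
As far as we understand CPython's standard library, "unknown url type: https" is what urllib reports when no HTTPS handler is registered, which happens when the `ssl` module cannot be imported in that process. A minimal diagnostic sketch under that assumption; the function name `https_support` is ours, for illustration only:

```python
import http.client


def https_support():
    """Report whether this Python process can open https:// URLs."""
    # http.client only defines HTTPSConnection when "import ssl" succeeded.
    has_handler = hasattr(http.client, "HTTPSConnection")
    try:
        import ssl  # fails if the _ssl extension cannot load its OpenSSL libraries
    except ImportError as exc:
        return {"ssl_imported": False, "openssl": None,
                "https_handler": has_handler, "error": str(exc)}
    return {"ssl_imported": True, "openssl": ssl.OPENSSL_VERSION,
            "https_handler": has_handler, "error": None}


if __name__ == "__main__":
    for key, value in https_support().items():
        print(f"{key}: {value}")
```

Running it once from a Jupyter Python kernel and once from the Python session that RStudio starts would show whether the two processes differ.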

Investigation

After several rounds of debugging, we found that RStudio seems to load the "system" OpenSSL libraries when it is opened. For example:

$ ldd /usr/lib/rstudio-server/bin/rserver | grep ssl
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f0413833000)

and the "system" version is 1.1.1f:

$ dpkg -l | grep openssl
ii  libcurl4-openssl-dev:amd64           7.68.0-1ubuntu2.7                   amd64        development files and documentation for libcurl (OpenSSL flavour)
ii  openssl                              1.1.1f-1ubuntu2.12                  amd64        Secure Sockets Layer toolkit - cryptographic utility

But the conda main environment actually has another version, 1.1.1l:

$ conda list openssl
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
openssl                   1.1.1l               h7f98852_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
$ openssl version
OpenSSL 1.1.1l  24 Aug 2021

Our hypothesis for what is happening:

To test if this specific mismatch was causing the problem, we tried with a symbolic link hack:

mv /opt/conda/lib/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1.backup
ln -s /usr/lib/x86_64-linux-gnu/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1

and then, it worked!

Screen Shot 2022-03-22 at 12 01 33

This confirmed our suspicion, but is likely not a long-term solution because it is probably a very brittle fix that will break in unexpected ways.

Things we tried (that did not work!)

First, we thought about syncing the openssl versions in both environments ("system" and conda): https://github.com/2i2c-org/utoronto-image/pull/28. But that approach did not work!

Then we tried pointing the LD_LIBRARY_PATH environment variable to the conda-specific openssl-related paths to "force" RStudio to load the expected openssl libraries, but that approach also failed and triggered other potential issues: https://github.com/2i2c-org/utoronto-image/pull/29

Possible workarounds

After spending many hours on this issue, we finally decided to stop trying, look for reasonable workarounds, and post here to share the information we collected.

We have verified the openssl mismatch does NOT happen when you use the Jupyter Notebook application with the R kernel. So the problem seems to be RStudio-specific and to arise when multiple environments coexist (most likely caused by RStudio somehow loading the "system" openssl libraries instead of the conda ones). Hence, an immediate workaround is to use a Jupyter interface to download the dataset and then return to RStudio for the rest of your task.

Another alternative would be to create a different image without a conda environment to run your RStudio workflows, so that any python package (including TF or Keras) actually uses the "system" openssl library instead of a conflicting one.

There might be other options involving fixes/enhancements at the RStudio level, but this is outside of our expertise to fix. If others have experience with RStudio and an idea for how to resolve this, please share your ideas!

Eager to hear from you if you have any thoughts or if you faced this very same problem (even if you did not solve it ;-).

Hopefully, all this information is useful for future readers!

damianavila commented 2 years ago

@choldgraf, this is the draft for the Jupyter discourse post we talked about yesterday. Feel free to edit it as you wish!

@2i2c-org/tech-team, feel free to comment about the draft as well!

choldgraf commented 2 years ago

@damianavila I added a few quick edits above. I think in general it looks good to me! Quick thought on structure, but I think we can probably send it off quickly after that:

I think that the post should go from "most general" to "most technical". Most people are going to start reading at the top and lose steam by the time they read 100-150 words, unless they are very motivated. So I think we should put the most actionable and important stuff at the top. To that end, I'd structure it like:

I think the stuff like "confirming which version of OpenSSL" is really nice, but doesn't answer the question people will have of "ok but what do I actually do about this?". So I think we could put that information at the bottom for people who really want to learn more.

damianavila commented 2 years ago

Thanks for the feedback, @choldgraf!

I'd structure it like

OK, I will try your structure and ping you back again when it is ready so you can quickly look at it before posting it.

damianavila commented 2 years ago

@choldgraf, I think this new layout adheres to your last request. Can you take another look? (and feel free to make edits).


Openssl mismatch between RStudio and conda environments

tl;dr

RStudio uses the "system" version of OpenSSL. Conda also installs OpenSSL. If you use RStudio to run a conda-installed package that calls OpenSSL, there is a good chance that it won't work due to an OpenSSL "mismatch". This is because RStudio forces the use of a system version of OpenSSL, while conda expects its own version of OpenSSL. To fix it, either call the function that requires OpenSSL from a Jupyter interface, or separate your conda and RStudio environments entirely.

Introduction

Recently, 2i2c received a request to install TensorFlow and Keras in an image containing conda environments along with several R packages, including RStudio: https://github.com/2i2c-org/utoronto-image.

We were able to install the python TensorFlow package and the R counterparts as instructed by the corresponding documentation.

We also needed to set up the RETICULATE_PYTHON environment variable so the R packages could properly find the python ones: https://github.com/2i2c-org/utoronto-image/blob/main/Rprofile.site#L11

Problem

Our users began reporting issues when trying to download example datasets from within RStudio. For example:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which suggested some underlying OpenSSL-related issues.

Investigation

After several rounds of debugging, we found that RStudio seems to load the "system" OpenSSL libraries when it is opened. Our hypothesis for what is happening:

To test if this specific mismatch was causing the problem, we tried a symbolic link hack:

mv /opt/conda/lib/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1.backup
ln -s /usr/lib/x86_64-linux-gnu/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1

and then, it worked!

Screen Shot 2022-03-22 at 12 01 33

This confirmed our suspicion but is likely not a long-term solution because it is probably a very brittle fix that will break in unexpected ways.

Possible workarounds

After spending many hours on this issue, we finally decided to stop trying, look for reasonable workarounds, and post here to share the information we collected.

We have verified the OpenSSL mismatch does NOT happen when you use the Jupyter Notebook application with the R-kernel. So the problem seems to be an RStudio-specific issue when you have multiple co-existing environments (most likely caused by RStudio somehow loading the "system" OpenSSL libraries instead of the conda one). Hence, an immediate workaround is to use a Jupyter interface to download the dataset and then return to RStudio for the rest of your task.
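
The Jupyter-side download could be sketched roughly as follows. This is an illustration, not the exact steps we ran: the helper `ensure_cached` is hypothetical, and the idea that Keras later picks the file up from ~/.keras/datasets is an assumption about its default cache location that should be verified for your version.

```python
import os
import urllib.request

# URL taken from the error message above.
MNIST_URL = "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"


def ensure_cached(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Download url into cache_dir unless a file with that name is already there."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(target):
        fetch(url, target)  # needs working HTTPS, hence run it from Jupyter
    return target


# Example call from a Jupyter Python kernel (performs a real download).
# Assumption: Keras looks for cached datasets under ~/.keras/datasets.
# ensure_cached(MNIST_URL, os.path.expanduser("~/.keras/datasets"))
```

The `fetch` parameter exists only so the download step can be swapped out, for example when trying the function without network access.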

Another alternative would be to create a different image without a conda environment to run your RStudio workflows, so that any python package (including TF or Keras) actually uses the "system" OpenSSL library instead of a conflicting one.

There might be other options involving fixes/enhancements at the RStudio level, but this is outside of our expertise to fix. If others have experience with RStudio and an idea for how to resolve this, please share your ideas!

Hopefully, all this information is useful for future readers!

Appendix

Things we tried (that did not work!)

First, we thought about syncing the OpenSSL versions in both environments ("system" and conda): https://github.com/2i2c-org/utoronto-image/pull/28. But that approach did NOT work!

Then we tried pointing the LD_LIBRARY_PATH environment variable to conda-specific OpenSSL-related paths (to "force" RStudio to load the expected OpenSSL libraries) but that approach also failed: https://github.com/2i2c-org/utoronto-image/pull/29 and triggered other potential issues!
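
When experimenting with settings like these, it helps to see which OpenSSL shared objects a running process has actually mapped. A minimal, Linux-only sketch that reads /proc/self/maps; the helper name `ssl_libraries` is ours, and it only reports anything when OpenSSL is linked dynamically:

```python
import os


def ssl_libraries(maps_text):
    """Return the sorted, de-duplicated libssl/libcrypto paths in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # the pathname, if any, is the sixth field
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith(("libssl", "libcrypto")):
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    try:
        import ssl  # noqa: F401  (pulls the OpenSSL libraries into this process)
    except ImportError:
        print("ssl module could not be imported")
    if os.path.exists("/proc/self/maps"):  # Linux only
        with open("/proc/self/maps") as handle:
            print("\n".join(ssl_libraries(handle.read())))
```

Paths under /opt/conda/lib versus /usr/lib/x86_64-linux-gnu would indicate which copy of the library was picked up.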

Checking OpenSSL versions

To check which OpenSSL library RStudio loads, we used the ldd command:

$ ldd /usr/lib/rstudio-server/bin/rserver | grep ssl
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f0413833000)

The "system" version was 1.1.1f, which we confirmed with the dpkg -l command:

$ dpkg -l | grep openssl
ii  libcurl4-openssl-dev:amd64           7.68.0-1ubuntu2.7                   amd64        development files and documentation for libcurl (OpenSSL flavour)
ii  openssl                              1.1.1f-1ubuntu2.12                  amd64        Secure Sockets Layer toolkit - cryptographic utility

To check the OpenSSL conda-associated version, we listed the openssl conda package and also directly checked the version:

$ conda list openssl
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
openssl                   1.1.1l               h7f98852_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
$ openssl version
OpenSSL 1.1.1l  24 Aug 2021

and the conda main environment had version 1.1.1l.
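
The comparison above can also be scripted. A minimal sketch that pulls the upstream version out of the two strings shown in this appendix and reports whether they differ; the function names are ours, and the regular expression assumes version strings shaped like the ones above:

```python
import re

# Matches an upstream OpenSSL version such as "1.1.1f" or "1.1.1l".
_VERSION = re.compile(r"\d+\.\d+\.\d+[a-z]*")


def upstream_version(text):
    """Return the first upstream OpenSSL version found in text, or None."""
    match = _VERSION.search(text)
    return match.group(0) if match else None


def versions_differ(system_text, conda_text):
    """True when both strings carry a version and the versions are not equal."""
    system, conda = upstream_version(system_text), upstream_version(conda_text)
    return system is not None and conda is not None and system != conda


if __name__ == "__main__":
    system = "1.1.1f-1ubuntu2.12"          # version column from dpkg -l above
    conda = "OpenSSL 1.1.1l  24 Aug 2021"  # output of openssl version above
    print(upstream_version(system), upstream_version(conda), versions_differ(system, conda))
```

With the two strings from this image, the script prints `1.1.1f 1.1.1l True`.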

damianavila commented 2 years ago

Jupyter Discourse post was published here: https://discourse.jupyter.org/t/openssl-mismatch-between-rstudio-and-conda-environments/14123

damianavila commented 2 years ago

Pinged Nathan on the ticket: https://2i2c.freshdesk.com/a/tickets/65?note=80136454078.

Closing this one now.