cBio / cbio-cluster

MSKCC cBio cluster documentation

glibc >2.17? #369

Closed (corcra closed this issue 8 years ago)

corcra commented 8 years ago

I'm trying to install TensorFlow (https://www.tensorflow.org/versions/0.6.0/get_started/os_setup.html), and it apparently requires glibc 2.17 or newer (https://github.com/tensorflow/tensorflow/issues/53#issuecomment-156575907). Is it possible to get a newer version of glibc (I'm seeing version 2.12), or is there some workaround?

tatarsky commented 8 years ago

There is no workaround for needing a newer version of glibc except Docker. We cannot upgrade the system glibc. If they truly do not support the version that comes with CentOS 6.x, you can consider trying Docker. If you need to be added to the docker group, please advise.

corcra commented 8 years ago

It seems like this glibc dependency is unavoidable, so I will have to go with docker. Can you add me to the docker group? Thanks!

tatarsky commented 8 years ago

You have been added to the group on the head node; log out and back in for it to take effect. It will take a little while to propagate to the nodes.

Be aware that the migration to Docker-capable kernels is still in progress. Until that work is done, if you submit jobs with qsub, be sure to request the docker attribute. The long story is in #360, but basically I am working on making all nodes Docker-capable again since the repo change, and only the nodes with that attribute are complete. If this doesn't make sense, I'm happy to elaborate.
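One way to confirm that the group change has taken effect is to look at the groups of the current login session. This is a minimal sketch, assuming the group is literally named docker as in this thread; the has_group helper is hypothetical and only there for illustration.

```shell
# Hypothetical helper: succeed if group name $2 appears in the
# space-separated group list $1 (whole-word match only).
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Without a user argument, `id -nG` lists the groups of the current
# process, so a newly added group only shows up after a fresh login.
groups_now=$(id -nG)

if has_group "$groups_now" docker; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet; log out and back in, then re-check"
fi
```

Running the same check on a compute node shows whether the change has propagated there yet.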

tatarsky commented 8 years ago

A brief example that may help in making sure you get a Docker-capable node:

hal> qsub -I -l nodes=1:docker -q active
gpu-1-6$ docker run ubuntu lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty
gpu-1-6$ exit 

Obviously there is more to it than that, but I wanted to show the attribute selection.
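For a non-interactive job, the same request can go into the job script as #PBS directives. This is a minimal sketch, assuming the nodes=1:docker attribute and the active queue from the interactive example above; the file name docker_job.sh is hypothetical.

```shell
# Write a small batch job script that asks for a docker-capable node.
# The resource request and queue mirror the interactive qsub example;
# the file name is a placeholder chosen for this sketch.
cat > docker_job.sh <<'EOF'
#!/bin/bash
#PBS -l nodes=1:docker
#PBS -q active

# Same smoke test as the interactive session.
docker run ubuntu lsb_release -a
EOF

echo "Wrote docker_job.sh; submit it with: qsub docker_job.sh"
```

Keeping the attribute in the script means the request is not forgotten when the job is resubmitted.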

corcra commented 8 years ago

Sounds good! I'll take care with qsub.

jchodera commented 8 years ago

+1 on being able to support tensorflow natively. Would be good to figure out what the CentOS upgrade plan is from @juanperin at some point.

lzamparo commented 8 years ago

+1 for eventually leaving CentOS 6, though I'll leave it to @juanperin and @jchodera and @tatarsky to figure out the when.

gideonite commented 8 years ago

Any success with using Docker to run TensorFlow?

There are some Stack Overflow threads, namely the one that @corcra linked to, that do suggest workarounds. My understanding is that they involve installing a local glibc of the correct version by extracting it from the Ubuntu package. Seems terrible though.

Am I correct that there are no plans to upgrade our glibc version? AFAIU we are running version 2.12 ($ ldd --version), but the latest is 2.22, leaving us at least 5 years out of date.
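For reference, that kind of workaround usually comes down to starting the program through the dynamic loader that ships with the extracted glibc, using the loader's --library-path option. The following is only a rough sketch under that assumption, not a tested recipe for this cluster: run_with_glibc is a hypothetical helper, and every path shown is a placeholder.

```shell
# Hypothetical helper: run a program under an alternate, locally
# extracted glibc by invoking that glibc's own dynamic loader.
#   $1   path to the extracted loader (the ld-*.so file)
#   $2   directory holding the extracted libc.so.6 and related libraries
#   $3+  path of the program to run (the loader does not search PATH),
#        followed by the program's arguments
run_with_glibc() {
    loader=$1
    libdir=$2
    shift 2
    "$loader" --library-path "$libdir" "$@"
}

# Example call; all paths below are placeholders:
# run_with_glibc "$HOME/glibc/lib/x86_64-linux-gnu/ld-X.Y.so" \
#     "$HOME/glibc/lib/x86_64-linux-gnu" /usr/bin/python script.py
```

The loader and libc have to come from the same glibc build, which is why setting LD_LIBRARY_PATH alone is generally not enough. Other libraries the program needs may also have to be added to the library path.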
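To check a node against a required glibc version, something like the following could be used. It is a minimal sketch: the 2.17 minimum is taken from this thread as an assumption, getconf GNU_LIBC_VERSION is used as an alternative to reading ldd --version by eye, and the two helper functions are hypothetical names.

```shell
# Print the glibc version of this system, e.g. "2.12".
# On glibc systems, `getconf GNU_LIBC_VERSION` prints "glibc <version>".
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# Succeed if version $1 is at least version $2.
# Only the major and minor components are compared, numerically.
version_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        ok = (h[1] + 0 > n[1] + 0) || (h[1] + 0 == n[1] + 0 && h[2] + 0 >= n[2] + 0)
        exit !ok
    }'
}

required=2.17   # assumed minimum, per the TensorFlow issue linked above
current=$(glibc_version)

if [ -z "$current" ]; then
    echo "could not determine the glibc version"
elif version_at_least "$current" "$required"; then
    echo "glibc $current is new enough (need $required)"
else
    echo "glibc $current is too old (need $required)"
fi
```

Comparing the components numerically matters here, because a plain string comparison would rank 2.9 above 2.17.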

tatarsky commented 8 years ago

There are no plans to my knowledge to update the base distro (and thus glibc) at this time.

You are welcome to speak to @juanperin on the topic, but there is no clean or supported method to update the base glibc without upgrading the distro.

Note that CentOS 7 glibc is 2.17.

Docker and its ilk are glorified chroot environments to hack around the matter.
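The practical consequence is that a container brings its own userland, including its own glibc, while sharing the host kernel. A small sketch to see the difference, assuming the stock ubuntu image from the earlier example and a Docker-capable node; ldd_version is a hypothetical helper.

```shell
# Hypothetical helper: read `ldd --version` output on stdin and print
# the last word of its first line, which is the glibc version
# (for example, "ldd (GNU libc) 2.12" yields "2.12").
ldd_version() {
    sed -n '1s/.* //p'
}

echo "host glibc:      $(ldd --version 2>/dev/null | ldd_version)"

if command -v docker >/dev/null 2>&1; then
    # --rm removes the container again once the command has finished.
    echo "container glibc: $(docker run --rm ubuntu ldd --version | ldd_version)"
else
    echo "docker not found here; request a docker-capable node first"
fi
```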

gideonite commented 8 years ago

Yes, I guess that is what Docker and others are for. Hmm. Interesting.

Yes, I've gotten something to run by simply following the TensorFlow instructions. They have a handy Docker build section: https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#docker-installation.

Have not tested how this works with GPUs.

Thanks for the help.

jchodera commented 8 years ago

@tatarsky wrote: "There are no plans to my knowledge to update the base distro (and thus glibc) at this time."

There isn't a plan to do this during the three days of cluster downtime?

jchodera commented 8 years ago

I'll ask @juanperin about what his plans are here.

polykrates commented 8 years ago

+1 for the CentOS update. We have talked about this for months and should get started ASAP. I can volunteer the cpath nodes. In practice, Docker adds significant overhead and is a security risk. CNTK, for example, like TensorFlow, demands a newer glibc.

Best, Thomas

jchodera commented 8 years ago

I believe @juanperin intends to do this as part of the move of new cluster nodes inside the firewall, but I'm not sure a timeframe has been decided for this yet.