Closed siddas27 closed 5 years ago
Thanks for filing this issue.
With a pip-installed TensorFlow with GPU support (tensorflow-gpu), installing this module switched the implementation to tensorflow, which is CPU-only. To fix your environment, you will need to repair the pip packages so that tensorflow-gpu is present and picked up instead of the CPU-only TensorFlow that the install pulled in.
I have fixed this issue by changing the prerequisites to call out tensorflow-gpu. I did not hit this issue in my testing because I was using a conda-installed TensorFlow. The issue does not reproduce in that environment:
$ conda install tensorflow-gpu=1.13.1
# packages installed:
$ conda list | grep tensor
tensorboard 1.13.1 py36hf484d3e_0
tensorflow 1.13.1 gpu_py36h3991807_0
tensorflow-base 1.13.1 gpu_py36h8d69cac_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 h0d30ee6_0
When tensorflow-large-model-support is pip installed into a conda environment like this, it does not disturb tensorflow-gpu.
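To check which TensorFlow distributions an environment actually exposes to Python, a small script along these lines may help. It is only a sketch: it assumes Python 3.8 or later (for `importlib.metadata`), and the helper names `tensorflow_variants` and `installed_distribution_names` are made up for this example. It reports what pip-style metadata says is installed, which may differ from what `conda list` shows.

```python
from importlib import metadata  # standard library in Python 3.8+


def tensorflow_variants(installed_names):
    """Return the TensorFlow distribution names found in installed_names, sorted."""
    wanted = {"tensorflow", "tensorflow-gpu"}
    # Normalize case and underscores so "TensorFlow_GPU" matches "tensorflow-gpu".
    normalized = {name.lower().replace("_", "-") for name in installed_names}
    return sorted(wanted & normalized)


def installed_distribution_names():
    """Names of every distribution whose metadata this interpreter can see."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            names.append(name)
    return names


if __name__ == "__main__":
    found = tensorflow_variants(installed_distribution_names())
    print("TensorFlow distributions found:", found)
    if found == ["tensorflow", "tensorflow-gpu"]:
        # Both variants provide the same `tensorflow` import package.
        print("Both variants are installed; check which one is actually imported.")
```

If both names are reported, the CPU-only package may be the one that gets imported, which would match the symptom described in this issue.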
Thank you for your answer, @smatzek.
I created an environment with conda and also installed TensorFlow 1.15.0 with conda. However, after I ran pip install ./tensorflow-large-model-support, my TensorFlow packages were replaced by newer versions.
$ pip ./tensorflow-large-model-support/
ERROR: unknown command "./tensorflow-large-model-support/"
(py37) jjia@res-hpc-lo98:/exports/lkeb-hpc/jjia/project/e2e_new$ pip install ./tensorflow-large-model-support/
Processing ./tensorflow-large-model-support
Collecting tensorflow-gpu>=1.5
Downloading https://files.pythonhosted.org/packages/a1/eb/bc0784af18f612838f90419cf4805c37c20ddb957f5ffe0c42144562dcfa/tensorflow_gpu-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl (380.8MB)
|████████████████████████████████| 380.8MB 19kB/s
Collecting toposort>=1.5
Using cached https://files.pythonhosted.org/packages/e9/8a/321cd8ea5f4a22a06e3ba30ef31ec33bea11a3443eeb1d89807640ee6ed4/toposort-1.5-py2.py3-none-any.whl
Requirement already satisfied: grpcio>=1.8.6 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.16.1)
Requirement already satisfied: astor>=0.6.0 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.8.0)
Requirement already satisfied: six>=1.10.0 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.13.0)
Requirement already satisfied: keras-applications>=1.0.8 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.0.8)
Requirement already satisfied: numpy<2.0,>=1.16.0 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.17.4)
Requirement already satisfied: protobuf>=3.6.1 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (3.11.2)
Collecting tensorflow-estimator<2.1.0,>=2.0.0
Using cached https://files.pythonhosted.org/packages/fc/08/8b927337b7019c374719145d1dceba21a8bb909b93b1ad6f8fb7d22c1ca1/tensorflow_estimator-2.0.1-py2.py3-none-any.whl
Requirement already satisfied: wrapt>=1.11.1 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.11.2)
Requirement already satisfied: opt-einsum>=2.3.2 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (3.1.0)
Requirement already satisfied: google-pasta>=0.1.6 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.1.8)
Requirement already satisfied: absl-py>=0.7.0 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.8.1)
Requirement already satisfied: wheel>=0.26 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.33.6)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.1.0)
Requirement already satisfied: gast==0.2.2 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.2.2)
Requirement already satisfied: termcolor>=1.1.0 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (1.1.0)
Collecting tensorboard<2.1.0,>=2.0.0
Downloading https://files.pythonhosted.org/packages/76/54/99b9d5d52d5cb732f099baaaf7740403e83fe6b0cedde940fabd2b13d75a/tensorboard-2.0.2-py3-none-any.whl (3.8MB)
|████████████████████████████████| 3.8MB 29.1MB/s
Requirement already satisfied: h5py in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from keras-applications>=1.0.8->tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (2.9.0)
Requirement already satisfied: setuptools in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from protobuf>=3.6.1->tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (42.0.2.post20191203)
Collecting requests<3,>=2.21.0
Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
Collecting google-auth-oauthlib<0.5,>=0.4.1
Using cached https://files.pythonhosted.org/packages/7b/b8/88def36e74bee9fce511c9519571f4e485e890093ab7442284f4ffaef60b/google_auth_oauthlib-0.4.1-py2.py3-none-any.whl
Collecting google-auth<2,>=1.6.3
Using cached https://files.pythonhosted.org/packages/36/f8/84b5771faec3eba9fe0c91c8c5896364a8ba08852c0dea5ad2025026dd95/google_auth-1.10.0-py2.py3-none-any.whl
Requirement already satisfied: werkzeug>=0.11.15 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (0.16.0)
Requirement already satisfied: markdown>=2.6.8 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from tensorboard<2.1.0,>=2.0.0->tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (3.1.1)
Collecting idna<2.9,>=2.5
Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in /exports/lkeb-hpc/jjia/software/anaconda3/envs/py37/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.1.0,>=2.0.0->tensorflow-gpu>=1.5->tensorflow-large-model-support==0.1.0) (2019.11.28)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
Using cached https://files.pythonhosted.org/packages/b4/40/a9837291310ee1ccc242ceb6ebfd9eb21539649f193a7c8c86ba15b98539/urllib3-1.25.7-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting requests-oauthlib>=0.7.0
Using cached https://files.pythonhosted.org/packages/a3/12/b92740d845ab62ea4edf04d2f4164d82532b5a0b03836d4d4e71c6f3d379/requests_oauthlib-1.3.0-py2.py3-none-any.whl
Collecting rsa<4.1,>=3.1.4
Using cached https://files.pythonhosted.org/packages/02/e5/38518af393f7c214357079ce67a317307936896e961e35450b70fad2a9cf/rsa-4.0-py2.py3-none-any.whl
Collecting cachetools<5.0,>=2.0.0
Downloading https://files.pythonhosted.org/packages/08/6a/abf83cb951617793fd49c98cb9456860f5df66ff89883c8660aa0672d425/cachetools-4.0.0-py3-none-any.whl
Collecting pyasn1-modules>=0.2.1
Using cached https://files.pythonhosted.org/packages/52/50/bb4cefca37da63a0c52218ba2cb1b1c36110d84dcbae8aa48cd67c5e95c2/pyasn1_modules-0.2.7-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0
Using cached https://files.pythonhosted.org/packages/05/57/ce2e7a8fa7c0afb54a0581b14a65b56e62b5759dbc98e80627142b8a3704/oauthlib-3.1.0-py2.py3-none-any.whl
Collecting pyasn1>=0.1.3
Using cached https://files.pythonhosted.org/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl
Building wheels for collected packages: tensorflow-large-model-support
Building wheel for tensorflow-large-model-support (setup.py) ... done
Created wheel for tensorflow-large-model-support: filename=tensorflow_large_model_support-0.1.0-cp37-none-any.whl size=17270 sha256=75a236618f321f6b8b3b0d44593c725b52e3fbcdef78ca10ed33664ca7b8e20f
Stored in directory: /home/jjia/.cache/pip/wheels/69/41/8c/b952f45ccd8fa39a5d75be005bc14f5d32d37cb57fc5c85513
Successfully built tensorflow-large-model-support
ERROR: tensorflow 1.15.0 has requirement tensorboard<1.16.0,>=1.15.0, but you'll have tensorboard 2.0.2 which is incompatible.
ERROR: tensorflow 1.15.0 has requirement tensorflow-estimator==1.15.1, but you'll have tensorflow-estimator 2.0.1 which is incompatible.
ERROR: tensorboard 2.0.2 has requirement grpcio>=1.24.3, but you'll have grpcio 1.16.1 which is incompatible.
Installing collected packages: tensorflow-estimator, idna, urllib3, chardet, requests, oauthlib, requests-oauthlib, pyasn1, rsa, cachetools, pyasn1-modules, google-auth, google-auth-oauthlib, tensorboard, tensorflow-gpu, toposort, tensorflow-large-model-support
Found existing installation: tensorflow-estimator 1.15.1
Uninstalling tensorflow-estimator-1.15.1:
Successfully uninstalled tensorflow-estimator-1.15.1
Found existing installation: tensorboard 1.15.0
Uninstalling tensorboard-1.15.0:
Successfully uninstalled tensorboard-1.15.0
Successfully installed cachetools-4.0.0 chardet-3.0.4 google-auth-1.10.0 google-auth-oauthlib-0.4.1 idna-2.8 oauthlib-3.1.0 pyasn1-0.4.8 pyasn1-modules-0.2.7 requests-2.22.0 requests-oauthlib-1.3.0 rsa-4.0 tensorboard-2.0.2 tensorflow-estimator-2.0.1 tensorflow-gpu-2.0.0 tensorflow-large-model-support-0.1.0 toposort-1.5 urllib3-1.25.7
After I installed the tensorflow-large-model-support package, 'conda list tensorflow' showed:
tensorboard 2.0.2 pypi_0 pypi
tensorflow 1.15.0 gpu_py37h0f0df58_0
tensorflow-base 1.15.0 gpu_py37h9dcbed7_0
tensorflow-estimator 2.0.1 pypi_0 pypi
tensorflow-gpu 2.0.0 pypi_0 pypi
tensorflow-large-model-support 0.1.0 pypi_0 pypi
You can see that my conda-installed tensorflow-gpu, tensorflow-estimator, and tensorboard were replaced by newer pip-installed ones.
So I had to downgrade those packages to 1.15.0 again.
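A quick way to spot which packages pip has overridden is to filter the `conda list` output for rows whose channel column reads `pypi`. The following is a minimal sketch, not part of tflms; the function name `pip_overrides` is hypothetical, and it assumes the whitespace-separated `name version build channel` layout shown in the listing above.

```python
def pip_overrides(conda_list_text):
    """Return {name: version} for rows of `conda list` output installed from PyPI."""
    overrides = {}
    for line in conda_list_text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header lines that conda prints.
        if not fields or fields[0].startswith("#"):
            continue
        # Rows look like: name version build [channel]; pip installs show "pypi".
        if len(fields) >= 4 and fields[3] == "pypi":
            overrides[fields[0]] = fields[1]
    return overrides


# The listing reported in this thread, pasted as sample input.
sample = """\
tensorboard 2.0.2 pypi_0 pypi
tensorflow 1.15.0 gpu_py37h0f0df58_0
tensorflow-base 1.15.0 gpu_py37h9dcbed7_0
tensorflow-estimator 2.0.1 pypi_0 pypi
tensorflow-gpu 2.0.0 pypi_0 pypi
tensorflow-large-model-support 0.1.0 pypi_0 pypi
"""

for name, version in pip_overrides(sample).items():
    print(name, version)
```

On this sample the helper picks out tensorboard, tensorflow-estimator, tensorflow-gpu, and tensorflow-large-model-support, while the conda-built tensorflow and tensorflow-base rows at 1.15.0 are left out. That mix of 1.15 and 2.0 packages is what the pip ERROR lines in the log warn about.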
Apart from the overwritten packages, the more important problem is that even after I downgraded those packages and ran my code with tflms, a larger input size still exhausts GPU memory, just as if I were not using tflms. (I use a U-Net to train on 3D lung CT with input size 192×192×112.)
My model was running fine on GPU, but after installing this package as described in the README, it now runs on CPU only. When I check the available devices, only the CPU shows up. How can I fix this?
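For reference, the device check described here can be scripted roughly as follows. This is a sketch under assumptions: it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, and the helper names `gpu_device_names` and `local_devices` are invented for this example. TensorFlow is imported lazily so the filtering helper can be used without it.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_devices():
    """Ask TensorFlow which devices it can see (requires TensorFlow)."""
    # Imported here so that gpu_device_names() works without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return device_lib.list_local_devices()


if __name__ == "__main__":
    try:
        gpus = gpu_device_names(local_devices())
        print("GPUs visible to TensorFlow:", gpus if gpus else "none")
    except ImportError:
        print("TensorFlow is not importable in this environment.")
```

If this prints "none" on a machine with a working GPU and driver, one likely cause is that a CPU-only TensorFlow build is the one being imported.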