Closed wilk3ns closed 5 years ago
You can change `tensorflow` to `tensorflow-gpu` in this line and run `pip install -e .` afterwards.
Related Issue: #1534
@taesiri Already tried. After that it just installs normal tensorflow, and then training runs on CPU :(
How about this: `pip install --upgrade --ignore-installed tensorflow-gpu==1.7`?
I should also mention that the problem is not ml-agents, but your Python environment.
Never tried that, but the problem seems related to mlagents-learn itself. It checks whether certain versions of helper libraries are installed, and those are required only by tensorflow, not tensorflow-gpu. Even after installing all those libraries manually, in the end it still asks for normal tensorflow to be installed.
Could you please post some of the error/log messages you get?
Hmm, I was able to train the agent on GPU by following the tutorial. I think you may have missed some steps. Could you run `pip list` to show your currently installed libraries?
I deleted and re-created the environment, then did a fresh install. After installing tensorflow-gpu, it recognized my GPU:
`(ml-agents) D:\Unity Projects\ml-agents\ml-agents>python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-01-16 09:57:06.503688: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-01-16 09:57:06.815691: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 980M major: 5 minor: 2 memoryClockRate(GHz): 1.1265
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.31GiB
2019-01-16 09:57:06.828330: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-16 09:57:07.288390: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-16 09:57:07.293857: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2019-01-16 09:57:07.298034: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2019-01-16 09:57:07.302139: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3045 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2
2019-01-16 09:57:07.602921: I T:\src\github\tensorflow\tensorflow\core\common_runtime\direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2`
Then I try to run this command to start training: `mlagents-learn config/trainer_config.yaml --run-id=runtest --train`
The error looks like this:
Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Scripts\mlagents-learn-script.py", line 6, in <module>
    from pkg_resources import load_entry_point
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3126, in <module>
    @_call_aside
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3110, in _call_aside
    f(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3139, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 581, in _build_master
    ws.require(__requires__)
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 898, in require
    needed = self.resolve(parse_requirements(requirements))
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 784, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'tensorflow<1.8,>=1.7' distribution was not found and is required by mlagents
And this is the output from `pip list`:
`Package            Version    Location
absl-py            0.7.0
astor              0.7.1
atomicwrites       1.2.1
attrs              18.2.0
backcall           0.1.0
bleach             1.5.0
certifi            2018.11.29
colorama           0.4.1
cycler             0.10.0
decorator          4.3.0
defusedxml         0.5.0
docopt             0.6.2
entrypoints        0.3
gast               0.2.2
grpcio             1.11.1
html5lib           0.9999999
ipykernel          5.1.0
ipython            7.2.0
ipython-genutils   0.2.0
ipywidgets         7.4.2
jedi               0.13.2
Jinja2             2.10
jsonschema         2.6.0
jupyter            1.0.0
jupyter-client     5.2.4
jupyter-console    6.0.0
jupyter-core       4.4.0
kiwisolver         1.0.1
Markdown           3.0.1
MarkupSafe         1.1.0
matplotlib         3.0.2
mistune            0.8.4
mlagents           0.6.0      d:\unity projects\ml-agents\ml-agents
more-itertools     5.0.0
nbconvert          5.4.0
nbformat           4.4.0
notebook           5.7.4
numpy              1.14.5
pandocfilters      1.4.2
parso              0.3.1
pickleshare        0.7.5
Pillow             5.4.1
pip                18.1
pluggy             0.8.1
prometheus-client  0.5.0
prompt-toolkit     2.0.7
protobuf           3.6.1
py                 1.7.0
Pygments           2.3.1
pyparsing          2.3.1
pytest             3.10.1
python-dateutil    2.7.5
pywinpty           0.5.5
PyYAML             3.13
pyzmq              17.1.2
qtconsole          4.4.3
Send2Trash         1.5.0
setuptools         40.6.3
six                1.12.0
tensorboard        1.7.0
tensorflow-gpu     1.7.1
termcolor          1.1.0
terminado          0.8.1
testpath           0.4.2
tornado            5.1.1
traitlets          4.3.2
wcwidth            0.1.7
Werkzeug           0.14.1
wheel              0.32.3
widgetsnbextension 3.4.2
wincertstore       0.2`
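The traceback and the `pip list` output above appear to fit together: the requirement check looks for a distribution named `tensorflow`, while the environment only contains `tensorflow-gpu`. Both provide the same importable `tensorflow` module, but they are separate distributions as far as the package metadata is concerned. A minimal sketch of that distinction follows. It assumes Python 3.8+ for `importlib.metadata` (the environment in this thread is Python 3.6, where the launcher relied on `pkg_resources`), and the helper names `installed_version` and `report` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(names=("tensorflow", "tensorflow-gpu")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    # The lookup is by distribution name, not by importable module name,
    # so "tensorflow" can be missing even when "import tensorflow" works.
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

In an environment like the one listed above, this would be expected to report `tensorflow` as not installed and `tensorflow-gpu` as present, which matches the `DistributionNotFound` message.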
@wilk3ns Oh, I don't know why. But I think issue #1534 may be helpful, please check it.
@zheyangshi Thank you very much! I changed the line and ran `pip install -e .` and it worked!
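For later readers, the edit amounts to renaming one requirement while keeping its version pins. The thread does not quote the line itself, so the sketch below assumes it is the `tensorflow` entry of an `install_requires`-style list, using the specifier reported in the traceback above. The helper `swap_distribution` and the sample list are hypothetical and only illustrate the rename.

```python
import re


def swap_distribution(requirements, old="tensorflow", new="tensorflow-gpu"):
    """Rename one distribution in a requirements list, keeping its version pins."""
    # Match `old` only when it is the whole distribution name, i.e. followed by
    # a version operator, an environment marker, whitespace, or end of string.
    pattern = re.compile(r"^" + re.escape(old) + r"(?=[<>=!~;\s]|$)")
    return [pattern.sub(new, req, count=1) for req in requirements]


# Illustrative list; only the tensorflow specifier is taken from the traceback.
requirements = ["tensorflow>=1.7,<1.8", "tensorboard", "numpy"]
print(swap_distribution(requirements))
```

After an edit like this, re-running `pip install -e .` refreshes the installed metadata so that the requirement check looks for `tensorflow-gpu` instead.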
@wilk3ns You are welcome~
Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I followed all the Windows installation steps carefully, and in the end everything worked on CPU. Then I tried switching to GPU using the provided tutorial and never got it working. The problem is that the method in the tutorial does not work: you cannot simply uninstall tensorflow and install tensorflow-gpu to make training run on the GPU. My tensorflow-gpu install worked perfectly and recognized my GPU. But the mlagents-learn script is hardcoded to check that the CPU version is installed, so you can never run it on tensorflow-gpu. Please provide a new tutorial or fix the current one. Maybe we should use a different version of CUDA? Any help will be appreciated! Thank you.