Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

tensorflow-gpu tutorial is basically useless [FIXED] #1600

Closed wilk3ns closed 5 years ago

wilk3ns commented 5 years ago

I followed all the Windows installation steps carefully, and in the end everything worked on CPU. Then I tried switching to GPU using the provided tutorial and never got it working. The problem is that the method it describes does not work: you cannot simply uninstall tensorflow and install tensorflow-gpu to make training run on the GPU. My tensorflow-gpu install itself worked perfectly and recognized my GPU, but the mlagents-learn script appears to be hardcoded to check that the CPU version is installed, so you can never run it with tensorflow-gpu. Please provide a new tutorial or fix the current one. Maybe we should use a different version of CUDA? Any help will be appreciated! Thank you.

taesiri commented 5 years ago

You can change `tensorflow` to `tensorflow-gpu` in this line and run `pip install -e .` afterwards.

Related Issue: #1534

wilk3ns commented 5 years ago

@taesiri Already tried. After that it just installs normal tensorflow, and then training runs on CPU :(

taesiri commented 5 years ago

How about this: `pip install --upgrade --ignore-installed tensorflow-gpu==1.7`?

I should also mention that the problem is not ml-agents, but your Python environment.

wilk3ns commented 5 years ago

I haven't tried that, but the problem seems related to mlagents-learn itself. It checks whether certain versions of helper libraries are installed, ones that are only used by tensorflow, not tensorflow-gpu. Even after installing all of those libraries manually, in the end it still asks for normal tensorflow to be installed.

taesiri commented 5 years ago

Could you please post some of the error/log messages you get?

zheyangshi commented 5 years ago

Hmm, I could train the agent on GPU by following the tutorial. I think maybe you missed some steps. Could you run `pip list` to show your currently installed libraries?

wilk3ns commented 5 years ago

I've deleted and re-created the environment, then did a fresh install. After installing tensorflow-gpu, it recognized my GPU:

```
(ml-agents) D:\Unity Projects\ml-agents\ml-agents>python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-01-16 09:57:06.503688: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-01-16 09:57:06.815691: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 980M major: 5 minor: 2 memoryClockRate(GHz): 1.1265
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.31GiB
2019-01-16 09:57:06.828330: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-16 09:57:07.288390: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-16 09:57:07.293857: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2019-01-16 09:57:07.298034: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2019-01-16 09:57:07.302139: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3045 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2
2019-01-16 09:57:07.602921: I T:\src\github\tensorflow\tensorflow\core\common_runtime\direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 980M, pci bus id: 0000:01:00.0, compute capability: 5.2
```

Then I try to run this command to start training: `mlagents-learn config/trainer_config.yaml --run-id=runtest --train`

The error is:

Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Scripts\mlagents-learn-script.py", line 6, in <module>
    from pkg_resources import load_entry_point
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3126, in <module>
    @_call_aside
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3110, in _call_aside
    f(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 3139, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 581, in _build_master
    ws.require(__requires__)
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 898, in require
    needed = self.resolve(parse_requirements(requirements))
  File "D:\ProgramData\Anaconda3\envs\ml-agents\Lib\site-packages\pkg_resources\__init__.py", line 784, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'tensorflow<1.8,>=1.7' distribution was not found and is required by mlagents

And this is the output from `pip list`:

```
Package            Version    Location
------------------ ---------- -------------------------------------
absl-py            0.7.0
astor              0.7.1
atomicwrites       1.2.1
attrs              18.2.0
backcall           0.1.0
bleach             1.5.0
certifi            2018.11.29
colorama           0.4.1
cycler             0.10.0
decorator          4.3.0
defusedxml         0.5.0
docopt             0.6.2
entrypoints        0.3
gast               0.2.2
grpcio             1.11.1
html5lib           0.9999999
ipykernel          5.1.0
ipython            7.2.0
ipython-genutils   0.2.0
ipywidgets         7.4.2
jedi               0.13.2
Jinja2             2.10
jsonschema         2.6.0
jupyter            1.0.0
jupyter-client     5.2.4
jupyter-console    6.0.0
jupyter-core       4.4.0
kiwisolver         1.0.1
Markdown           3.0.1
MarkupSafe         1.1.0
matplotlib         3.0.2
mistune            0.8.4
mlagents           0.6.0      d:\unity projects\ml-agents\ml-agents
more-itertools     5.0.0
nbconvert          5.4.0
nbformat           4.4.0
notebook           5.7.4
numpy              1.14.5
pandocfilters      1.4.2
parso              0.3.1
pickleshare        0.7.5
Pillow             5.4.1
pip                18.1
pluggy             0.8.1
prometheus-client  0.5.0
prompt-toolkit     2.0.7
protobuf           3.6.1
py                 1.7.0
Pygments           2.3.1
pyparsing          2.3.1
pytest             3.10.1
python-dateutil    2.7.5
pywinpty           0.5.5
PyYAML             3.13
pyzmq              17.1.2
qtconsole          4.4.3
Send2Trash         1.5.0
setuptools         40.6.3
six                1.12.0
tensorboard        1.7.0
tensorflow-gpu     1.7.1
termcolor          1.1.0
terminado          0.8.1
testpath           0.4.2
tornado            5.1.1
traitlets          4.3.2
wcwidth            0.1.7
Werkzeug           0.14.1
wheel              0.32.3
widgetsnbextension 3.4.2
wincertstore       0.2
```
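The `DistributionNotFound` error and the `pip list` output fit together: the entry-point script checks requirements by distribution name, and `tensorflow-gpu` is a different distribution name from `tensorflow`, even though both provide the same importable `tensorflow` module. Below is a minimal sketch of that name-based check, using only the standard library. The helper names `installed_distributions` and `missing_requirements` are hypothetical, and this is a simplification for illustration (it ignores version specifiers), not how `pkg_resources` is actually implemented.

```python
from importlib import metadata


def installed_distributions():
    """Return {lowercased distribution name: version} for the current environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            found[name.lower().replace("_", "-")] = dist.version
    return found


def missing_requirements(required_names, installed):
    """Return the required names that have no installed distribution of that exact name."""
    return [name for name in required_names if name.lower() not in installed]


# Hypothetical snapshot based on a few rows of the pip list output above.
snapshot = {"tensorflow-gpu": "1.7.1", "tensorboard": "1.7.0", "numpy": "1.14.5"}

# 'tensorflow' is reported missing even though 'tensorflow-gpu' is installed.
print(missing_requirements(["tensorflow", "numpy"], snapshot))
```

Calling `missing_requirements(["tensorflow"], installed_distributions())` in the affected environment would be one way to confirm the same mismatch against the real package set.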

zheyangshi commented 5 years ago

@wilk3ns Oh, I don't know why. But I think issue #1534 may be helpful, please check it.

wilk3ns commented 5 years ago

@zheyangshi Thank you very much! I changed the line and ran pip install -e . and it worked!
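The fix that worked here is the one suggested earlier in the thread: rename the `tensorflow` requirement to `tensorflow-gpu` and reinstall with `pip install -e .`. The sketch below shows the effect of that one-line change on a requirements list. It assumes the line in question is the package's `install_requires` entry; the `tensorflow>=1.7,<1.8` pin is taken from the traceback, while the helper name `use_gpu_build` and the second list entry are hypothetical.

```python
import re

# Matches a requirement whose distribution name is exactly "tensorflow"
# (not "tensorflow-gpu" or "tensorboard"), with or without a version specifier.
_TENSORFLOW_REQ = re.compile(r"^tensorflow(?=[<>=!~\s]|$)")


def use_gpu_build(install_requires):
    """Swap the 'tensorflow' requirement for 'tensorflow-gpu', keeping its version pins."""
    return [_TENSORFLOW_REQ.sub("tensorflow-gpu", req) for req in install_requires]


# Only the tensorflow pin comes from the traceback; the other entry is a placeholder.
before = ["tensorflow>=1.7,<1.8", "some-other-package"]
print(use_gpu_build(before))
```

In practice the edit is made by hand in the source checkout, followed by `pip install -e .` so the installed metadata for mlagents picks up the new requirement name.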

zheyangshi commented 5 years ago

@wilk3ns You are welcome~

eshvk commented 5 years ago

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.