google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Dockerized Openspiel #156

Closed: yarncraft closed this issue 3 years ago

yarncraft commented 4 years ago

I've been working on putting OpenSpiel in a Docker container (since that always greatly boosts framework adoption). So far I can get the tests to run, but some tests are still prone to failure. Is there anyone else with some Docker experience who can spot the problem in the Dockerfile?

Dockerfile Reference: https://github.com/yarncraft/DockerizedOpenSpiel

Thanks in advance!

[Screenshot: Tests]

lanctot commented 4 years ago

Hi @yarncraft , this is really cool!

Looks like all the python tests are failing, so it's probably the case that the Python API (pybind11) is not building properly.

We've seen this a few times. It's usually been one of these:

I've never used Docker. Can you tell me what OS you are using and its version?

lanctot commented 4 years ago

> I've never used Docker. Can you tell me what OS you are using and its version?

Haha, clicked on literally the only link in your message and saw Ubuntu 18.04. Perfect. It's been fairly straightforward to install OpenSpiel on Ubuntu 18.04. My guess is that you don't have python3-dev installed. Did you run install.sh to get all the dependencies?

lanctot commented 4 years ago

LOL, I just now saw that the second line of your Dockerfile installs python3-dev. Looking at the file, it seems fine.

Hmm... do you know if you can import pyspiel from python3 in the Docker instance? Can you try the manual build and instead of running tests, do:

python3
Python 3.7.6 (default, Feb  4 2020, 17:04:58) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspiel
>>>

Does this cause an error?

Before doing this, you'll need to set the PYTHONPATH environment variable; see Step 4 of https://github.com/deepmind/open_spiel/blob/master/docs/install.md
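The same check can be run non-interactively inside the container with a small script. This is only a minimal sketch: `try_import` is a hypothetical helper name, and nothing in it is specific to OpenSpiel beyond the module name `pyspiel`.

```python
import importlib
import sys


def try_import(module_name):
    """Try to import a module; return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers ModuleNotFoundError (module not on sys.path) as well as
        # failures while loading a compiled extension module.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    ok, detail = try_import("pyspiel")
    print("pyspiel import:", "OK" if ok else "FAILED", "-", detail)
    if not ok:
        # Show where Python looked, to compare against PYTHONPATH.
        print("sys.path:")
        for entry in sys.path:
            print("   ", entry)
```

If the import fails with a "no module named" message, the search path is the likely culprit; any other import error points more towards the binding itself not being built or linked properly.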

yarncraft commented 4 years ago

According to step 4 in the docs I read the following:

> To be able to import the Python code (both the C++ binding pyspiel and the rest) from any location, you will need to add to your PYTHONPATH the root directory and the open_spiel directory.

I think this is not needed when you put the framework in the container as you would use Docker containers exactly to avoid undertaking such steps. (Since you would just run your python scripts in the containerized environment instead).

However there is a note in step 2 stating the following

Install pip deps as your user. Do not use the system's pip.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --user
pip3 install --upgrade pip --user
pip3 install --upgrade setuptools testresources --user

So maybe I can try to install pip in this way and check if this resolves the problem. Anyway it seems quite unlikely :)

lanctot commented 4 years ago

> I think this is not needed when you put the framework in the container as you would use Docker containers exactly to avoid undertaking such steps. (Since you would just run your python scripts in the containerized environment instead).

Right, but currently the Python tests are failing within the environment, and that's when you're using build_and_run_tests.sh, which sets up the Python paths for you as necessary. I was asking you to try the import pyspiel from within the container to see if it was built properly. If the tests are failing, something is going wrong in the container, so we have to diagnose it. I mainly want to know whether the problem is how we're running the Python tests from CMake or whether you can't load pyspiel at all (from within the container).

I've never used Docker so I have no idea how to debug these things.

> However there is a note in step 2 stating the following
>
> Install pip deps as your user. Do not use the system's pip.
>
> curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
> python3 get-pip.py --user
> pip3 install --upgrade pip --user
> pip3 install --upgrade setuptools testresources --user

This is mostly a recommendation so that users don't mess up their system pip configuration. I doubt this will be it, but one more thing to try.

yarncraft commented 4 years ago

Ok I'll make sure to check out if I can run a python script whereby I import pyspiel in my containerized environment!

Note that not all tests fail (in fact 57% passes) however the tests that do fail seem to be quite random, therefore it is indeed hard to debug what exactly goes wrong. I'll do a full report on what could be the cause somewhere in the next 24 hours.

lanctot commented 4 years ago

> Note that not all tests fail (in fact 57% passes) however the tests that do fail seem to be quite random, therefore it is indeed hard to debug what exactly goes wrong. I'll do a full report on what could be the cause somewhere in the next 24 hours.

Right, looks like the C++ tests are passing, so OpenSpiel is building and running properly. From your screenshot it seems to be only the Python tests, and we've seen this before, so I think it's just that the Python API is not getting built or linked properly (or the CMake tests are not building the Python tests properly).

elkhrt commented 4 years ago

Thanks for taking the initiative to do this! I needed something similar a few months ago. I am no docker expert, so this is certainly not optimal, but the following worked for me. This was part of a larger effort; I've removed some things I know to be irrelevant, but other unneeded items may still be there.

FROM ubuntu:20.04
RUN apt update
RUN dpkg --add-architecture i386 && apt update
RUN apt-get -y install \
    clang \
    curl \
    cmake \
    git \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    sudo
RUN git clone -b 'master' --single-branch --depth 15 https://github.com/deepmind/open_spiel.git open_spiel
WORKDIR open_spiel
RUN ./install.sh
RUN mkdir -p build && \
    cd build && \
    cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel && \
    make -j4
RUN pip3 install absl-py scipy
COPY . build
CMD /open_spiel/build/run.sh

yarncraft commented 4 years ago

@elkhrt Thanks for sharing. To make your Dockerfile future-proof, you might consider adding a Python upgrade step. I notice that you don't use a virtualenv; that might indeed not be needed when you containerize.

I will give this Dockerfile a try in a minute!

yarncraft commented 4 years ago

@elkhrt It seems like the Dockerfile is working correctly for the TicTacToe example. It does not run tests explicitly however, does it check these internally when executing the last CMD or is it something that needs to be added as well?

elkhrt commented 4 years ago

Yours is far better / future proof / general. And actually runs the tests, unlike mine. I was just sharing it in case it helped you find what was missing in yours.

On Tue, 3 Mar 2020, 10:27 Lucas Engels, notifications@github.com wrote:

> @elkhrt It seems like the Dockerfile is working correctly for the TicTacToe example. It does not run tests explicitly however, does it check these internally when executing the last CMD or is it something that needs to be added as well?


yarncraft commented 4 years ago

Ok I will make sure to enhance the Dockerfile with some extra obligatory steps. I think we can close this issue, I will come back with an update once I get everything up and running with the enhanced version!

Thanks @lanctot @elkhrt for the help!

yarncraft commented 4 years ago

56% tests passed, 58 tests failed out of 133

Total Test time (real) = 269.80 sec

The following tests FAILED:
     66 - python_api_test (Failed)
     67 - python_playthrough_test (Failed)
     68 - python_action_value_test (Failed)
     69 - python_action_value_vs_best_response_test (Failed)
     70 - python_best_response_test (Failed)
     71 - python_cfr_br_test (Failed)
     72 - python_cfr_test (Failed)
     73 - python_deep_cfr_test (Failed)
     74 - python_discounted_cfr_test (Failed)
     75 - python_dqn_test (Failed)
     76 - python_eva_test (Failed)
     77 - python_evaluate_bots_test (Failed)
     78 - python_expected_game_score_test (Failed)
     79 - python_exploitability_descent_test (Failed)
     80 - python_exploitability_test (Failed)
     81 - python_fictitious_play_test (Failed)
     82 - python_generate_playthrough_test (Failed)
     83 - python_get_all_states_test (Failed)
     84 - python_rl_losses_test (Failed)
     85 - python_lp_solver_test (Failed)
     86 - python_masked_softmax_test (Failed)
     87 - python_mcts_test (Failed)
     88 - python_minimax_test (Failed)
     89 - python_neurd_test (Failed)
     90 - python_nfsp_test (Failed)
     91 - python_outcome_sampling_mccfr_test (Failed)
     92 - python_policy_aggregator_joint_test (Failed)
     93 - python_policy_aggregator_test (Failed)
     94 - python_policy_gradient_test (Failed)
     95 - python_projected_replicator_dynamics_test (Failed)
     96 - python_generalized_psro_test (Failed)
     97 - python_rectified_nash_response_test (Failed)
     98 - python_random_agent_test (Failed)
     99 - python_rcfr_test (Failed)
    100 - python_sequence_form_lp_test (Failed)
    101 - python_value_iteration_test (Failed)
    102 - python_bluechip_bridge_uncontested_bidding_test (Failed)
    103 - python_uniform_random_test (Failed)
    104 - python_alpharank_test (Failed)
    105 - python_alpharank_visualizer_test (Failed)
    106 - python_dynamics_test (Failed)
    107 - python_heuristic_payoff_table_test (Failed)
    108 - python_utils_test (Failed)
    109 - python_visualization_test (Failed)
    110 - python_catch_test (Failed)
    111 - python_cliff_walking_test (Failed)
    112 - python_data_test (Failed)
    113 - python_tic_tac_toe_test (Failed)
    114 - python_bot_test (Failed)
    115 - python_games_sim_test (Failed)
    116 - python_matrix_game_utils_test (Failed)
    117 - python_policy_test (Failed)
    118 - python_pyspiel_test (Failed)
    119 - python_rl_environment_test (Failed)
    120 - python_tensor_game_utils_test (Failed)
    121 - python_file_logger_test (Failed)
    122 - python_lru_cache_test (Failed)
    123 - python_examples_bridge_supervised_learning (Failed)
Errors while running CTest
The command '/bin/sh -c mkdir -p build &&     cd build &&     cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel &&     make -j4 &&     ctest -j4'

@elkhrt, I added tests by running the ctest -j4 command after the build, as explained in the installation docs. It seems like your Dockerfile runs into the same trouble! So the Python build seems to fail both through pip and through manual compilation.
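One way to confirm that every failure is a Python test, rather than reading the list by eye, is to parse the CTest summary. The following is a minimal sketch that assumes the log format shown above (`<number> - <name> (Failed)`); `failed_tests` is a hypothetical helper, not part of CTest or OpenSpiel.

```python
import re

# Matches summary lines such as "     66 - python_api_test (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)\s*$")


def failed_tests(ctest_output):
    """Return the names of tests listed as '(Failed)' in CTest output."""
    names = []
    for line in ctest_output.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            names.append(match.group(2))
    return names


if __name__ == "__main__":
    # A shortened excerpt of the log above, used as sample input.
    sample = """\
The following tests FAILED:
     66 - python_api_test (Failed)
     67 - python_playthrough_test (Failed)
    123 - python_examples_bridge_supervised_learning (Failed)
Errors while running CTest
"""
    names = failed_tests(sample)
    non_python = [n for n in names if not n.startswith("python_")]
    print(len(names), "failed;", len(non_python), "are not Python tests")
```

If the second count is zero, the C++ side is fine and the problem is confined to the Python API or its dependencies.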

yarncraft commented 4 years ago

So the dockerfile I'm using now is

FROM ubuntu:20.04
RUN apt update
RUN dpkg --add-architecture i386 && apt update
RUN apt-get -y install \
    clang \
    curl \
    cmake \
    git \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    sudo

# clone repository and install
RUN git clone -b 'master' --single-branch --depth 15 https://github.com/deepmind/open_spiel.git open_spiel
WORKDIR open_spiel
RUN ./install.sh

# build and test
RUN mkdir -p build && \
    cd build && \
    cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel && \
    make -j4 && \
    ctest -j4
COPY . build

WORKDIR /open_spiel/build
CMD run.sh

When skipping the test command, you can indeed run the examples. However, as explained above, this Dockerfile approach also experiences the same troubles as did mine when it comes down to the Python tests.

lanctot commented 4 years ago

PYVERSION is something our script defines here: https://github.com/deepmind/open_spiel/blob/b19852be38e65de2db20dc9be6659e522a72e83d/open_spiel/scripts/build_and_run_tests.sh#L89, it's not defined by default.

So you either need to run the same command (in addition to the one that defines PYBIN) before running the tests, or hard-code the version number. (Ubuntu 18.04 comes with Python 3.6, but I see you do an upgrade so you might have 3.7. You can find out which version you have using dpkg --list | grep python, or from within the Python interpreter using import sys; print(sys.version).)

Edit: noticed you moved to Ubuntu 20.04 which has Python version 3.8.
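To see which value you would hard-code, the interpreter can report its own version. This is a minimal sketch: whether CMake expects the full `3.8.2`-style string or only `3.8` is an assumption to check against the linked script, so both forms are printed. `version_strings` is a hypothetical helper name.

```python
import platform
import sys


def version_strings():
    """Return (full, short) version strings, e.g. ('3.8.2', '3.8')."""
    full = platform.python_version()
    short = f"{sys.version_info.major}.{sys.version_info.minor}"
    return full, short


if __name__ == "__main__":
    full, short = version_strings()
    print("full version: ", full)
    print("major.minor:  ", short)
    # The interpreter path helps when several python3 versions are installed.
    print("interpreter:  ", sys.executable)
```

Run it with the same python3 binary that the build uses, since a container can easily have more than one installed.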

lanctot commented 4 years ago

Also, it looks like your new file is missing the step that installs the required pip packages?

lanctot commented 4 years ago

If it helps, my go-to minimal manual install is the one on page 6 of the paper: https://arxiv.org/abs/1908.09453

yarncraft commented 4 years ago

I am now working with the Dockerfile provided by Lockhart, the Dockerfile uses: Python 3.8.2 pip 18.1

lanctot commented 4 years ago

> I am now working with the Dockerfile provided by Lockhart, the Dockerfile uses: Python 3.8.2 pip 18.1

Cool. But you still need to install OpenSpiel's python dependencies via pip3 install --upgrade -r requirements.txt

yarncraft commented 4 years ago

FROM ubuntu:20.04
RUN apt update
RUN dpkg --add-architecture i386 && apt update
RUN apt-get -y install \
    clang \
    curl \
    cmake \
    git \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    sudo

# clone repository and install
RUN git clone -b 'master' --single-branch --depth 15 https://github.com/deepmind/open_spiel.git open_spiel
WORKDIR open_spiel
RUN ./install.sh

RUN pip3 install --upgrade pip
RUN pip3 install --upgrade setuptools testresources 
RUN pip3 install --upgrade -r requirements.txt

# build and test
RUN mkdir -p build && \
    cd build && \
    cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel && \
    make -j4 && \
    ctest -j4
COPY . build

WORKDIR /open_spiel/build
CMD run.sh

results in

Step 10/14 : RUN pip3 install --upgrade -r requirements.txt
 ---> Running in 512b2a97f247
Requirement already satisfied: pip>=20.0.2 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 2)) (20.0.2)
Collecting absl-py==0.9.0
  Downloading absl-py-0.9.0.tar.gz (104 kB)
ERROR: Could not find a version that satisfies the requirement tensorflow<2.0,>=1.15.1 (from -r requirements.txt (line 4)) (from versions: none)
ERROR: No matching distribution found for tensorflow<2.0,>=1.15.1 (from -r requirements.txt (line 4))
The command '/bin/sh -c sudo pip3 install --upgrade -r requirements.txt' returned a non-zero code: 1

lanctot commented 4 years ago

Ah man, not this again. This error was a pain to fix a few months back when we upgraded to TF 1.15. Maybe it is back to haunt us in Ubuntu 20.04.

It might not work with the system's pip. I wonder if the user-based pip is necessary now.

I will test an install independently on an Ubuntu 20.04 machine and report back. Might be a few days.

yarncraft commented 4 years ago

FROM ubuntu:18.04
RUN apt update
RUN dpkg --add-architecture i386 && apt update
RUN apt-get -y install \
    clang \
    curl \
    git \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    sudo

RUN sudo pip3 install --upgrade pip
RUN sudo pip3 install matplotlib

# clone repository and install
RUN git clone -b 'master' --single-branch --depth 15 https://github.com/deepmind/open_spiel.git open_spiel
WORKDIR open_spiel
RUN ./install.sh

RUN pip3 install --upgrade setuptools testresources 
RUN pip3 install --upgrade -r requirements.txt
RUN pip3 install --upgrade cmake

# build and test
RUN mkdir -p build && \
    cd build && \
    cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel && \
    make -j4 && \
    ctest -j4
COPY . build

RUN python3 ./open_spiel/python/examples/matrix_game_example.py
WORKDIR /open_spiel/build
CMD run.sh

I managed to resolve the dependency issue by switching back to 18.04 and installing cmake through pip (since that downloads the latest version, unlike apt-get). I am now running the tests again and checking whether I can import pyspiel in a Python script.

lanctot commented 4 years ago

OK, I expect that last python3 command to fail (as would a manual import pyspiel from the interpreter) unless you set the PYTHONPATH environment variable. The tests should be OK because the paths are set within the CMakeLists.txt, IIRC.

yarncraft commented 4 years ago

100% tests passed, 0 tests failed out of 133

Total Test time (real) = 1053.32 sec

OK so the tests are ok, the script indeed still fails, so I'm adding the PYTHONPATH and everything should work properly in that case
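Until the environment variable is set in the image, a script can also extend its own search path before importing. This is a sketch only, assuming the repository root is `/open_spiel` and the compiled binding ends up in `/open_spiel/build/python`, as in the Dockerfiles in this thread; `add_openspiel_paths` is a hypothetical helper, not part of OpenSpiel.

```python
import sys


def add_openspiel_paths(root="/open_spiel"):
    """Append the repo root and build/python to sys.path if not present."""
    root = root.rstrip("/")
    added = []
    for path in (root, f"{root}/build/python"):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


if __name__ == "__main__":
    add_openspiel_paths()
    try:
        import pyspiel  # noqa: F401  (only checking that the import works)
        print("pyspiel imported from", pyspiel.__file__)
    except ImportError as exc:
        print("pyspiel still not importable:", exc)
```

Setting PYTHONPATH in the image itself is the more general fix, because it also covers scripts that do not call such a helper.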

inejc commented 4 years ago

I haven't spent too much time going through the conversation above but we've successfully dockerized OpenSpiel and hence I'm pasting the relevant docker files here in case they are of any help.

  1. We use this file to build a container which is used to build the OpenSpiel project (within CI), image size is ~800MB
  2. We use this file for building a container that actually runs the python scripts that import pyspiel, image size ~120MB

You can see how the container from 2. is built within a container from 1. in the circleci config. Two docker files are used to obtain a much smaller final container. Also, containers are based on debian:buster-slim so not sure whether this will be completely relevant for your problems; feel free to ignore in that case 🙂. It's safe to ignore non-open-spiel related things within the project.

yarncraft commented 4 years ago

Ok thanks for sharing, I just got it working with the following setup:

FROM ubuntu:18.04
RUN apt update
RUN dpkg --add-architecture i386 && apt update
RUN apt-get -y install \
    clang \
    curl \
    git \
    python3 \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    sudo

RUN sudo pip3 install --upgrade pip
RUN sudo pip3 install matplotlib

# clone repository and install
RUN git clone -b 'master' --single-branch --depth 15 https://github.com/deepmind/open_spiel.git open_spiel
WORKDIR open_spiel
RUN ./install.sh

# install Python dependencies
RUN pip3 install --upgrade setuptools testresources 
RUN pip3 install --upgrade -r requirements.txt
RUN pip3 install --upgrade cmake

# build and test
RUN mkdir -p build && \
    cd build && \
    cmake -DPython_TARGET_VERSION=${PYVERSION} -DCMAKE_CXX_COMPILER=`which clang++` ../open_spiel && \
    make -j4 && \
    ctest -j4
COPY . build

ENV PYTHONPATH=${PYTHONPATH}:/open_spiel/
ENV PYTHONPATH=${PYTHONPATH}:/open_spiel/build/python

WORKDIR /open_spiel/build

lanctot commented 4 years ago

Amazing! Thanks for working on this.

Please submit a PR. Many people would benefit from this, so we should have it somewhere!

yarncraft commented 4 years ago

Indeed! I am working on it! 👍

yarncraft commented 4 years ago

PR submitted 💯 Being able to easily run Reinforcement Learning projects in the cloud will most certainly be beneficial!

yarncraft commented 4 years ago

@lanctot The PR still needs to be merged into master. Is there still a problem?

lanctot commented 4 years ago

No problem, I just got busy with work yesterday so couldn't do it yet.

We have a weekly update cycle: we import the PR internally and then it gets merged in our weekly push back to github (on Mondays).

So I will import it in the next few days and it will be merged on March 9th.

lanctot commented 4 years ago

I am wondering if you could try something for me though.

My server provider doesn't have Ubuntu 20.04 yet, but I would like it if we could fix the problem you faced before its release. I suspect that if we bump the TF requirement to 2.0 (in requirements.txt), it will work.

Is it easy for you to just test that out for me? Or could you share the entire Ubuntu 20.04 file so I can try it?

yarncraft commented 4 years ago

Thanks for the update! 👍 I will check if the system still works on Ubuntu 20.04 with TF 2.0 as soon as possible

lanctot commented 4 years ago

I tried internally and the upgrade to TF 2.0 breaks ~10 tests so if that's the solution we'll have to update some code first. When I can get an Ubuntu 20.04 machine then I'll look for a shorter-term solution (unless we've fixed those tests by then).

yarncraft commented 4 years ago

Ok as soon as you've updated the code I will try to run it in the Ubuntu 20.04. You should give Docker a try as well, it's very easy to install on whatever OS you're using! In that case you can just use the Dockerfile I provided and switch from 18.04 to 20.04. (if you can write shell scripts, you can write Dockerfiles)

yarncraft commented 4 years ago

@lanctot, I'm glad I could contribute, it seems like we're about to have a presentation from you for our Machine Learning project at the University of Leuven.

[Screenshot]

lanctot commented 4 years ago

Indeed, I am looking forward to it. I was wondering if you were in that class. Is that how you found out about OpenSpiel?

yarncraft commented 4 years ago

Yes, that is indeed how I found out about OpenSpiel. Many people in the course struggled with getting it up and running on various operating systems, so that's why I built a Dockerfile ASAP 👍 I already mentioned the new installation method on our forum.

Henry-E commented 3 years ago

Just an FYI. I tried installing openspiel using the docker container from the version 0.1.0 package. It came up with this error. It seems like this one failed test causes the docker container to not be created. Since it passes 99% of tests, I just deleted the testing part of the dockerfile and have successfully created the docker container.

Assuming that since docker is supposed to take care of dependencies and stuff (though it's my first time using it) that this is not an issue on my end (using the latest version of docker).

99% tests passed, 1 tests failed out of 151

Total Test time (real) = 1504.72 sec

The following tests FAILED:
    140 - python_examples_bridge_supervised_learning (Failed)
Errors while running CTest
The command '/bin/sh -c ctest -j12' returned a non-zero code: 8

lanctot commented 3 years ago

Thanks, we will look into it. Re-opening as a reminder. Tagging @elkhrt @yarncraft just so they know.

If all the other tests pass then I guess it will still work. That is currently the only use of Jax, it is possible the dependencies did not work out.

yarncraft commented 3 years ago

> Just an FYI. I tried installing openspiel using the docker container from the version 0.1.0 package. It came up with this error. It seems like this one failed test causes the docker container to not be created. Since it passes 99% of tests, I just deleted the testing part of the dockerfile and have successfully created the docker container.
>
> Assuming that since docker is supposed to take care of dependencies and stuff (though it's my first time using it) that this is not an issue on my end (using the latest version of docker).
>
> 99% tests passed, 1 tests failed out of 151
>
> Total Test time (real) = 1504.72 sec
>
> The following tests FAILED:
>   140 - python_examples_bridge_supervised_learning (Failed)
> Errors while running CTest
> The command '/bin/sh -c ctest -j12' returned a non-zero code: 8

This is indeed normal behavior and not an error on your side. The testing part can be commented out for faster builds; most of the framework will behave correctly given the 99% pass rate. @lanctot, is the Jax dependency not part of the OpenSpiel installation? If this error is due to a missing dependency, I can quickly fix it by adding it to the container spec.