joaogui1 closed this issue 2 years ago
Hi Joao,
I can look into this. I'm not familiar with dopamine myself. Is this the library you are referring to? https://github.com/google/dopamine I can see if I can submit the cartpole example.
In terms of urgency, are you exploring a variety of frameworks, or is this the framework you will primarily be working with?
This is the framework we'll be working with, yes. Can you try MountainCar and LunarLander? I think one of them needs some special installs.
Are you intending to modify dopamine or just use it as a library?
Try this as a starting point:
spec = xm.PythonContainer(
    base_image='gcr.io/deeplearning-platform-release/base-cpu',
    docker_instructions=[
        'RUN apt update && apt install -y python3-opencv',
        'RUN pip install dopamine-rl',
        'RUN mkdir workdir',
        f'RUN wget -O workdir/{gin_file} https://raw.githubusercontent.com/google/dopamine/master/{FLAGS.gin_file}',
        'WORKDIR workdir',
    ],
    entrypoint=xm.ModuleName('dopamine.discrete_domains.train'),
)
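For anyone adapting the snippet above, here is a minimal sketch of how the docker_instructions list could be built from a single gin config URL. The helper name build_docker_instructions is hypothetical, the example config path is only an illustration, and saving the file under its base name inside workdir is an assumption. It only produces strings and does not call xmanager.

```python
import posixpath
from typing import List
from urllib.parse import urlparse


def build_docker_instructions(gin_url: str, workdir: str = "workdir") -> List[str]:
    """Return Dockerfile lines that install Dopamine and fetch one gin config."""
    # Assumption: the config is stored under its base file name inside workdir.
    gin_file = posixpath.basename(urlparse(gin_url).path)
    if not gin_file:
        raise ValueError(f"no file name in URL: {gin_url!r}")
    return [
        "RUN apt update && apt install -y python3-opencv",
        "RUN pip install dopamine-rl",
        f"RUN mkdir {workdir}",
        f"RUN wget -O {workdir}/{gin_file} {gin_url}",
        f"WORKDIR {workdir}",
    ]


# Illustrative URL only; substitute the config you actually need.
instructions = build_docker_instructions(
    "https://raw.githubusercontent.com/google/dopamine/master/"
    "dopamine/agents/dqn/configs/dqn_cartpole.gin"
)
```

The resulting list could then be passed as docker_instructions= in the spec above.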
I put together an example that uploads tf events to Vertex Tensorboard:
https://github.com/deepmind/xmanager/blob/main/examples/dopamine/launcher.py
Thanks for all the help @andrewluchen, do you think we could chat Thursday or Friday? I still have some doubts about using xmanager and I think showing them to you would be faster
Hey @andrewluchen I got the following error when building the docker image
Dockerfile:
FROM gcr.io/deeplearning-platform-release/base-cu110
RUN apt update && apt install -y python3-opencv
RUN pip install dopamine-rl
COPY . workdir
WORKDIR workdir
COPY entrypoint.sh ./entrypoint.sh
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh", "--env=cartpole", "--agent=dqn"]
...
=> [internal] load build definition from Dockerfile 0.3s
=> => transferring dockerfile: 381B 0.0s
=> [internal] load .dockerignore 0.3s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for gcr.io/deeplearning-platform-release/base-cu110:latest 0.0s
=> [1/8] FROM gcr.io/deeplearning-platform-release/base-cu110 0.9s
=> [internal] load build context 0.2s
=> => transferring context: 510.23kB 0.0s
=> [2/8] RUN apt update && apt install -y python3-opencv 66.7s
=> [3/8] RUN pip install dopamine-rl 64.6s
=> [4/8] COPY . workdir 0.2s
=> [5/8] WORKDIR workdir 0.2s
=> [6/8] COPY entrypoint.sh ./entrypoint.sh 0.2s
=> [7/8] RUN chmod +x ./entrypoint.sh 0.4s
=> ERROR [8/8] RUN chmod +x ./wrapped_entrypoint.sh
But I don't know where we are mentioning an entrypoint.sh.
Any ideas how to fix it?
Could you pass --wrap_late_bindings=False to your command?
If you are only launching one job per experiment, this won't be useful. This flag is primarily used to support distributed multi-host training, as it lets jobs share their addresses with each other, like this: https://github.com/deepmind/xmanager/blob/main/examples/cifar10_torch/launcher.py#L60
Has xmanager been updated? I got a FATAL Flags parsing error: Unknown command line flag 'wrap_late_bindings'
Also I sent you more sensitive details through email, thanks for all the help!
How are you launching? I cd'd into examples/ and ran this, which worked:
xmanager launch launcher.py -- --wrap_late_bindings=False
I also used the python cmd, which worked:
python3 launcher.py --wrap_late_bindings=False
Looks like I was missing the first -- (before --wrap_late_bindings), it's working now, thanks!
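For later readers: the fix above hinges on the bare -- separator, which by common CLI convention marks where the tool's own arguments end and the arguments forwarded to the script begin. The sketch below illustrates that convention only; split_at_separator is a hypothetical helper, not xmanager's actual parsing code.

```python
from typing import List, Tuple


def split_at_separator(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first bare '--' into (tool_args, script_args)."""
    if "--" in argv:
        i = argv.index("--")
        # Everything before '--' belongs to the tool, everything after to the script.
        return argv[:i], argv[i + 1:]
    # Without a separator, nothing is forwarded to the script.
    return argv, []


tool_args, script_args = split_at_separator(
    ["launch", "launcher.py", "--", "--wrap_late_bindings=False"]
)
```

Under this convention, leaving out the separator means the flag is treated as one of the tool's own arguments rather than being forwarded to launcher.py.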
Now my job errored as follows:
OK, it seems this is just because my main file wasn't copied to Google Cloud.
How do I tell it to copy my files to GCP @andrewluchen? I thought I just needed to pass path="."
Any idea why xmanager isn't copying the whole directory @andrewluchen ?
path="." should copy the entire directory that launcher.py is in. For example, if you have something like /home/user/project/launcher.py and you run xmanager launch launcher.py, you should expect the entire contents of /home/user/project/ to be copied into your image.
Is that not the behavior that you observe? What is your directory structure, and what does your launcher script look like?
Could you also email me the URL of the image so I can check what was copied?
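One way to check this locally is to list everything under the directory that contains launcher.py and compare it with what ends up in the image. This is a rough sketch under that assumption. list_build_context is a hypothetical helper and does not reproduce xmanager's packaging logic or any ignore rules.

```python
import os
from typing import List


def list_build_context(launcher_path: str) -> List[str]:
    """List files under the directory containing launcher_path, relative to it."""
    root = os.path.dirname(os.path.abspath(launcher_path))
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            files.append(rel.replace(os.sep, "/"))
    return files
```

Calling list_build_context("launcher.py") from the project directory returns relative paths such as those of the launcher and the agent modules, which can be compared against the files found in the built image.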
.
├── agents
│ ├── dqn_agent_new.py
│ ├── external_configurations.py
│ ├── implicit_quantile_agent_new.py
│ ├── networks_new.py
│ ├── quantile_agent_new.py
│ └── rainbow_agent_new.py
├── Configs
│ ├── dqn_acrobot.gin
│ ...
├── example.py
├── full_replay.py
├── __init__.py
├── launcher.py
├── main_offline_experiments.py
├── minatar_env.py
├── networks_new.py
├── off-lineish.py
├── offrunner.py
├── __pycache__
│ ├── launcher.cpython-38.pyc
│ └── launcher.cpython-39.pyc
├── replay_runner.py
├── seedmain_code_experiments.py
└── xtests.py
(xtests was previously xmanager_exp. I tried changing the name to avoid underscores, but it didn't help.) Will email you the URL.
Closing old issues.
Hey @andrewluchen can you help us build an image that runs dopamine? None of us has experience with it :(