google-deepmind / launchpad

Apache License 2.0
312 stars 36 forks source link

Error in writing tests for dm-launchpad-nightly==0.3.0.dev20210802 #14

Closed ethanluoyc closed 3 years ago

ethanluoyc commented 3 years ago

Hi, I was recently running my test cases and encountered the following error.

E absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --lp_termination_notice_secs before flags were parsed.

I didn't have this issue with dm-launchpad-nightly==0.3.0.dev20210728 so I suspect something was changed in the last few days.

Here is a snippet of my test case. I can include the other relevant files if they are helpful for debugging.

#!/usr/bin/env python3
"""Integration test for the distributed agent."""

from typing import Optional

from absl.testing import absltest
import acme
from acme.testing import fakes
import launchpad as lp
import numpy as np

from magi.agents.impala import agent_distributed as impala_lib
from magi.agents.impala import config
from magi.agents.impala.agent_test import MyNetwork

def network_factory(spec):
    def forward_fn(x, s):
        model = MyNetwork(spec.num_values)
        return model(x, s)

    def initial_state_fn(batch_size: Optional[int] = None):
        model = MyNetwork(spec.num_values)
        return model.initial_state(batch_size)

    def unroll_fn(inputs, state, start_of_episode=None):
        model = MyNetwork(spec.num_values)
        return model.unroll(inputs, state, start_of_episode)

    return {
        "forward": forward_fn,
        "unroll": unroll_fn,
        "initial_state": initial_state_fn,
    }

class DistributedAgentTest(absltest.TestCase):
    """Simple integration/smoke test for the distributed agent."""

    def test_control_suite(self):
        """Tests that the agent can run on the control suite without crashing."""

        def environment_factory():
            environment = fakes.DiscreteEnvironment(
                num_actions=5,
                num_observations=10,
                obs_shape=(5, 10),
                obs_dtype=np.float32,
                episode_length=10,
            )
            return environment

        agent = impala_lib.DistributedIMPALA(
            environment_factory=lambda seed, test: environment_factory(),
            network_factory=network_factory,
            num_actors=2,
            config=config.IMPALAConfig(
                sequence_length=4, sequence_period=4, max_queue_size=1000
            ),
            max_actor_steps=None,
        )
        program = agent.build()

        (learner_node,) = program.groups["learner"]
        learner_node.disable_run()

        lp.launch(program, launch_type="test_mt")

        learner: acme.Learner = learner_node.create_handle().dereference()

        for _ in range(5):
            learner.step()

if __name__ == "__main__":
    import tensorflow as tf

    tf.config.experimental.set_visible_devices([], "GPU")
    absltest.main()

I am using pytest and pytest-xdist for running the test cases. I did a check and it appears that running without pytest works fine. I suspect that the issue is that without using absl.app.run() the flags are not parsed hence throwing the error, but this is not desirable as the user may still want to use launchpad without using absl. It would be great if the configuration can be achieved without having to use absl flags.

zhongwen commented 3 years ago

we met the exact same issue cc @mavenlin

KaleabTessera commented 3 years ago

I am having the same issue, this is just running launchpad as normal (not running through pytest). This notebook reproduces the error.

qstanczyk commented 3 years ago

I can't reproduce this problem. Is it still happening with the new release? Can you extract a stack trace from the UnparsedFlagAccessError? Thanks

ethanluoyc commented 3 years ago

I just reran our CI with The dependencies I have in the CI run is:

absl-py==0.12.0
appdirs==1.4.4
astroid==2.6.6
astunparse==1.6.3
attrs==21.2.0
backports.entry-points-selectable==1.1.0
black==21.6b0
bsuite==0.3.5
cached-property==1.5.2
cachetools==4.2.2
certifi==2021.5.30
cfgv==3.3.0
charset-normalizer==2.0.4
chex==0.0.8
click==8.0.1
cloudpickle==1.6.0
configparser==5.0.2
contextlib2==21.6.0
cycler==0.10.0
dataclasses==0.6
decorator==5.0.9
descartes==1.1.0
dill==0.3.4
distlib==0.3.2
dm-acme @ git+git://github.com/deepmind/acme.git@b7c1c4bca84031cdb115d38d30e86cded02f1fdd
dm-control==0.0.364896371
dm-env==1.5
dm-haiku==0.0.4
dm-launchpad-nightly==0.3.0.dev20210815
dm-reverb-nightly==0.4.0.dev20210815
dm-sonnet==2.0.0
dm-tree==0.1.6
docker-pycreds==0.4.0
execnet==1.9.0
filelock==3.0.12
flake8==3.9.2
flatbuffers==1.12
frozendict==2.0.6
future==0.18.2
gast==0.4.0
gitdb==4.0.7
GitPython==3.1.18
glfw==2.1.0
google-auth==1.34.0
google-auth-oauthlib==0.4.5
google-pasta==0.2.0
googleapis-common-protos==1.53.0
grpcio==1.39.0
gym==0.18.3
h5py==3.1.0
identify==2.2.13
idna==3.2
imageio==2.9.0
importlab==0.6.1
importlib-metadata==4.6.4
importlib-resources==5.2.2
iniconfig==1.1.1
isort==5.8.0
jax==0.2.19
jaxlib==0.1.70+cuda111
keras-nightly==2.7.0.dev2021081507
Keras-Preprocessing==1.1.2
kiwisolver==1.3.1
labmaze==1.0.5
lazy-object-proxy==1.6.0
libclang==11.1.0
lxml==4.6.3
-e git+https://github.com/ethanluoyc/magi@ac234ea5a8d2d1672b04ce39223978a687b49058#egg=magi
Markdown==3.3.4
matplotlib==3.4.3
mccabe==0.6.1
mizani==0.7.3
ml-collections==0.1.0
mock==4.0.3
mypy-extensions==0.4.3
networkx==2.6.2
ninja==1.10.2
nodeenv==1.6.0
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
optax==0.0.9
packaging==21.0
palettable==3.3.0
pandas==1.3.2
pathspec==0.9.0
pathtools==0.1.2
patsy==0.5.1
Pillow==8.2.0
platformdirs==2.2.0
plotnine==0.8.0
pluggy==0.13.1
portpicker==1.4.0
pre-commit==2.12.1
promise==2.3
protobuf==3.17.3
psutil==5.8.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.7.0
pyflakes==2.3.1
pyglet==1.5.15
pylint==2.8.2
PyOpenGL==3.1.5
pyparsing==2.4.7
pytest==6.2.4
pytest-forked==1.3.0
pytest-xdist==2.3.0
python-dateutil==2.8.2
pytype==2021.5.14
pytz==2021.1
PyWavelets==1.1.1
PyYAML==5.4.1
regex==2021.8.3
requests==2.26.0
requests-oauthlib==1.3.0
rlax @ git+git://github.com/deepmind/rlax.git@33c3462ff9a5651404f24110af6e26298712d5dd
rsa==4.7.2
scikit-image==0.18.2
scipy==1.7.1
sentry-sdk==1.3.1
shortuuid==1.0.1
six==1.15.0
smmap==4.0.0
statsmodels==0.12.2
subprocess32==3.5.4
tabulate==0.8.9
tb-nightly==2.7.0a20210813
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow-datasets==4.4.0
tensorflow-io-gcs-filesystem==0.20.0
tensorflow-metadata==1.2.0
termcolor==1.1.0
tf-estimator-nightly==2.7.0.dev2021081508
tf-nightly==2.7.0.dev20210815
tfp-nightly==0.14.0.dev20210815
tifffile==2021.8.8
toml==0.10.2
toolz==0.11.1
tqdm==4.62.1
trfl==1.1.0
typed-ast==1.4.3
typing-extensions==3.7.4.3
urllib3==1.26.6
virtualenv==20.7.2
wandb==0.12.0
Werkzeug==2.0.1
wrapt==1.12.1
zipp==3.5.0

The interesting dependencies are

dm-launchpad-nightly==0.3.0.dev20210815
dm-reverb-nightly==0.4.0.dev20210815

The stack trace is

=================================== FAILURES ===================================
___________________ DistributedAgentTest.test_control_suite ____________________
[gw1] linux -- Python 3.7.11 /opt/hostedtoolcache/Python/3.7.11/x64/bin/python

self = <magi.agents.impala.agent_distributed_test.DistributedAgentTest testMethod=test_control_suite>

    def test_control_suite(self):
        """Tests that the agent can run on the control suite without crashing."""

        def environment_factory():
            environment = fakes.DiscreteEnvironment(
                num_actions=5,
                num_observations=10,
                obs_shape=(5, 10),
                obs_dtype=np.float32,
                episode_length=10,
            )
            return environment

        agent = impala_lib.DistributedIMPALA(
            environment_factory=lambda seed, test: environment_factory(),
            network_factory=network_factory,
            num_actors=2,
            config=config.IMPALAConfig(
                sequence_length=4, sequence_period=4, max_queue_size=1000
            ),
            max_actor_steps=None,
        )
        program = agent.build()

        (learner_node,) = program.groups["learner"]
        learner_node.disable_run()

>       lp.launch(program, launch_type="test_mt")

magi/agents/impala/agent_distributed_test.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/launchpad/launch/launch.py:88: in launch
    return launch_test_multithreaded.launch(program, test_case=test_case)
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/launchpad/launch/test_multi_threading/launch.py:49: in launch
    manager = worker_manager.WorkerManager()
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/launchpad/launch/worker_manager.py:86: in __init__
    self._termination_notice_secs = FLAGS.lp_termination_notice_secs
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <absl.flags._flagvalues.FlagValues object at 0x7fc6f61f1f50>
name = 'lp_termination_notice_secs'

    def __getattr__(self, name):
      """Retrieves the 'value' attribute of the flag --name."""
      fl = self._flags()
      if name not in fl:
        raise AttributeError(name)
      if name in self.__dict__['__hiddenflags']:
        raise AttributeError(name)

      if self.__dict__['__flags_parsed'] or fl[name].present:
        return fl[name].value
      else:
        error_message = ('Trying to access flag --%s before flags were parsed.' %
                         name)
        if six.PY2:
          # In Python 2, hasattr returns False if getattr raises any exception.
          # That means if someone calls hasattr(FLAGS, 'flag'), it returns False
          # instead of raises UnparsedFlagAccessError even if --flag is already
          # defined. To make the error more visible, the best we can do is to
          # log an error message before raising the exception.
          # Don't log a full stacktrace here since that makes other callers
          # get too much noise.
          logging.error(error_message)
>       raise _exceptions.UnparsedFlagAccessError(error_message)
E       absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --lp_termination_notice_secs before flags were parsed.

/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/absl/flags/_flagvalues.py:498: UnparsedFlagAccessError
=============================== warnings summary ===============================
../../../../../opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:22
../../../../../opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:22
  /opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED magi/agents/impala/agent_distributed_test.py::DistributedAgentTest::test_control_suite
============= 1 failed, 11 passed, 2 warnings in 85.30s (0:01:25) ==============
make: *** [Makefile:10: test] Error 1
qstanczyk commented 3 years ago

Parsing of flags should work just fine with absltest.main() (and it works for me). Does this work for you:

from absl.testing import absltest
from absl import flags
flags.DEFINE_string('x', "sss", "some explation")
FLAGS = flags.FLAGS

class DistributedAgentTest(absltest.TestCase):

    def test_control_suite(self):
      print(FLAGS.x) 

if __name__ == "__main__":
  import tensorflow as tf
  tf.config.experimental.set_visible_devices([], "GPU")
  absltest.main()

If so, then doing binary search between this example and your Launchpad program could point at the issue. If it doesn't - I think there is something wrong with the absl you use.

ethanluoyc commented 3 years ago

I can use the repro here to repro this behavior.

When running with This works fine when running with

python magi/dummy_test.py

but would fail when running

pytest magi/dummy_test.py

I think the issue is that I am using pytest to run all of the test cases and I am not using a conftest.py file to configure absl testing. This didn't happen with the previous version of LP. One reason I suspect is that the earlier versions do not define the lp_termination_notice_secs flag. In more recent versions, however, this functionality was added and this is used regardless of whether the user parses the flags before launching programs with LP.

I think as a feature request here, we are thinking if it would be possible for LP to function even when absl FLAGS parsing is not called by the user. For many non-Google users, they probably won't be using absl for command line parsing and supporting this use case would make the OSS users happier since they don't have to buy in absl if they want to use LP.

qstanczyk commented 3 years ago

Ok, will try to make a change for this.

qstanczyk commented 3 years ago

Can you check if this change helps?

KaleabTessera commented 3 years ago

@qstanczyk That looks promising. I still get the error (UnparsedFlagAccessError: Trying to access flag --lp_termination_notice_secs before flags were parsed.) in 0.3.0.dev20210817, but I assume your change will be in 0.3.0.dev20210818?

This is the workaround I was doing to get LP working through a notebook/colab.

tf.compat.v1.flags.DEFINE_string('f','','')
try: 
     app.run(lambda argv: None)
except:
      pass

I believe your change does something similar.

qstanczyk commented 3 years ago

Yeah, the change just got in, so it will be in the next nightly (on when building from sources).

KaleabTessera commented 3 years ago

@qstanczyk This worked, thanks for the fix :open_hands:

qstanczyk commented 3 years ago

Good ;-)