ERROR: test_resume_functionality (ppo.ppo_rllib_test.TestPPORllib)

sophiagu commented 1 year ago

Hi,

I cloned this repo following the instructions in the README (without changing anything) and got the following error when running python -m unittest discover -s testing/ -p "*_test.py".

It points to a FileNotFoundError, but the filename includes a very detailed timestamp, so I suspect the test itself is supposed to create that file and isn't doing so correctly. Is this a bug in the test itself, or is there anything I can do to fix it?

======================================================================
ERROR: test_resume_functionality (ppo.ppo_rllib_test.TestPPORllib)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_test.py", line 353, in test_resume_functionality
    options={"--loglevel": "ERROR"},
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 470, in main
    result = run(params)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 407, in run
    trainer = load_trainer(save_path=saved_path, true_num_workers=False)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 856, in load_trainer
    trainer = gen_trainer_from_params(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 801, in gen_trainer_from_params
    logger_creator=custom_logger_creator,
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
    super().__init__(config=config, logger_creator=logger_creator, **kwargs)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 132, in __init__
    self._create_logger(self.config, logger_creator)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 823, in _create_logger
    self._result_logger = logger_creator(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 747, in custom_logger_creator
    logdir = tempfile.mkdtemp(prefix=logdir_prefix, dir=results_dir)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/tempfile.py", line 366, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/home/sophiag/ray_results/PPO_cramped_room_False_nw=16_vf=0.000100_es=0.200000_en=0.000500_kl=0.200000_11_2023-04-13_19-10-53t8mqlh9x'

----------------------------------------------------------------------
Ran 7 tests in 597.026s

FAILED (errors=1)

micahcarroll commented 1 year ago

Thanks for reaching out – we'll be on this shortly!

jyan1999 commented 1 year ago

Hi, thanks for reaching out! I have been trying to reproduce this issue locally, and I just need some more information from you.

1): Did you get this error when running python -m unittest discover -s testing/ -p "*_test.py"? This test shouldn't be included in this test suite, which checks the Overcooked environment. This is a test under the human_aware_rl module that tests the training process. This test should also load a stored checkpoint, so it shouldn't be loading a checkpoint file like that.

2): If the issue persists, can you try running python -m unittest human_aware_rl.ppo.ppo_rllib_test.TestPPORllib.test_resume_functionality? That should run this test directly. Let me know if you still see the error.

sophiagu commented 1 year ago

1): yes. I created a clean, new environment, and followed the instructions step-by-step from here. That was the error I got when I proceeded to the final check step:

you should run the full testing suite that verifies all of the Overcooked accessory tools

That filename sounds like a Weights & Biases checkpoint file. I'm not familiar with that (maybe I need to do something to set it up first?).

2): Tried that and got the same error:

======================================================================
ERROR: test_resume_functionality (human_aware_rl.ppo.ppo_rllib_test.TestPPORllib)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_test.py", line 353, in test_resume_functionality
    options={"--loglevel": "ERROR"},
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 470, in main
    result = run(params)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 407, in run
    trainer = load_trainer(save_path=saved_path, true_num_workers=False)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 856, in load_trainer
    trainer = gen_trainer_from_params(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 801, in gen_trainer_from_params
    logger_creator=custom_logger_creator,
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
    super().__init__(config=config, logger_creator=logger_creator, **kwargs)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 132, in __init__
    self._create_logger(self.config, logger_creator)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 823, in _create_logger
    self._result_logger = logger_creator(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 747, in custom_logger_creator
    logdir = tempfile.mkdtemp(prefix=logdir_prefix, dir=results_dir)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/tempfile.py", line 366, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/home/sophiag/ray_results/PPO_cramped_room_False_nw=16_vf=0.000100_es=0.200000_en=0.000500_kl=0.200000_11_2023-04-14_03-42-16tna29s45'

----------------------------------------------------------------------
Ran 1 test in 7.865s

FAILED (errors=1)

jyan1999 commented 1 year ago

Thanks for your info. You shouldn't need to set up anything for the test files to run. The file does refer to a logging directory that will contain the checkpoint file along with several other things, but it should be automatically created when the experiment is run.

I think I know where this error might be coming from. While I am still not sure how it is triggered, there could be a quick fix. The error should come from this line (the tempfile.mkdtemp call in custom_logger_creator shown in your traceback), which fails to create a temporary directory.
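
To illustrate the failure mode, here is a minimal standard-library sketch, not the repo's actual code. It assumes the cause is that the parent results directory does not exist: tempfile.mkdtemp creates only the final directory, not missing parents. The helper name try_mkdtemp, the prefix, and the directory layout are hypothetical stand-ins.

```python
import os
import tempfile


def try_mkdtemp(results_dir, prefix="PPO_demo_"):
    """Return the new directory path, or None if results_dir does not exist."""
    try:
        # mkdtemp creates only the leaf directory; a missing parent is an error.
        return tempfile.mkdtemp(prefix=prefix, dir=results_dir)
    except FileNotFoundError:
        return None


# Hypothetical stand-in for ~/ray_results on a machine where it was never created.
base = tempfile.mkdtemp()
missing_results_dir = os.path.join(base, "ray_results")

print(try_mkdtemp(missing_results_dir))  # None, because the parent is missing
```

Once the parent directory exists, the same call returns a freshly created path.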

You can manually create an empty ray_results directory at '/home/sophiag/ray_results/', which should solve this problem. You normally shouldn't need to create this directory manually, as Ray should create it automatically when needed.
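
As a rough sketch of that workaround in Python (equivalent to creating the folder by hand), the snippet below makes the results directory if it is missing. The helper names ensure_results_dir and make_logdir are hypothetical and not part of the repo or of Ray. The default location ~/ray_results is taken from the path in the traceback.

```python
import os
import tempfile


def ensure_results_dir(path=None):
    """Create the results directory if it is missing and return its path."""
    if path is None:
        # Default taken from the traceback: ~/ray_results
        path = os.path.expanduser("~/ray_results")
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path


def make_logdir(results_dir, prefix):
    """Hypothetical guarded variant of the failing call: ensure the parent first."""
    ensure_results_dir(results_dir)
    return tempfile.mkdtemp(prefix=prefix, dir=results_dir)
```

Calling ensure_results_dir() with no arguments once before running the test would create ~/ray_results. A guard like make_logdir inside the logger creator would avoid the manual step, though whether that is the right place for a permanent fix is an open question here.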

I am still not sure how it could be triggered through the python -m unittest discover -s testing/ -p "*_test.py" command. Let me know if you run into any other issues.

sophiagu commented 1 year ago

Sounds good! ty for looking into this
