google-deepmind / dqn_zoo

DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.
Apache License 2.0

results.csv on disk is blank #8

Closed: RylanSchaeffer closed this issue 4 years ago

RylanSchaeffer commented 4 years ago

When you run an agent-env-seed combination, either via the Docker image or directly with Python, no results are written to the results.csv output file on disk. I let the code run overnight on my cluster and nothing was flushed to the CSV. This was the output to stdout:

I1007 15:59:09.217102 47498856853312 run_atari.py:85] C51 on Atari on gpu.
I1007 15:59:10.231613 47498856853312 run_atari.py:111] Environment: pong
I1007 15:59:10.231938 47498856853312 run_atari.py:112] Action spec: DiscreteArray(shape=(), dtype=int32, name=action, minimum=0, maximum=5, num_values=6)
I1007 15:59:10.232475 47498856853312 run_atari.py:113] Observation spec: (Array(shape=(210, 160, 3), dtype=dtype('uint8'), name='rgb'), Array(shape=(), dtype=dtype('int32'), name='lives'))
I1007 15:59:18.271111 47498856853312 run_atari.py:220] Training iteration 0.
I1007 15:59:18.274485 47498856853312 run_atari.py:226] Evaluation iteration 0.
I1007 16:04:18.767457 47498856853312 run_atari.py:251] iteration:   0, frame:     0, eval_episode_return: -21.00, train_episode_return:  nan, eval_num_episodes: 164, train_num_episodes:   0, eval_frame_rate: 1664, train_frame_rate:  nan, train_exploration_epsilon: 1.000, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:04:19.123273 47498856853312 run_atari.py:220] Training iteration 1.
I1007 16:06:28.051372 47498856853312 agent.py:163] Begin learning
I1007 16:19:37.069391 47498856853312 run_atari.py:226] Evaluation iteration 1.
I1007 16:24:43.913471 47498856853312 run_atari.py:251] iteration:   1, frame: 1000000, eval_episode_return: -21.00, train_episode_return: -20.14, eval_num_episodes: 164, train_num_episodes: 266, eval_frame_rate: 1630, train_frame_rate: 1089, train_exploration_epsilon: 0.802, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:24:44.240913 47498856853312 run_atari.py:220] Training iteration 2.
I1007 16:41:06.477482 47498856853312 run_atari.py:226] Evaluation iteration 2.
I1007 16:46:11.618932 47498856853312 run_atari.py:251] iteration:   2, frame: 2000000, eval_episode_return: -20.99, train_episode_return: -20.27, eval_num_episodes: 164, train_num_episodes: 274, eval_frame_rate: 1639, train_frame_rate: 1018, train_exploration_epsilon: 0.555, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:46:11.946672 47498856853312 run_atari.py:220] Training iteration 3.
I1007 17:02:34.446135 47498856853312 run_atari.py:226] Evaluation iteration 3.
I1007 17:07:38.262575 47498856853312 run_atari.py:251] iteration:   3, frame: 3000000, eval_episode_return: -21.00, train_episode_return: -20.43, eval_num_episodes: 164, train_num_episodes: 282, eval_frame_rate: 1646, train_frame_rate: 1018, train_exploration_epsilon: 0.307, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 17:07:38.589899 47498856853312 run_atari.py:220] Training iteration 4.
I1007 17:24:00.612326 47498856853312 run_atari.py:226] Evaluation iteration 4.
I1007 17:29:06.500515 47498856853312 run_atari.py:251] iteration:   4, frame: 4000000, eval_episode_return: -21.00, train_episode_return: -20.68, eval_num_episodes: 164, train_num_episodes: 296, eval_frame_rate: 1635, train_frame_rate: 1018, train_exploration_epsilon: 0.060, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 17:29:06.830286 47498856853312 run_atari.py:220] Training iteration 5.
I1007 17:45:27.496435 47498856853312 run_atari.py:226] Evaluation iteration 5.
I1007 17:50:33.692603 47498856853312 run_atari.py:251] iteration:   5, frame: 5000000, eval_episode_return: -20.45, train_episode_return: -20.86, eval_num_episodes:  94, train_num_episodes: 307, eval_frame_rate: 1633, train_frame_rate: 1020, train_exploration_epsilon: 0.010, normalized_return: 0.007, capped_normalized_return: 0.007, human_gap: 0.993
I1007 17:50:34.021842 47498856853312 run_atari.py:220] Training iteration 6.
I1007 18:06:57.161453 47498856853312 run_atari.py:226] Evaluation iteration 6.

and so on. The results.csv file was created, but it remains empty.

RylanSchaeffer commented 4 years ago

I can reproduce this locally on CPU.

RylanSchaeffer commented 4 years ago

I found a short-term workaround: adding the following code to <agent>/run_atari.py works fine, which suggests to me the problem is not a failure to flush some buffer.

        # Append this iteration's log values to the results CSV, reopening the
        # file on every write. FLAGS, state, log_output and collections all
        # come from the surrounding run_atari.py code.
        with open(FLAGS.results_csv_path, 'a') as fp:
            results_dict = collections.OrderedDict((n, v) for n, v, _ in log_output)
            if state.iteration == 0:
                fp.write(','.join(results_dict.keys()))
                fp.write('\n')
            fp.write(','.join(map(str, results_dict.values())))
            fp.write('\n')
GeorgOstrovski commented 4 years ago

At first glance, I think it's because the CsvWriter opens the file once and only closes it at the very end (not after each logging step), so nothing gets written to disk until then. In your fix, you re-open the file whenever you want to append a row and then close it again (implicitly via the context manager).
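
For context, this is standard Python file buffering: a file object held open keeps written data in an in-memory buffer, so on disk the file can look empty until the buffer is flushed or the file is closed. A quick hypothetical snippet (using a made-up example.csv, not the actual CsvWriter) illustrates the effect:

    # Hypothetical illustration of default file buffering, not dqn_zoo code.
    fp = open('example.csv', 'w')
    fp.write('iteration,frame,eval_episode_return\n')
    # At this point example.csv may still look empty on disk.
    fp.flush()  # ...or fp.close(); only now does the row leave Python's buffer
                # and become visible to other readers of the file.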

This should be an easy fix, will update soon.

RylanSchaeffer commented 4 years ago

I didn't know that data is only flushed to the file once the file reference is closed. But yes, your summary of what I'm doing is correct and what you propose sounds good!

RylanSchaeffer commented 4 years ago

@GeorgOstrovski any updates?

GeorgOstrovski commented 4 years ago

This should be resolved by this commit. The CsvWriter is now nearly stateless: the file is opened and closed for each write.
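
For readers who don't want to dig through the commit, the idea can be sketched roughly as below. This is a minimal illustration of the open-per-write pattern, not the actual dqn_zoo CsvWriter; the class name and write() signature here are assumptions.

    import csv
    import os


    class CsvWriter:
        """Sketch of a per-write CSV logger: reopens the file for every row."""

        def __init__(self, fname):
            self._fname = fname
            # If the file already exists (e.g. after a restart), assume the
            # header was written by a previous run.
            self._header_written = os.path.exists(fname)

        def write(self, values):
            """Appends an ordered mapping of name -> value as one CSV row."""
            with open(self._fname, 'a', newline='') as file:
                writer = csv.DictWriter(file, fieldnames=list(values.keys()))
                if not self._header_written:
                    writer.writeheader()
                    self._header_written = True
                writer.writerow(values)

Because the file is closed after every row, each logging step reaches disk immediately, at the negligible cost of one open/close per training iteration.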

RylanSchaeffer commented 4 years ago

Wonderful. @GeorgOstrovski, I have a quick question: for researching distributional RL algorithms based on DQN and Atari environments, what are the comparative advantages of this codebase versus dopamine?

GeorgOstrovski commented 4 years ago

I have to admit I haven't worked with dopamine's more recent JAX-based DQN implementations, so take this with a grain of salt.

In the DQN Zoo we have put particular emphasis on faithfully reproducing the published algorithms (e.g. including all components such as NoisyNets in Rainbow, and using the same hyperparameter settings) and on research friendliness via simplicity: all agent code is explicit, contained in 1-2 files, and easily modifiable.

These are, in my view, the biggest arguments in favour of the dqn_zoo, though not necessarily against dopamine (it's also easy to work with!) - I think it ultimately comes down to personal preference.

RylanSchaeffer commented 4 years ago

My concern with both of these is how well supported they'll be going forward. Dopamine appears to be written entirely by one engineer, which is typically a warning sign. What about this project?

GeorgOstrovski commented 4 years ago

As stated in README.md / CONTRIBUTING, we consider this release a stable snapshot of historic DQN-based agents and do not foresee any substantial changes beyond bug fixes. If by support you mean more than that - i.e. further development, adding new features, etc. - then no, we do not intend to evolve this codebase in that sense. Continuing independent development on a fork is of course a possible avenue for anyone interested in taking this codebase further.

RylanSchaeffer commented 4 years ago

What about adding more historic DQN-based agents, e.g. expectile regression, hyperbolic discounting, etc.? (https://arxiv.org/abs/1902.06865)

I wouldn't view that as further development or adding new features, which is why I'm asking.

GeorgOstrovski commented 4 years ago

To reiterate a reply from a previous thread:

We’re grateful for these suggestions and will consider them case-by-case. In this instance it is an explicit non-goal to incorporate as many DQN variants as we can. [...] being selective means that we can provide evaluation data on all 57 games. [...]

Please see also README.md and CONTRIBUTING for further details.

Side note: I think this discussion has substantially exceeded the scope of the original question about the CsvWriter ;)