google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0
528 stars · 74 forks

Reading atari files directly. #22

Open DuaneNielsen opened 3 years ago

DuaneNielsen commented 3 years ago

Hi, I'm contributing this example of how to read the Atari replay files directly, in case anyone wants to do that.

Note that the data is stored in the same temporal sequence in which it was logged, as you can see by watching the replay.

import gzip
import cv2
import numpy as np

STORE_FILENAME_PREFIX = '$store$_'

ELEMS = ['observation', 'action', 'reward', 'terminal']

if __name__ == '__main__':

    data = {}

    # Each checkpoint suffix holds one gzipped .npy array per element.
    data_dir = '/home/duane/data/dqn/Breakout/1/replay_logs/'
    suffix = 0
    for elem in ELEMS:
        filename = f'{data_dir}{STORE_FILENAME_PREFIX}{elem}_ckpt.{suffix}.gz'
        with open(filename, 'rb') as f:
            with gzip.GzipFile(fileobj=f) as infile:
                data[elem] = np.load(infile)

    # Play back the frames in the order they were logged.
    for obs in data['observation']:
        cv2.imshow('obs', obs)
        cv2.waitKey(20)
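For anyone without the dataset on hand: the files the snippet reads are just gzipped .npy arrays, so the format can be reproduced with a round trip on synthetic data. This is a self-contained sketch (the array contents and the temporary path are illustrative, not real dataset values):

```python
import gzip
import os
import tempfile
import numpy as np

# Synthetic stand-in for one checkpoint array (real files hold ~1M entries).
obs = np.random.randint(0, 255, size=(10, 84, 84), dtype=np.uint8)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, '$store$_observation_ckpt.0.gz')

# Write the array in the same on-disk format: np.save wrapped in gzip.
with open(path, 'wb') as f:
    with gzip.GzipFile(fileobj=f) as outfile:
        np.save(outfile, obs)

# Read it back the same way the snippet above does.
with open(path, 'rb') as f:
    with gzip.GzipFile(fileobj=f) as infile:
        loaded = np.load(infile)
```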
Altriaex commented 3 years ago

Here is another implementation.

import os.path as osp

from dopamine.discrete_domains import atari_lib
from dopamine.replay_memory import circular_replay_buffer


class DummyWrappedBuffer(circular_replay_buffer.OutOfGraphReplayBuffer):
    def __init__(self,
                 observation_shape,
                 stack_size,
                 replay_capacity=1000000,
                 batch_size=32):
        super(DummyWrappedBuffer, self).__init__(
            observation_shape, stack_size, replay_capacity, batch_size)


def load_logs(logs_dir, suffix):
    buffer = DummyWrappedBuffer(
        observation_shape=atari_lib.NATURE_DQN_OBSERVATION_SHAPE,
        stack_size=atari_lib.NATURE_DQN_STACK_SIZE)
    buffer.load(checkpoint_dir=logs_dir, suffix=suffix)
    # A dict with keys: observation, action, reward, terminal
    return buffer._store


if __name__ == '__main__':
    # log_base, game and log_split must point to your downloaded dataset.
    log_path = osp.join(log_base, game, log_split, "replay_logs")
    store = load_logs(log_path, "0")
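Each game/run directory contains several checkpoint suffixes. Here is a small helper to discover which ones are present; the filename pattern is taken from the snippets above, and the helper itself is mine, not part of the repo:

```python
import glob
import os
import re


def available_suffixes(logs_dir):
    """Return the sorted checkpoint suffixes found in a replay_logs directory."""
    # Use the action files as a proxy; each suffix has one file per element.
    pattern = os.path.join(logs_dir, '$store$_action_ckpt.*.gz')
    suffixes = []
    for path in glob.glob(pattern):
        m = re.search(r'ckpt\.(\d+)\.gz$', path)
        if m:
            suffixes.append(int(m.group(1)))
    return sorted(suffixes)
```

This lets you iterate over every checkpoint with `for suffix in available_suffixes(log_path): store = load_logs(log_path, str(suffix))`.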
jsw7460 commented 2 years ago

Thank you for the great work. But I wonder whether the dataset is stored sequentially. That is, following Duane's implementation, whether data["observation"][1] is the state reached after taking data["action"][0] in data["observation"][0].

Additionally, do you know of a similar dataset implemented with PyTorch rather than Dopamine?

agarwl commented 2 years ago

Yes, the dataset is composed of (s, a, r, s') tuples in the way you indicated (you can visualize the data in a Colab/Jupyter notebook). For a PyTorch version of the dataset, you can use the code for this NeurIPS'21 paper: https://github.com/mila-iqia/SGI/blob/master/src/offline_dataset.py.
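To make the alignment concrete, here is a small sketch on synthetic arrays (the helper is illustrative, not part of the repo) that pairs the parallel arrays into (s, a, r, s') tuples while skipping episode boundaries:

```python
import numpy as np

# Synthetic stand-in for the loaded arrays; real observations are 84x84 uint8 frames.
observation = np.arange(6)                # states s_0 .. s_5
action = np.array([0, 1, 0, 1, 0, 1])
reward = np.array([0., 1., 0., 0., 1., 0.])
terminal = np.array([0, 0, 1, 0, 0, 1])   # episodes end at indices 2 and 5


def to_transitions(observation, action, reward, terminal):
    """Pair consecutive entries into (s, a, r, s') tuples without crossing episode ends."""
    transitions = []
    for t in range(len(observation) - 1):
        if terminal[t]:  # observation[t + 1] starts a new episode; skip the boundary
            continue
        transitions.append((observation[t], action[t], reward[t], observation[t + 1]))
    return transitions


transitions = to_transitions(observation, action, reward, terminal)
# First tuple is (s_0, a_0, r_0, s_1) == (0, 0, 0.0, 1)
```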