jmacglashan commented 3 years ago

When creating items, you can index into the TrajectoryWriter.history with an int, or a python slice object and have it construct for you a corresponding TrajectoryColumn. However, it would also be nice if we could index into it with a tuple of ints, much like pulling out indices of a np array or TF tensor which would allow you to select multiple non-contiguous indices.

Motivation

The main motivation for this is for cases where you want to do things like N-step transitions. As things are, there are three approaches you can take in Reverb.

Write a single "time step" as a the complete transition (e.g, 2 observations, 1 action, 1 reward, 1 done)
Write trajectories in their proper time steps and when you create an item, slice the N+1 last observations and rewards
Manually book keep your append data references and manually construct TrajectoryColumn objects around them.

The problem with approach 1 is it doubles your memory stored for observations because you write the same observation twice, once when it's the ending observation of a transition and again when it's the beginning observation. This can be bad since observations are typically the most memory expensive field.

The second approach only writes the observation once and instead uses pointers in the table items to the right place, but has the problem that when you read data on a trainer through a dataset, you're always reading N+1 observations when you only need two. This turns out to be a non-trivial sampling performance hit, especially since observations are once again the biggest elements.

The third approach does exactly what we want, but the ergonomics aren't great because it means the user has to do all the history bookkeeping themselves, when the nice thing about the TrajectoryWriter is that it's already doing that for you in a nice interface.

Proposed feature

To better support this use case, It would be nice if we could slice a TrajectoryWriter history with tuple of ints to get/construct the right TrajectoryColumn.

That is, you could create your TrajectoryColumns from your history with a tuple slice for the observation

trajectory = {
    "observation": trajectory_writer.history["observation"][(-n-1, -1)],
    "action":  trajectory_writer.history["action"][-n],
    ...
}

Right now, I've written code that basically supports this, but requires me to use the private class and properties _ColumnHistory which seems bad because being private, later versions of reverb might change how it's organized. I believe the getter for this class, however, can be easily extended to support this with the following change, which then means I don't need external code that mucks about with private classes.

def __getitem__(self, val) -> 'TrajectoryColumn':
  path = self._path + (val,)
  if isinstance(val, int):
    return TrajectoryColumn([self._data_references[val]],
                            squeeze=True,
                            path=path)
  elif isinstance(val, slice):
    return TrajectoryColumn(
        self._data_references[val], path=path)
  elif isinstance(val, tuple):  # this elif is the new code
    return TrajectoryColumn([self._data_references[vi] for vi in val], path=path)
  else:
    raise TypeError(
        f'_ColumnHistory indices must be integers, a slice object, or a tuple of ints, not {type(val)}')

fastturtle commented 3 years ago

Hey @jmacglashan thanks for the detailed feature request! Just to make sure I understand correctly, would these two be functionally equivalent?

Using tuple indices:

trajectory = {
    "observation": trajectory_writer.history["observation"][(-n-1, -1)],
    "action":  trajectory_writer.history["action"][-n],
    ...
}

Without tuple indices. I haven't tested this out, so there is likely a bug somewhere ;):

trajectory = {
    "observation": reverb.TrajectoryColumn([
                      list(trajectory_writer.history["observation"][-n-1])[0],
                      list(trajectory_writer.history["observation"][-1])[0]
                   ]),
    "action":  trajectory_writer.history["action"][-n],
    ...
}

I agree that the second option is quite verbose. I think introducing this indexing should be fine, especially since users will be familiar with it [from numpy]*(https://numpy.org/doc/stable/reference/arrays.indexing.html#integer-array-indexing). I'll discuss this with the rest of the Reverb team and get back to you :)

* I was surprised to find that np.arange(10)[(0, 1)] doesn't work while np.arange(10)[[0, 1]] does. It looks like numpy treats tuples differently from lists for indexing I wonder if it's worth only accepting lists for consistency?

jmacglashan commented 3 years ago

Thanks! I think you're right that the second way you posted should be semantically the same. I didn't think of that one because I missed that the iterator of _ColumnHistory yields pybind.WeakCellRef, which is different from the __getitem__ which returns TrajectoryColumn (I assumed they were the same) But yeah, I do think it would be nice QoL to have the multiple index support all the same.

For numpy you're right that the first case fails because it first interprets a tuple as an index for multiple dimensions. You can still use a tuple for multiple indices in one dimension in numpy, but you have to nest it in another 1-d tuple so that it knows you're not defining multiple dimension indices. E.g., this works

np.arange(10)[((0, 1),)]

However, that is ugly :p and I would be perfectly happy with supporting multiple indices via a list in Reverb.

acassirer commented 3 years ago

I think this is an excellent idea! Thanks a lot @jmacglashan for the suggestion and the detailed motivation.

acassirer commented 3 years ago

I'm sending out a CL to add this feature.

acassirer commented 3 years ago

Change is in and this will be included in the next release. Thanks again for the suggestion!

google-deepmind / reverb

Feature request: Allow _ColumnHistory getter to support tuple of ints #65

Motivation

Proposed feature