google-deepmind / reverb

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research
Apache License 2.0
700 stars 93 forks

deepmind.reverb.PriorityTableCheckpoint exceeded maximum protobuf size of 2GB #86

Closed mikygit closed 2 years ago

mikygit commented 2 years ago

Hello, I get this error using reverb. Might be related to checkpoints. Any idea?

sabelaraga commented 2 years ago

Hey Mike,

This is because the table is checkpointed as a protobuf and it has grown beyond 2GB (a protobuf message cannot be serialized once its size no longer fits in a 32-bit int).
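To give a sense of the scale behind that ceiling, here is a back-of-envelope sketch. The 2GB limit is protobuf's signed 32-bit message size; the per-item byte count below is a made-up illustrative assumption, not a figure measured from Reverb:

```python
# Protobuf cannot serialize a single message larger than 2**31 - 1 bytes,
# because message sizes are represented as signed 32-bit ints.
PROTO_LIMIT_BYTES = 2**31 - 1  # ~2 GiB

# Hypothetical figure: assume each item's metadata (key, priority,
# chunk references) serializes to roughly 100 bytes.
BYTES_PER_ITEM = 100

# Order-of-magnitude bound on items per checkpoint proto under that assumption.
max_items = PROTO_LIMIT_BYTES // BYTES_PER_ITEM
print(max_items)  # 21474836, i.e. on the order of 21 million items
```

So a table only hits this limit after tens of millions of live items, which is consistent with the "table growing indefinitely" diagnosis below.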

How are you triggering the checkpoints?

mikygit commented 2 years ago

Using the checkpoint() method after each trajectory is found. Only one process is checkpointing and multiple are inserting into the database.

Are you saying it is not possible to checkpoint more than a certain amount of data? Shouldn't it be related at some point to the maximum capacity of the running server?

sabelaraga commented 2 years ago

Hi Mike,

Yes, at the moment it is not possible to checkpoint a table that is larger than 2GB. Note that the checkpoint of a Table doesn't include the data (only references to the items; see the details of the implementation in https://github.com/deepmind/reverb/blob/master/reverb/cc/platform/tfrecord_checkpointer.h). This suggests that the table is growing indefinitely (which matches the description of one process checkpointing and multiple inserting). Checkpointing after each insertion would also degrade performance quite heavily.

Could you describe a bit more your use case?

mikygit commented 2 years ago

Simple use case, actually. Many samplers, one learner. I need to store data so that I can re-use it later. I understand I cannot store more than 2GB of data. Weird. How are you guys coping with that limitation? Aren't you storing data at some point? Is there a way to get the checkpoint size at any time so that I can at least configure the max number of items?

sabelaraga commented 2 years ago

I think the checkpoint was designed more as a fail-recovery mechanism rather than to create offline datasets.

If I understand correctly, this is creating multiple snapshots of the data, so unless the sampler/learner never removes any sample, in the end you never have a full view of all the data that was inserted in the table. If you really want to store the tables, one option would be to shard them (i.e., create multiple tables within the same server) or even to shard the server (and create multiple server instances). Each writer can just send data to one of the tables/servers, and the sampling can be done by interleaving several tf.data.Datasets.

If you want to create an offline dataset, I would recommend taking a look at Envlogger and RLDS. Checkpointing blocks the server and would really affect your performance. Envlogger is optimized to avoid interfering with the environment's speed.

mikygit commented 2 years ago

Ok, thanks for the hints. BTW I don't expect to have a full view of all the inserted data, just a backup of the current data. Removed data does not matter.

Is it correct to say that beyond 2GB it cannot recover any data?

If I understand correctly, sharding is just a way to store more data (since there are more servers), but eventually, if all servers are full, one cannot store any more, right?

sabelaraga commented 2 years ago

Yes. But keep in mind that the error is because there is more than 2GB of metadata; the data itself is checkpointed separately (see the Chunks in https://github.com/deepmind/reverb/blob/master/reverb/cc/platform/tfrecord_checkpointer.cc#L159). The checkpoint of the PriorityTableCheckpoint only includes the references to the chunks.

mikygit commented 2 years ago

Ok. Not sure I understand what you mean by metadata, actually ... I understand tables only contain references so that we can have multiple tables pointing to the data. But in the end, I guess that what is stored on disk when checkpointing is the actual data, right?

sabelaraga commented 2 years ago

Both things are stored. But the PriorityTableCheckpoint proto that is growing over 2GB contains only table metadata and references to the chunks. The chunks (i.e., the actual data) are written as separate protos.

https://github.com/deepmind/reverb/blob/master/reverb/cc/platform/tfrecord_checkpointer.cc

mikygit commented 2 years ago

Ok, thanks a lot. Would it make sense to keep using reverb and let Envlogger or RLDS handle the dataset? I guess it's either Envlogger or RLDS, not both, right?

I foresee a problem loading such a stored dataset into reverb though ...

sabelaraga commented 2 years ago

The recording happens through Envlogger, but yes, once you have the offline dataset generated, you would need to have a process that reads from the dataset and feeds it into reverb. On the other hand, you can also feed the datasets directly to the learner instead of going through Reverb.

mikygit commented 2 years ago

Hello, I played around with envlogger. While I managed to store data on the filesystem, it looks like it creates tons of timestamp subfolders, which makes it unreadable (fatal error).

Any hints?

mikygit commented 2 years ago

I get a ValueError: Reader init fails: RESOURCE_EXHAUSTED: open() failed: Too many open files; reading /tmp-network/fast/project/vdt/tests/20211207T113440669939/episode_metadata.riegeli, <class 'pybind11_abseil.status.StatusNotOk'>

Surely I'm doing it wrong since I only tried it on 10000 trajectories :-(

sabelaraga commented 2 years ago

I'd recommend opening an issue in the envlogger repo (http://github.com/deepmind/envlogger). I'll close this one, but please reopen if there is anything else that should be done on the Reverb side.

sabelaraga commented 2 years ago

Note that to reduce the number of files you can configure the max number of trajectories per file: https://github.com/deepmind/envlogger/blob/main/envlogger/backends/riegeli_backend_writer.py#L35
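A configuration sketch of that suggestion. The parameter name is taken from the linked file; the exact constructor signature may differ across envlogger versions, and `env` is assumed to be an existing dm_env.Environment:

```python
import envlogger

# Raising max_episodes_per_file packs more trajectories into each Riegeli
# file, so far fewer files are created (and opened again at read time),
# which avoids the "Too many open files" failure above.
with envlogger.EnvLogger(
    env,
    data_directory='/tmp/envlogger_data',   # illustrative path
    max_episodes_per_file=10_000,
) as logged_env:
    ...  # run episodes against logged_env as usual
```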