google-deepmind / reverb

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research
Apache License 2.0

Sampling terribly slow #80

Closed mikygit closed 2 years ago

mikygit commented 3 years ago

Hello, I've been playing with Reverb for a few days and I noticed that sampling is terribly slow compared to insertion.

Although server_info tells me that I'm sampling at a rate of approximately 45k items per sec, on the client side it's closer to 1 item per sec :-(

My client and server are running on 2 separate machines at the moment, but I also tried having both the client and the server on the same machine (and even in the same process), to no avail. It's still very slow.

Stored data are quadruplets of numpy arrays (obs, actions, ...). The server config is the one found in the README, nothing fancy. The client retrieves data in 64 calls of 256 num_samples each. I also tried sampling everything at once, with no effect. I'm using client.sample(table_name, num_samples) to sample. There are up to one million items in the table, but it is slow even with 2000 items. And I'm using PyTorch.
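For comparing client-side throughput against server_info, measuring in items/s on the client helps. Below is a generic timing sketch in plain Python (not Reverb-specific): the hypothetical `fake_sample` generator stands in for repeated `client.sample()` calls.

```python
import time

def fake_sample(num_samples):
    # Hypothetical stand-in for client.sample(table_name, num_samples):
    # yields `num_samples` dummy items.
    for i in range(num_samples):
        yield i

num_calls, batch = 64, 256  # 64 sample calls of 256 items each
start = time.perf_counter()
total_items = 0
for _ in range(num_calls):
    # Count individual items, so the rate is in the same unit
    # (items/s) that server_info reports.
    total_items += sum(1 for _ in fake_sample(batch))
elapsed = time.perf_counter() - start
print(f"{total_items} items in {elapsed:.3f}s -> {total_items / elapsed:,.0f} items/s")
```

Swapping `fake_sample` for the real sampling call gives a number directly comparable with the server-side rate.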

I'm a bit disappointed... Any ideas or recommendations?

Thanx!

mikygit commented 3 years ago

Actually I was wrong, it is much faster when the client and server share the same process.

Does this mean gRPC is the bottleneck? :-(

fastturtle commented 3 years ago

gRPC will only be a bottleneck when running at extremely high QPS (hundreds of thousands of QPS) and should not be a bottleneck for your setup.

Am I correct in saying that Reverb is reporting that 45k items/s are being sampled? But you are only seeing 1 item/s being sampled? It sounds like there may be a mismatch between what you and Reverb are calling an item (or perhaps you're not using some of the items returned by Reverb?). In Reverb an item is a "sampleable" unit. ML workloads often sample a batch of these (looks like 256 in your case) to take a single (SGD) step.
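The units matter here: if the server counts individual items while the client counts sample calls (batches), the two rates differ by the batch size. A back-of-the-envelope check using the numbers from this thread:

```python
server_items_per_s = 45_000   # rate reported by server_info
num_calls, batch = 64, 256    # client: 64 sample calls of 256 items each

total_items = num_calls * batch                        # items actually moved
time_at_server_rate = total_items / server_items_per_s # seconds, if the server rate holds
calls_per_s = num_calls / time_at_server_rate          # apparent rate in batches/s

print(total_items)                    # 16384
print(round(time_at_server_rate, 3))  # 0.364
print(round(calls_per_s))             # 176 -- batches/s, not 45k
```

So a client that measures batches per second would see a number 256x smaller than the server's items-per-second figure, even with zero overhead.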

I would also encourage you to look into using TrajectoryDataset instead of Client.sample(), which has higher Python overhead. This recommendation holds even if you're using PyTorch (we use it with JAX).
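Much of the benefit of a dataset-style iterator comes from fetching batches off the critical path, so the trainer never waits on per-call overhead. A minimal sketch of that pattern in plain Python (the hypothetical `produce_batch` stands in for the actual fetch; this is the general idea, not Reverb's implementation):

```python
import queue
import threading

def produce_batch(i):
    # Hypothetical stand-in for fetching one batch from the server.
    return [i] * 4

def prefetcher(num_batches, capacity=8):
    """Fetch batches in a background thread and hand them over via a
    bounded queue, so the consumer overlaps training with fetching."""
    q = queue.Queue(maxsize=capacity)
    sentinel = object()

    def worker():
        for i in range(num_batches):
            q.put(produce_batch(i))
        q.put(sentinel)  # signal end of stream

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

batches = list(prefetcher(5))
print(len(batches))  # 5
```

A synchronous per-call loop pays the full round-trip latency on every step; the prefetcher hides it behind the queue.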

mikygit commented 3 years ago

Hi, thanks for your answer. I understand your point, but still, in the end, sampling the data is faster when the sampler and the server share the same process (= without gRPC) than when they are in 2 separate processes. At least that's the conclusion of my tests...

Are you saying it should not?

fastturtle commented 3 years ago

There will be some overhead when going through gRPC, but this should only be a problem if you are pushing a lot of QPS. I would be surprised to see you affected by this at 45k inserts/s; how many samples/s are you seeing?

mikygit commented 3 years ago

From a client-side perspective, 64 sample calls of 256 'items' (quadruplets) each are ~8 times slower with gRPC (= when the server and sampler do not share the same process) than without, for a constant server_info current_size of 4000.

Am I the only one having these numbers?

qstanczyk commented 2 years ago

Is the data being added to the table in the background? Could sampling be blocked by the rate limiter settings? What is the CPU usage of Reverb? If it is low, this is most likely a rate_limiter or network-throughput issue.
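For intuition on why a rate limiter can stall sampling: it couples the sample stream to the insert stream, e.g. a MinSize-style limiter blocks sampling until the table holds enough items. A toy sketch of that behaviour (not Reverb's actual implementation):

```python
import threading

class MinSizeLimiter:
    """Toy MinSize-style rate limiter: wait_to_sample() blocks until
    at least `min_size` items have been inserted."""
    def __init__(self, min_size):
        self.min_size = min_size
        self.size = 0
        self.cond = threading.Condition()

    def insert(self):
        with self.cond:
            self.size += 1
            self.cond.notify_all()  # wake any blocked samplers

    def wait_to_sample(self, timeout=None):
        # Returns True once the table is large enough, False on timeout.
        with self.cond:
            return self.cond.wait_for(lambda: self.size >= self.min_size,
                                      timeout=timeout)

limiter = MinSizeLimiter(min_size=3)
assert not limiter.wait_to_sample(timeout=0.01)  # table too small: blocked
for _ in range(3):
    limiter.insert()
assert limiter.wait_to_sample(timeout=0.01)      # now sampling may proceed
```

If inserts stop (or never reach the threshold), samplers sit in that wait, which shows up as low Reverb CPU usage and a client that appears stuck.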

qstanczyk commented 2 years ago

Closing this one as it is an old issue. Please reopen if this is still a problem.