Aladoro closed this issue 4 years ago
This sounds great - thank you! Just had a look at your notebook and will check the code changes in a bit. I assume this scales to larger memory sizes and gives a small speedup with the standard Rainbow hyperparameters as well?
I am very sorry, I have not had time to run full experiments with the standard hyperparameters to get accurate timings. However, on my machine, the estimated total training time once training starts is about 103 hours (~135 it/s) with the vectorized replay memory, versus about 124 hours (~112 it/s) with the current replay memory. So I am quite confident there is a significant speed-up in this case as well.
By the way, thank you for this great repo! ^^
Great - thanks for doing this, and I'm sure many others will get to benefit too ^^
I noticed that the prioritized buffer currently samples and updates elements in the segment tree one at a time. I have rewritten the segment tree and memory class to use efficient vectorized retrievals/updates, and used NumPy's structured arrays to avoid indexing into multiple separate data matrices one element at a time. A minimal sketch of the idea is below.
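To give a feel for the approach, here is a minimal sketch (illustrative names and dtype only, not the exact classes in the PR): a flat sum tree whose update and prefix-sum retrieval both operate on whole index arrays at once, plus a structured array so a sampled batch is gathered with a single indexing call.

```python
import numpy as np

class VectorizedSumTree:
    """Flat array sum tree; root at index 1, leaves at [capacity, 2*capacity)."""

    def __init__(self, capacity):
        self.capacity = capacity           # number of leaves; assumed a power of two
        self.tree = np.zeros(2 * capacity)

    def update(self, leaf_ids, priorities):
        """Write a batch of leaf priorities, then repair ancestors level by level."""
        idx = np.asarray(leaf_ids) + self.capacity
        self.tree[idx] = priorities
        parents = np.unique(idx // 2)
        while parents[0] >= 1:             # stops after recomputing the root
            self.tree[parents] = self.tree[2 * parents] + self.tree[2 * parents + 1]
            parents = np.unique(parents // 2)

    def find(self, prefix_sums):
        """Descend the tree for a whole batch of prefix sums simultaneously."""
        idx = np.ones(len(prefix_sums), dtype=np.int64)  # every query starts at the root
        remaining = np.asarray(prefix_sums, dtype=np.float64).copy()
        while idx[0] < self.capacity:                    # all queries share the same depth
            left = 2 * idx
            go_right = remaining > self.tree[left]
            remaining -= go_right * self.tree[left]
            idx = left + go_right                        # booleans act as 0/1 offsets
        return idx - self.capacity                       # back to leaf ids in [0, capacity)

# Transitions kept in one structured array instead of parallel matrices,
# so a sampled batch is gathered with a single fancy-indexing call.
# (The field names and shapes here are illustrative, not the PR's exact layout.)
Transition = np.dtype([('state', np.uint8, (84, 84)), ('action', np.int32),
                       ('reward', np.float32), ('nonterminal', np.bool_)])
data = np.zeros(8, dtype=Transition)

tree = VectorizedSumTree(8)
tree.update(np.arange(8), np.arange(1.0, 9.0))           # priorities 1..8, total 36
leaf_ids = tree.find(np.random.uniform(0.0, tree.tree[1], size=32))
batch = data[leaf_ids]                                   # one vectorized gather, all fields
```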
On my machine this leads to 2x-10x speedups when sampling/updating batches, cutting the overall training time from about 40 minutes (~45 it/s) to about 28 minutes (~65 it/s) when running data-efficient Rainbow.
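To see where the speed-up comes from, a micro-benchmark along these lines (built on the toy tree sketched above, so absolute numbers will not match the repository's) compares per-element and batched updates:

```python
import timeit
import numpy as np

tree = VectorizedSumTree(2 ** 17)                    # ~131k leaves, a realistic buffer size
ids = np.random.randint(0, tree.capacity, size=32)   # one batch of 32 transitions
ps = np.random.rand(32)

per_element = timeit.timeit(
    lambda: [tree.update(np.array([i]), np.array([p])) for i, p in zip(ids, ps)],
    number=100)
batched = timeit.timeit(lambda: tree.update(ids, ps), number=100)
print(f'per-element: {per_element:.3f}s  batched: {batched:.3f}s')
```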
I have made an additional branch called parallel-memory-tests to briefly show (in _memorytests.ipynb) that the new vectorized memory's behavior is identical to the current memory's when sampling deterministically, along with the relative timings.
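Roughly, a deterministic equivalence check of this kind can be written as follows; the `sample(batch_size)` method here is a hypothetical stand-in for the memory classes' actual interface, and the real test lives in _memorytests.ipynb.

```python
import numpy as np

def sample_trace(memory, batches=100, batch_size=32, seed=0):
    """Draw a fixed sequence of batches under a fixed RNG seed."""
    np.random.seed(seed)                             # identical random draws for both memories
    return [memory.sample(batch_size) for _ in range(batches)]

def assert_equivalent(old_memory, new_memory):
    """Both memories must return identical batches under identical seeds."""
    for old_batch, new_batch in zip(sample_trace(old_memory), sample_trace(new_memory)):
        for old_item, new_item in zip(old_batch, new_batch):
            assert np.array_equal(np.asarray(old_item), np.asarray(new_item))
```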