Closed by @tnfru 8 months ago
I am not an expert in either machine learning or GPU computation, so any advice would be appreciated.
Yes, this paper is mainly about developing a strong agent, with only a brief mention of GPU computation. However, I could find little decent literature on GPU computation of the CFR algorithm.
The rough outline of each iteration of the CFR algorithm is as follows (sorry if you already know the CFR algorithm):

1. Traverse the game tree from the root, computing each player's counterfactual values under the current strategies.
2. At each decision node, accumulate the regret of each action (how much better that action would have done than the current strategy).
3. Update the current strategy by regret matching, i.e., play each action in proportion to its positive cumulative regret.
4. Accumulate the current strategy into the average strategy, which is what converges to an equilibrium.
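The regret-matching update at the heart of each CFR iteration can be sketched like this (a minimal illustration; the names are mine, not the repo's):

```rust
// Regret matching: given cumulative regrets for each action at a node,
// return the next strategy (a probability distribution over actions).
fn regret_matching(cum_regret: &[f64]) -> Vec<f64> {
    // Keep only the positive part of each regret.
    let positive: Vec<f64> = cum_regret.iter().map(|&r| r.max(0.0)).collect();
    let sum: f64 = positive.iter().sum();
    if sum > 0.0 {
        // Play each action in proportion to its positive regret.
        positive.iter().map(|&r| r / sum).collect()
    } else {
        // No positive regret anywhere: fall back to the uniform strategy.
        vec![1.0 / cum_regret.len() as f64; cum_regret.len()]
    }
}

fn main() {
    println!("{:?}", regret_matching(&[3.0, -1.0, 1.0])); // [0.75, 0.0, 0.25]
}
```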
Note that the event of dealing the private cards is not included in the game tree. All possible hands are processed together at the leaf nodes. This allows us to reduce the computational complexity: the current implementation takes O(m+n+c)-time for each public action line, whereas a naive implementation requires O(mn)-time (where m and n are the numbers of possible hands of both players, and c = 52 is the number of cards in a deck).
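To illustrate the complexity reduction, here is a sketch (my own illustration, not the repo's actual code) of the standard inclusion-exclusion trick for a fold node: each of our hands needs the total reach probability of all opponent hands that don't share a card with it, and per-card sums let us avoid the naive O(mn) double loop:

```rust
use std::collections::HashMap;

// A hand is a pair of distinct card indices in 0..52, stored in a canonical
// order so that identical hands compare equal, plus a reach probability.
struct Hand {
    cards: (usize, usize),
    reach: f64,
}

// O(m + n + c) payoff computation at a fold node, instead of O(mn).
fn fold_payoffs(our_hands: &[Hand], opp_hands: &[Hand], pot: f64) -> Vec<f64> {
    // O(n + c): total opponent reach, per-card sums, and per-hand sums.
    let mut total = 0.0;
    let mut card_sum = [0.0_f64; 52];
    let mut pair_sum: HashMap<(usize, usize), f64> = HashMap::new();
    for h in opp_hands {
        total += h.reach;
        card_sum[h.cards.0] += h.reach;
        card_sum[h.cards.1] += h.reach;
        *pair_sum.entry(h.cards).or_insert(0.0) += h.reach;
    }
    // O(m): for each of our hands, subtract the reach of opponent hands that
    // share a card, adding back the doubly-subtracted identical hand.
    our_hands
        .iter()
        .map(|h| {
            let blocked = card_sum[h.cards.0] + card_sum[h.cards.1]
                - pair_sum.get(&h.cards).copied().unwrap_or(0.0);
            pot * (total - blocked)
        })
        .collect()
}
```

Showdown nodes need a similar but more involved sweep over hand ranks; the idea of sharing per-card accumulators across all hands is the same.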
This is an overview of the process, but please ask me if you want more details.
Thanks for the in-depth explanation. I'm looking into a few papers I found and will evaluate them. Will update afterwards!
Both look pretty good. There would also be the option of training an AI to imitate the solver's results - after enough training it would be blazingly fast, but that's probably out of scope for this project.
Are you interested in implementing the paper / transforming the repo to rust?
I'm not a good candidate for implementing these - I lack both Rust and GPU-specific knowledge (my GPU usage is generally via the PyTorch API...)
I don't know if I would implement it, but if I do, I don't think I would use Rust GPU or Rust CUDA. Instead, I would create a library in CUDA C++ and link it with cc-rs. Although the documentation states that only the GNU/Clang toolchain is supported, I have confirmed that I can run a sample program with the Windows MSVC toolchain.
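For reference, cc-rs does support invoking nvcc. A hypothetical `build.rs` for the approach described above might look like this (the file path and library name are illustrative, and nvcc must be on PATH; this is a sketch, not part of the repo):

```rust
// build.rs — compile a CUDA C++ kernel library and link it into the crate
// via the `cc` crate (cc-rs). Assumes a file `cuda/cfr_kernels.cu` exists;
// that path is made up for this example.
fn main() {
    cc::Build::new()
        .cuda(true) // invoke nvcc instead of the host C++ compiler
        .file("cuda/cfr_kernels.cu")
        .compile("cfr_kernels");
    // Link the CUDA runtime so the binary can launch kernels.
    println!("cargo:rustc-link-lib=cudart");
}
```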
That sounds very interesting; sadly I won't be much help in C++.
Unfortunately, using GPUs to speed up CFR is tricky even when you are already using ML methods that employ them, because you need to load all the data onto the GPU and then unload it all once it's done. That transfer is a massive bottleneck in an application like this, which already doesn't have good ways to manage the cache. (The gpucfr implementation uses a tiny game tree, and if my memory serves me, it keeps the entire tree in GPU memory, so this bottleneck doesn't exist there.) I have considered trying to use the GPU for terminal-node evaluation, since it might be able to leverage some GPU tricks, but it would probably require a huge amount of work to get any real performance benefit.
Originally posted by @b-inary in https://github.com/b-inary/desktop-postflop/issues/1#issuecomment-1369692222
I'm not familiar with Rust itself, but I work in ML and skimmed through the paper. It looks more like it is about running a specific kind of neural network as an agent than about parallelizing the tree traversal.

I think about the possibility of GPU support like this: after you build the tree, there are N bottom leaves that need to be computed before the preceding moves can be evaluated. These are the last moves made by the last player before the pot ends in any way (showdown or fold). How expensive is this computation? We can assign GPU cores to compute the values of each leaf and then recursively use them to build the previous ones. This largely depends on the operation performed at the leaf - can you elaborate on that? I checked the code, but I think I'd need to take a deep dive to get that information.
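The scheme described here - evaluate all leaves independently, then build each earlier level from its children - can be sketched on the CPU like this (my own illustration, not the solver's code; on a GPU each level would roughly correspond to one kernel launch, with `std::thread::scope` standing in for the parallel step):

```rust
// A node in a flattened game tree. `children` holds indices into the node
// array; an empty list marks a leaf. The averaging rule at internal nodes
// is a placeholder for whatever backup rule the solver actually uses.
struct Node {
    children: Vec<usize>,
    leaf_value: f64, // only meaningful for leaves
}

// `levels` lists node indices grouped by depth, root level first.
fn backward_induction(nodes: &[Node], levels: &[Vec<usize>]) -> Vec<f64> {
    let mut values = vec![0.0; nodes.len()];
    // Process levels from the deepest (the leaves) up to the root. All
    // nodes within one level are independent, so each level is evaluated
    // concurrently.
    for level in levels.iter().rev() {
        let snapshot = values.clone(); // child values from deeper levels
        let results: Vec<(usize, f64)> = std::thread::scope(|s| {
            let handles: Vec<_> = level
                .iter()
                .map(|&i| {
                    let snapshot = &snapshot;
                    s.spawn(move || {
                        let n = &nodes[i];
                        let v = if n.children.is_empty() {
                            n.leaf_value // leaf: its own payoff
                        } else {
                            // internal node: average of child values
                            n.children.iter().map(|&c| snapshot[c]).sum::<f64>()
                                / n.children.len() as f64
                        };
                        (i, v)
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        for (i, v) in results {
            values[i] = v;
        }
    }
    values
}
```

Spawning one thread per node is of course far too heavyweight in practice; the point is only that the per-level dependency structure is what a GPU version would exploit.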