google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Infostate Tree Memory Usage #783

Closed: dpmaloney closed this issue 1 year ago

dpmaloney commented 2 years ago

I've been working on using the new LP tools to solve some games, including poker subgames, and was wondering if anyone had any ideas for optimizing infostate trees and their memory usage.

It seems like there are lots of calls to .Clone or .Child, specifically during subtree rebalancing and when updating the leaf nodes. It also doesn't seem like std::vector<std::unique_ptr<State>> corresponding_states_ is used for anything at the moment.

There are a number of other things that seem like they could be cut directly for my own work, but I was wondering if there is an approach more in line with OpenSpiel as a whole that might help.

Thanks in advance for the help!

lanctot commented 2 years ago

@michalsustr, can you comment on this?

lanctot commented 2 years ago

There are a number of other things that seem like they could be cut directly for my own work, but I was wondering if there is an approach more in line with OpenSpiel as a whole that might help.

I have tagged the original author; he might have some suggestions, and maybe we can remove some of the stuff that is unused or unnecessary.

But I would say if you can easily cut stuff for your own work, then fork and go right ahead. We designed OpenSpiel with "hackability" as a target property -- if it is easy to jump in, cut stuff out, and customize for your own use cases, then that's great; it's exactly what we were hoping for.

dpmaloney commented 2 years ago

Thanks for the reply @lanctot!

I've definitely been hacking away, but I don't want to drift too far from the main codebase, to the point that I can't submit a PR at some point should I come up with something interesting.

Just to get my questions in writing:

- Is there a lighter, and perhaps just as memory-safe, way to do ApplyAction/Child? Smart pointers make this tough as is, but at the moment that is my big bottleneck.
- Is LeafID ever assigned? It's very possible I'm just not seeing something, but it doesn't ever seem to be used.
- Same with corresponding_states: I just removed them, but I'd love to know the motivation behind the implementation. It seems like https://github.com/deepmind/open_spiel/issues/398 had some discussion about a payoff matrix struct, so maybe this is supposed to emulate that in a way?
- Is rebalancing necessary? I'm currently just implicitly adding all leaf nodes to the last row of the node depth table, which seems to work fine without needing to add in filler nodes.

Thanks in advance for your time, I really appreciate the work you guys do!

lanctot commented 2 years ago

Hi @dpmaloney, unfortunately I'm just not familiar with that code, so I can't really answer. I'm not sure why there would be any rebalancing (or storing/copying of states at all), so I will contact the author by email to ask him to follow up.

michalsustr commented 2 years ago

Hi @dpmaloney!

was wondering if anyone had any ideas for optimizing infostate trees and their memory usage.

Recently @milecdav wrote some code that constructs the infostate trees for both players without maintaining the corresponding states (https://arxiv.org/abs/2112.10890). However, the problem is that you need to avoid building the leaves with terminal utilities, because you will still encounter a large memory blowup. A special leaf evaluator is needed that can work with just the $n$ possible poker hands and beliefs (ranges) -- one such evaluator that works with $O(n)$ complexity is described here: https://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf (see the section on Efficient Terminal Node Evaluation).
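
The rough idea, ignoring card removal: at a showdown, each of our $n$ hands wins against the opponent's probability mass on strictly weaker hands and loses against the mass on strictly stronger ones, so after sorting by hand strength a single prefix-sum pass gives all $n$ values at once. A simplified standalone sketch (hand strengths, reach probabilities, and pot size are placeholders, and tied hands are lumped into groups):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Showdown value of each of our n hands against an opponent range, ignoring
// card removal: pot * (opponent mass on strictly weaker hands minus opponent
// mass on strictly stronger hands). After sorting by hand strength (which can
// be reused across leaves that share a board), each evaluation is one O(n) pass.
std::vector<double> ShowdownValues(const std::vector<int>& strengths,
                                   const std::vector<double>& opp_reach,
                                   double pot) {
  const int n = static_cast<int>(strengths.size());
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return strengths[a] < strengths[b]; });

  const double total = std::accumulate(opp_reach.begin(), opp_reach.end(), 0.0);
  std::vector<double> values(n);
  double weaker = 0.0;  // Opponent mass on hands strictly weaker than the current group.
  int i = 0;
  while (i < n) {
    // Hands of equal strength tie with each other, so handle them as one group.
    int j = i;
    double group_mass = 0.0;
    while (j < n && strengths[order[j]] == strengths[order[i]]) {
      group_mass += opp_reach[order[j]];
      ++j;
    }
    const double stronger = total - weaker - group_mass;
    for (int k = i; k < j; ++k) values[order[k]] = pot * (weaker - stronger);
    weaker += group_mass;
    i = j;
  }
  return values;
}
```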

I think we can publish the code if @milecdav is fine with it.

To answer other questions:

Is there a lighter, and perhaps just as memory-safe, way to do ApplyAction/Child? Smart pointers make this tough as is, but at the moment that is my big bottleneck.

I don't quite understand. These are methods on State, and it is a standard interface. If you wanted to optimize poker, it would probably be more useful to reimplement it properly in OpenSpiel instead of wrapping the ACPC implementation.
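
For what it's worth, Child is essentially Clone followed by ApplyAction, so each child costs one full copy of the state. If those copies are the bottleneck, one option for games that implement UndoAction is to reuse a single mutable state during a depth-first pass instead of cloning at every node. A rough sketch of the two patterns:

```cpp
#include <memory>

#include "open_spiel/spiel.h"

// Child() is essentially a convenience for Clone() + ApplyAction(), so each
// call pays for a full state copy.
void ExpandWithClone(const open_spiel::State& state) {
  for (open_spiel::Action a : state.LegalActions()) {
    std::unique_ptr<open_spiel::State> child = state.Child(a);  // One copy per action.
    // ... visit *child ...
  }
}

// For games that implement UndoAction(), a depth-first pass can reuse a single
// mutable state instead of cloning at every node.
void ExpandInPlace(open_spiel::State* state) {
  const open_spiel::Player player = state->CurrentPlayer();
  for (open_spiel::Action a : state->LegalActions()) {
    state->ApplyAction(a);
    // ... recurse on *state ...
    state->UndoAction(player, a);
  }
}
```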

Is LeafID ever assigned? It's very possible I'm just not seeing something, but it doesn't ever seem to be used.

Hmm, I don't think I used LeafIds in the end :-) The idea was to use them to connect the infostate trees together, as in the sequence-form LPs, but ultimately only array indices were used.

Same with corresponding_states: I just removed them, but I'd love to know the motivation behind the implementation. It seems like https://github.com/deepmind/open_spiel/issues/398 had some discussion about a payoff matrix struct, so maybe this is supposed to emulate that in a way?

I have an implementation that selectively stores them or not, based on intended usage. I used them mostly to be able to build continuations of subgames for continual resolving.

Is rebalancing necessary? I’m currently just implicitly adding all leaf nodes to the last row of the node depth table, which seems to work fine without needing to add in filler nodes.

This is used in the infostate CFR implementation (not in master), as it speeds up some evaluations.
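
Roughly, the benefit is that when every leaf sits at the deepest level, a bottom-up value sweep turns into one dense loop per depth, with no per-node checks for subtrees that terminated early. A simplified standalone sketch (not the actual infostate-tree code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// levels[d] holds the internal nodes at depth d (0 = roots); every leaf sits
// at depth levels.size(). Filler nodes are ordinary nodes with a single child
// that just pass a value through.
struct LevelNode {
  int first_child;   // Index of this node's first child at depth d + 1.
  int num_children;
};

std::vector<double> SweepUp(const std::vector<std::vector<LevelNode>>& levels,
                            std::vector<double> leaf_values) {
  std::vector<double> values = std::move(leaf_values);  // Values at the deepest level.
  for (int d = static_cast<int>(levels.size()) - 1; d >= 0; --d) {
    std::vector<double> parent_values(levels[d].size(), 0.0);
    for (std::size_t i = 0; i < levels[d].size(); ++i) {
      const LevelNode& node = levels[d][i];
      for (int c = 0; c < node.num_children; ++c) {
        parent_values[i] += values[node.first_child + c];
      }
    }
    values = std::move(parent_values);
  }
  return values;  // One value per root.
}
```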


I think I could put out some of the more mature code I wrote, as it could be beneficial to more people. The thing is, I sometimes used shortcuts and hacks even in the core library, and it could take some time to disentangle everything. Also, it's quite a bit of code -- just to give you an idea, I'm almost 800 commits off master.

lanctot commented 2 years ago

@michalsustr, the code you are referring to by @milecdav would be poker-only, though, right? It's not a general implementation that avoids storing states?

lanctot commented 2 years ago

Recently @milecdav wrote some code that constructs the infostate trees for both players without maintaining the corresponding states (https://arxiv.org/abs/2112.10890). However, the problem is that you need to avoid building the leaves with terminal utilities, because you will still encounter a large memory blowup.

Yes, true, but probably not worse than the blow-up (and slow-downs due to rebalancing) of having to store full states?

As in, it might be good to have an infostate tree implementation that builds the entire game, stores whatever is necessary for computing payoff values, and can still be used by the LP methods? Or, alternatively, a general implementation of sequence-form strategies that we could build sparse matrices from to send to LP solvers. This is something I would have loved to have on many occasions, and I would love to see it in OpenSpiel.
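
Concretely, the LP those sparse matrices would feed is (roughly) the standard sequence-form formulation, with $A$ player 1's sequence-form payoff matrix and $Ex = e$, $Fy = f$ the two players' realization-plan constraints:

$$
\max_{x,\,q} \; f^\top q
\quad \text{s.t.} \quad
F^\top q \le A^\top x, \qquad E x = e, \qquad x \ge 0,
$$

whose dual recovers player 2's realization plan $y$. All of $A$, $E$, and $F$ are sparse, which is why a general sequence-form representation would plug straight into off-the-shelf LP solvers.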

milecdav commented 2 years ago

@michalsustr, the code you are referring to by @milecdav would be poker-only, though, right? It's not a general implementation that avoids storing states?

Yes, that is correct.

dpmaloney commented 2 years ago

As in, it might be good to have an infostate tree implementation that builds the entire game, stores whatever is necessary for computing payoff values, and can still be used by the LP methods? Or, alternatively, a general implementation of sequence-form strategies that we could build sparse matrices from to send to LP solvers. This is something I would have loved to have on many occasions, and I would love to see it in OpenSpiel.

I think this gets at the heart of my question: I'm trying to develop some game-agnostic techniques for reducing the size of the payoff matrix, so this would also be something I'd love to have.

@michalsustr, my main goal is to work with LP solving for larger games, so I think I need to compute the terminal utilities beforehand. The motivation behind my question about ApplyAction and Child was really just wondering how much data we actually need for LP solving alone.

I'd be happy to take a crack at working on something like this, but I don't want to step on any toes with the implementation so please feel free to follow up with any discussion that might be helpful.

lanctot commented 2 years ago

Hi @dpmaloney, yeah, what I'd really love is a data structure to represent sequence-form strategies (much like what we have for behavioral strategies, e.g. TabularPolicy).
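
Something as simple as realization weights keyed by sequence, with behavior probabilities recovered by dividing a child sequence's weight by its parent's, would already go a long way. A hypothetical sketch of what I mean (not an existing OpenSpiel class; names are just for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical container for a sequence-form strategy: realization weights
// x(sigma) keyed by a sequence id, where "" denotes the empty sequence.
struct SequenceFormStrategy {
  std::map<std::string, double> realization;

  // Behavior probability of the action extending `parent` into `child`,
  // i.e. x(child) / x(parent); 0 if the parent sequence is never reached.
  double BehaviorProb(const std::string& parent, const std::string& child) const {
    const auto p = realization.find(parent);
    const auto c = realization.find(child);
    if (p == realization.end() || c == realization.end() || p->second <= 0.0) {
      return 0.0;
    }
    return c->second / p->second;
  }
};
```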

But even separately from that, if you have a high-enough-level API for the front end of your LP solver, then you can build everything you need in one full-game tree pass. You can see how it's done on the Python side here (the code is fairly dense, though): https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/sequence_form_lp.py. This doesn't need to store any states.
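
A rough C++ sketch of that kind of single pass -- not the actual sequence_form_lp.py logic, just an illustration for a two-player, turn-based, zero-sum game of accumulating the sparse payoff entries keyed by pairs of sequences (the realization-plan constraints can be collected in the same pass, one row per infostate):

```cpp
#include <map>
#include <string>
#include <utility>

#include "open_spiel/spiel.h"

using SequencePair = std::pair<std::string, std::string>;

// One depth-first pass that accumulates the sparse sequence-form payoff matrix
// A[(sigma1, sigma2)] (payoffs to player 0), weighted by chance reach, without
// storing any states.
void CollectPayoffs(const open_spiel::State& state, const std::string& seq1,
                    const std::string& seq2, double chance_reach,
                    std::map<SequencePair, double>* payoff) {
  if (state.IsTerminal()) {
    (*payoff)[{seq1, seq2}] += chance_reach * state.Returns()[0];
    return;
  }
  if (state.IsChanceNode()) {
    for (const auto& [action, prob] : state.ChanceOutcomes()) {
      CollectPayoffs(*state.Child(action), seq1, seq2, chance_reach * prob,
                     payoff);
    }
    return;
  }
  const open_spiel::Player player = state.CurrentPlayer();
  const std::string infostate = state.InformationStateString(player);
  for (open_spiel::Action action : state.LegalActions()) {
    // Under perfect recall, a sequence is identified by its last
    // (infostate, action) pair, so this string is enough as a key.
    const std::string extended = infostate + "|" + std::to_string(action);
    CollectPayoffs(*state.Child(action), player == 0 ? extended : seq1,
                   player == 1 ? extended : seq2, chance_reach, payoff);
  }
}
```

Called from the root with empty sequences and chance reach 1.0, this fills the payoff map in a single traversal.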

And hopefully the solutions would be returned as general sequence-form strategy objects that can return a behavior policy on demand via ToTabularPolicy.

I think it would be fine to have this implementation in addition to the current one, especially for use cases like yours where you're memory-bound. It'd be a fairly hefty contribution, but I think it'd be as easy as copying over what's done on the Python side. It'd be quite a good one to have, and I'd be happy to explain some of the Python implementation if necessary.

Let me know what you think!