diditforlulz273 / PokerRL-Omaha

Omaha Poker functionality + some features for the PokerRL Reinforcement Learning card framework
MIT License
61 stars, 15 forks

Evaluation issues #6

Open 4e4ako opened 2 years ago

4e4ako commented 2 years ago

Hi Vsevolod!

I tried to launch PLO_training_start.py with LBR enabled and it failed (without any eval_methods the iterations run fine, but then I can't evaluate the results). I tried both PLO and DiscretizedNLHoldem, with the DEBUGGING option turned on and off. With DEBUGGING=True and nn_type "feedforward" or "dense_residual", I get an AssertionError:

/PokerRL-Omaha-master/DeepCFR/IterationStrategy.py", line 144, in get_a_probs_for_each_hand_in_list
    assert len(pub_obs.shape) == 2, "all hands have the same public obs"
AssertionError: all hands have the same public obs

And with DEBUGGING=False I get this error on iteration 1:

PokerRL-Omaha-master/PokerRL/rl/neural/MainPokerModuleFLAT2.py", line 109, in forward
    pf_mask = torch.where(pub_obses[:, 14] == 1)
TypeError: list indices must be integers or slices, not tuple
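The TypeError above is what Python raises when a plain list is indexed with `[:, 14]`, which suggests that `pub_obses` reaches `forward` as a list rather than as a 2-D array or tensor. The sketch below reproduces the message and shows one possible workaround: stacking the observations into a 2-D array first. It uses NumPy only; `to_2d_obs_batch`, the observation size of 20, and the sample data are hypothetical and are not taken from the repository.

```python
import numpy as np


def to_2d_obs_batch(pub_obses):
    """Hypothetical helper: coerce a list of 1-D observation vectors
    (or an existing array) into one array of shape (batch, obs_size)."""
    batch = np.asarray(pub_obses, dtype=np.float32)
    if batch.ndim == 1:
        # A single observation becomes a batch of one.
        batch = batch[np.newaxis, :]
    if batch.ndim != 2:
        raise ValueError(f"expected a 2-D batch, got shape {batch.shape}")
    return batch


# Assumed input: a plain Python list of per-hand observation vectors.
obs_list = [np.zeros(20, dtype=np.float32) for _ in range(3)]
obs_list[1][14] = 1.0

# Same indexing pattern as in the traceback, applied to a list.
try:
    obs_list[:, 14]
except TypeError as err:
    list_error = str(err)

# After stacking, column indexing works as expected.
batch = to_2d_obs_batch(obs_list)
mask_rows = np.where(batch[:, 14] == 1)[0]
```

If this is indeed the cause, the same conversion would have to happen before the `torch.where` call in the traceback, followed by a conversion to a tensor. Whether the list should be converted there or at the LBR call site is something only the actual code can answer.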

With nn_type="recurrent", I get this error on iteration 0:

PokerRL-Omaha-master/PokerRL/rl/neural/MainPokerModuleRNN.py", line 157, in forward
    pub_obses = torch.from_numpy(pub_obses[0]).to(self.device).view(seq_len, bs, self.pub_obs_size)
TypeError: expected np.ndarray (got Tensor)
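This last error means `torch.from_numpy` was handed something that is already a `torch.Tensor`. A minimal sketch of a conversion that tolerates both input types is given below, using `torch.as_tensor`. The helper `obs_to_tensor` and the shapes are hypothetical; this only illustrates the type mismatch and is not a tested fix for MainPokerModuleRNN.py.

```python
import numpy as np
import torch


def obs_to_tensor(pub_obs, device="cpu"):
    """Hypothetical helper: torch.as_tensor accepts both np.ndarray and
    torch.Tensor, so the caller does not need to know which one it holds."""
    return torch.as_tensor(pub_obs, dtype=torch.float32, device=device)


# Illustrative sizes only.
seq_len, bs, pub_obs_size = 2, 3, 5
as_array = np.ones((seq_len * bs, pub_obs_size), dtype=np.float32)
as_tensor = torch.ones(seq_len * bs, pub_obs_size)

# torch.from_numpy rejects an input that is already a Tensor.
try:
    torch.from_numpy(as_tensor)
except TypeError as err:
    from_numpy_error = str(err)

# The tolerant conversion handles both and allows the same reshape.
a = obs_to_tensor(as_array).view(seq_len, bs, pub_obs_size)
b = obs_to_tensor(as_tensor).view(seq_len, bs, pub_obs_size)
```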

My requirements.txt:

gym==0.10.9 (tried 0.12.5 too)
numpy==1.21.2
psutil==5.8.0
pycrayon==0.5
pytz==2021.3
ray==0.6.1 (didn't use Distributed)
scipy==1.7.3
torch==1.4.0 (tried PyTorch versions up to 1.10 with CUDA 10.2)

diditforlulz273 commented 2 years ago

Hi! I think recurrent networks are legacy-type and never worked correctly; they gave quite poor results in initial testing, so simply don't use them.

And this is clearly not a package dependency problem. I guess the reason is bugs in the code: my local version is some commits ahead, and this is probably fixed there. But the problem is that I gave up on Python around a year ago and rewrote everything from scratch in C++, which is a whole different project. Therefore, I don't have any incentive to go back and check what's actually wrong or fix it. Sorry, but you'll have to do it yourself :)