aikupoker / deeper-stacker

DeeperStacker: DeepHoldem Evil Brother

data generation and running #4

Closed light3317 closed 5 years ago

light3317 commented 5 years ago

Hi, @aikupoker A few nice modifications of the original fork. Just wondering what machines/settings you use to generate data, and how long it takes? What hardware is needed for it to run at reasonable speed?

herrefirh commented 5 years ago

I can chime in since I also have a question here. These were my approximate average times (seconds per sample):

- GTX 1070: 2.3
- Tesla P100: 3.5
- Tesla P4: 4.5
- Tesla K80: 5.6
- Tesla V100: 5.7

I used CUDA 8 or 9, depending on the card. Tell me if my numbers are significantly off base, because maybe I misconfigured something. So the math is: how many samples do you want? 100,000? Well, 100,000 × 2.3 = 230,000 seconds ÷ 86,400 ≈ 2.6 days. Or you can run with multiple GPUs or multiple computers. Remember to read the README to understand the difference between samples, situations, and files; it's explained there.
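That back-of-the-envelope math can be sketched in a few lines (Python just for illustration; the per-sample timings are the averages I measured above, and near-linear multi-GPU scaling is an assumption):

```python
# Hypothetical helper: estimate wall-clock data-generation time from a
# per-sample timing (seconds per sample on one GPU).
SECONDS_PER_DAY = 86_400

def generation_days(num_samples, seconds_per_sample, num_gpus=1):
    """Days of wall-clock time, assuming near-linear scaling across GPUs."""
    total_seconds = num_samples * seconds_per_sample / num_gpus
    return total_seconds / SECONDS_PER_DAY

# 100,000 samples on a GTX 1070 at ~2.3 s/sample:
print(round(generation_days(100_000, 2.3), 1))  # 2.7
```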

I haven't trained yet; I'll be doing that soon and will post my results when I have them.

But I wonder, out of curiosity, what accounts for the difference in thinking time between DeepStack and DeepHoldem?

light3317 commented 5 years ago

@herrefirh

Is this about training or game-playing speed? I thought generating the samples and training with 10 million samples would take much longer than 2.6 days and require multiple GPU clusters to run?

herrefirh commented 5 years ago

Okay, remember I said you must read the README, because you used the terms incorrectly. 10 million samples would be 100 million poker situations. That's not what DeepStack did: they generated 10 million games (called poker situations here), which is 1 million samples. 1 million samples = 10 × 100,000, so 2.6 days × 10 = 26 days. That's a big difference: 1 million samples vs. 100,000 samples, also known as 10 million situations vs. 1 million situations.
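To make the terminology concrete, here's a tiny sketch of the conversion (assuming, per the README, 10 poker situations per sample):

```python
# Assumption taken from the README: each sample comprises 10 poker situations.
SITUATIONS_PER_SAMPLE = 10

def samples_to_situations(samples):
    return samples * SITUATIONS_PER_SAMPLE

def situations_to_samples(situations):
    return situations // SITUATIONS_PER_SAMPLE

# DeepStack's 10 million situations are 1 million samples, not 10 million:
print(situations_to_samples(10_000_000))  # 1000000
# At ~2.6 days per 100,000 samples on one GPU:
print(situations_to_samples(10_000_000) / 100_000 * 2.6)  # 26.0
```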

But look, I resized the stack size, big blind, and ante, so maybe it's faster, though I'd be surprised. Maybe you can try it and post your results; I'm curious to know.

And this is just the river, so I'll let you know what my numbers are for the turn and flop. They could be different.

aikupoker commented 5 years ago

Hi, @aikupoker A few nice modifications of the original fork. Just wondering what machines/settings you use to generate data, and how long it takes? What hardware is needed for it to run at reasonable speed?

Hi @light3317

Most of these modifications are taken from another developer, @HonyZahy. I just made a few improvements to make it a bit faster.

I added an important section to the README.md file about samples and how you can calculate what you need to reproduce the same trained network:

https://github.com/aikupoker/deeper-stacker#samples-math

The question is not easy to answer because:

With the same number of samples, you will get different results because they are randomly generated.

You can have a trained NN in just one day (or in hours), but it will make more wrong decisions.

Just to give you a simple rule of thumb: you can use Amazon AWS or Google Cloud Platform, and it will cost you around $1,500-3,000.

@herrefirh has given you average sample-generation times. You can scale roughly linearly just by adding GPUs. Also, generating river samples is not the same as generating turn samples, because to generate turn samples you need the river network. I recommend you read the README.md file and estimate costs with some simple math.
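As a rough illustration of that cost estimate (the hourly price below is an assumption for illustration, not an actual AWS/GCP rate; substitute real on-demand prices for your region):

```python
# Hypothetical cost sketch: usd_per_gpu_hour is an assumed illustrative rate,
# not a real cloud price. Per-sample timings come from the thread above.
def cloud_cost(num_samples, seconds_per_sample, usd_per_gpu_hour):
    gpu_hours = num_samples * seconds_per_sample / 3600
    return gpu_hours * usd_per_gpu_hour

# 1,000,000 samples on a P100 at ~3.5 s/sample and an assumed $1.50/GPU-hour:
print(round(cloud_cost(1_000_000, 3.5, 1.50)))  # 1458
```

Under those assumptions the estimate lands inside the $1,500-3,000 ballpark quoted above; real prices, spot discounts, and turn/flop generation (which needs the downstream networks) will move it.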

Good luck!

light3317 commented 5 years ago

@herrefirh thanks for pointing that out. I will also share my data if I successfully run it.

@aikupoker thanks for the clarification. I will probably want to generate a similar amount of sample data or more. I noticed in your README doc that DeepStack used a GeForce GTX 1080, which should be less powerful than an NVIDIA Tesla P100. Any idea why DeepStack's average thinking speed is faster?

For abstraction actions, if we add more actions, such as a 1/3-pot bet, how will it influence the training result? Or would the training consume more time / need more samples?

I am also confused by this quote: "It will take about 20 minutes to load all the flop buckets, but this is actually not necessary until you've created your own flop model."

Does this loading happen on each new hand/matchstate? And does it only happen for the flop, or for the turn and river as well?

"During re-solving, the opponent ranges were not warm started" -- what does this mean, and is it essential to fix it?

aikupoker commented 5 years ago

Hi @light3317

There could be many reasons why DeepStack's average thinking speed is faster. The root problem is that we don't have the DeepStack code (we only have Leduc DeepStack), and there are presumably code optimizations (including caching) in the DeepStack code.

For abstraction actions, if we add more actions, such as a 1/3-pot bet, how will it influence the training result? Or would the training consume more time / need more samples?

The training would consume more time because you have to evaluate one more action, but you would have a more precise neural network. As I said, this is trial and error, and it is up to you to decide what to improve.
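A rough intuition for the extra cost: the solve tree grows with the number of actions available at each node. The action counts and depth below are illustrative, not DeepHoldem's actual lookahead configuration:

```python
# Illustrative only: a lookahead tree's node count grows roughly with
# actions_per_node ** depth, so each added bet size compounds with depth.
def approx_tree_nodes(actions_per_node, depth):
    return sum(actions_per_node ** d for d in range(depth + 1))

# e.g. a hypothetical 4-action abstraction vs. adding a 1/3-pot bet (5 actions):
print(approx_tree_nodes(4, 4))  # 341
print(approx_tree_nodes(5, 4))  # 781
```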

I am also confused by this quote: "It will take about 20 minutes to load all the flop buckets, but this is actually not necessary until you've created your own flop model." Does this loading happen on each new hand/matchstate? And does it only happen for the flop, or for the turn and river as well?

As the README.md file says, DeepHoldem is composed of four networks:

If you want to test just the preflop neural network, you don't need to load the flop buckets. But when you reach the flop, it will fail because you will need to load the turn network. That shortcut is only for testing the preflop neural network.

Does this loading happen on each new hand/matchstate? And does it only happen for the flop, or for the turn and river as well?

When you start your DeepHoldem client (cd Source && th Player/deepstack.lua <port2>), there is an initial load of the flop buckets, and it happens only once each time you start the client.

"During re-solving, the opponent ranges were not warm started" -- what does this mean, and is it essential to fix it?

No, it is not, but if you want to improve your model you could do it (pull requests are welcome).

light3317 commented 5 years ago

@aikupoker thanks for the explanation. Just wondering, do you have it up and running? Do you use a GeForce GTX 1080 or a Tesla P100? If you are running it on GCP, which instance type do you choose?

aikupoker commented 5 years ago

Yes, I do. I don't use any cloud provider; I run it on my own PC with one NVIDIA GeForce GTX 1080 Ti.

@happypepper tried it with a Tesla P100 and posted those results. The DeepStack team used a GeForce GTX 1080.

herrefirh commented 5 years ago

Sorry to pile on, but on the topic of data generation: I am going through it line by line, and it calls several other files, including terminal equity, where it looks at the strength view and the call matrix. I get that so far. But it also goes to the range generator, and here I am lost...

I understand this:

Returning Evaluate Fast Hands:  
 25  20  26  10  18   1   2

Strength View 1 ->      
Columns 1 to 11
-12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500

Strength View 2 ->      
Columns 1 to 11
-12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500 -12500

Call Matrix After Copy: 
Columns 1 to 26
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Call Matrix After Csub: 
Columns 1 to 26
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0

I don't understand this:

RangeGenerator | Sorted Range | function RangeGenerator:generate_range(range)
Columns 1 to 10
0.01 *
  0.0055  0.0035  0.0000  0.0022  0.0028  0.0005  0.0402  0.0047  0.0180  0.0047
  0.1170  0.0451  0.0000  0.3224  0.0000  0.0013  0.0178  0.0447  0.0725  0.0038

RangeGenerator | Reordered range back to undo sort by strength  
Columns 1 to 10
0.01 *
  0.0060  0.0000  0.0181  0.0008  0.0000  0.0002  0.0114  0.0033  0.0012  0.0123
  0.0010  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0017  0.0000  0.0005

DataGen | Player Ranges -> 1    
Columns 1 to 10
0.01 *
  0.0060  0.0000  0.0181  0.0008  0.0000  0.0002  0.0114  0.0033  0.0012  0.0123
  0.0010  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0017  0.0000  0.0005
  0.0385  0.0597  0.0213  0.0014  0.0045  0.0008  0.0435  0.9691  0.0021  0.0568

I don't conceptually understand this: how it's generated (complicated math) or what it represents in the big picture (which should be simple). Earlier we established a call matrix based on hand strength relative to the board and our opponent. Then we switch to magic numbers. What do they mean?

aikupoker commented 5 years ago

To understand about these magic numbers you can check this YouTube video:

https://youtu.be/qndXrHcV1sM?t=2173

Minute 36:13

Also, the image they are talking about is Figure 3: https://spencer-murray-zfht.squarespace.com/figures

There is also a basic tutorial about how DeepStack works, and a description of its internals: https://github.com/lifrordi/DeepStack-Leduc/blob/master/doc/manual/tutorial.md https://github.com/lifrordi/DeepStack-Leduc/blob/master/doc/manual/internals.md

Also taken from DeepStack paper:

The ranges are encoded by clustering hands into 1,000 buckets, as in traditional abstraction methods, and input as a vector of probabilities over the buckets. The inputs to the network are the pot size, public cards, and the player ranges, which are first processed into hand clusters. The output from the seven fully connected hidden layers is post-processed to guarantee the values satisfy the zero-sum constraint, and then mapped back into a vector of counterfactual values.
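An illustrative sketch of that encoding (NumPy rather than the repo's Torch/Lua, with a random stand-in clustering; the real input also includes public cards and both players' ranges):

```python
import numpy as np

# Sketch of the paper's range encoding, not the repo's actual code: a
# per-hand range over 1,326 hands is summed into 1,000 abstraction buckets,
# and the bucketed range plus the pot size form part of the network input.
NUM_HANDS, NUM_BUCKETS = 1326, 1000

def bucket_range(hand_range, hand_to_bucket):
    """Sum per-hand probabilities into their assigned buckets."""
    bucketed = np.zeros(NUM_BUCKETS)
    np.add.at(bucketed, hand_to_bucket, hand_range)  # scatter-add by bucket id
    return bucketed

rng = np.random.default_rng(0)
hand_to_bucket = rng.integers(0, NUM_BUCKETS, NUM_HANDS)  # stand-in clustering
p1 = rng.random(NUM_HANDS)
p1 /= p1.sum()                                            # a valid range
net_input = np.concatenate(([1200.0], bucket_range(p1, hand_to_bucket)))
print(net_input.shape)  # (1001,)
```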

herrefirh commented 5 years ago

Awesome, I'm watching the video now, and I'll be reading the other material as well. Very helpful. Thanks again

light3317 commented 5 years ago

@aikupoker In regard to playing speed, code-wise is it related only to model size? The bigger the model (more data generated and trained on), the longer it will take to respond/solve on any street?

aikupoker commented 5 years ago

You will have the same structure for both networks; the difference between them is the weights. They have different values, and that's why a neural network takes different decisions (some of them are the best, and that's why you train it).

light3317 commented 5 years ago

So playing speed has nothing to do with the size of the model? I suppose if we have a larger model it will go through more computations? What factors affect playing speed (code-wise)?

herrefirh commented 5 years ago

About the player range: 1,326 is the number of hole-card combinations, and 10 is the batch size. The player range (and opponent range) are 10x1326 matrices/tensors.

Does that mean each of the 10 rows in the player range represents a separate game (poker situation)?

It's the only thing I can think of; it's the only reason I can see for why there'd be 10 different rows.

[torch.FloatTensor of size 10x1326]
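If that reading is right, a minimal sketch of the tensor's shape and per-row normalization (NumPy instead of Torch, with random ranges just for illustration) would be:

```python
import numpy as np

# Illustrative sketch: each of the 10 batch rows would be one independently
# sampled poker situation's range over the 1,326 hole-card combinations,
# with every row normalized into a probability distribution.
BATCH, NUM_HANDS = 10, 1326

rng = np.random.default_rng(0)
player_range = rng.random((BATCH, NUM_HANDS))
player_range /= player_range.sum(axis=1, keepdims=True)  # normalize each row

print(player_range.shape)        # (10, 1326)
print(player_range.sum(axis=1))  # ten values, each ~1.0
```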

herrefirh commented 5 years ago

@light3317, one thing to note is that the model doesn't grow in size as you train it more. On epoch 1 it might be 1 GB, and on epoch 1,000,000,000 it will still be 1 GB. How much does the number of poker situations influence the size of the model? I don't know. I would think it wouldn't matter, but I don't know.

The deepstack pdf says a few things affect playing speed: