Klazkin / player-zero

Implementing Player-Zero Training (2/2) #83

Closed Klazkin closed 2 months ago

Klazkin commented 7 months ago

The goal

Time tracking

Time Estimate: undefined
Time spent: 7 hours 48 minutes

Resources

https://realpython.com/python-sockets/
https://docs.godotengine.org/en/stable/classes/class_packedbytearray.html
https://docs.godotengine.org/en/stable/classes/class_streampeertcp.html#class-streampeertcp
https://docs.godotengine.org/en/stable/classes/class_streampeer.html#class-streampeer-method-get-string
https://keras.io/guides/serialization_and_saving/
https://rl-vs.github.io/rlvs2021/class-material/regularized_mdp/Regularization_RL_RLVS.pdf

Klazkin commented 7 months ago

Investigate

Simulating generated/sim_0_745970196_1712312526.077.txt...
New unit: 17,9,-1,2,1
[13, 27, 3, 14, 9, 8, 4, 7]
[13, 14, 9, 8, 27]
New unit: 13,8,0,-1,-2
[5, 13, 27, 3, 23, 2, 9, 6]
[13, 9, 6, 27]
Vector2i(10, 5)
Vector2i(3, 8)
Not a proper blooddrawing node

Issue: Blooddrawing is cast when there are no actions that can be replenished. (Later, in TAB, it is expected to search for a branch.)
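
One possible guard for this case, as a minimal sketch only and not necessarily the fix applied in the repository. The function name `castable_actions` and the parameters `replenishable` and `blooddrawing_id` are hypothetical; the real action IDs and data structures are not shown in this thread.

```python
def castable_actions(actions, replenishable, blooddrawing_id):
    """Return the actions that may be cast.

    Assumption: `replenishable` lists the actions Blooddrawing could
    replenish. When it is empty, Blooddrawing is filtered out so the
    search never creates a Blooddrawing node without branches.
    """
    if replenishable:
        return list(actions)
    return [a for a in actions if a != blooddrawing_id]
```

With this guard, the "Not a proper blooddrawing node" situation would be avoided before the node is created rather than detected afterwards.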

Klazkin commented 7 months ago

for normal node:
    Node -> predict -> get policy get value
    expand children
        child score is value
        child policy added

for BLOODDRAWING node:
    Node -> predict -> get policy get value
    (policy is ignored)
    expand random branches
        branch score is 
        branch policy is 0
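
The two cases above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the project's code: the `Node` fields, the `expand` signature, `branch_count`, and the shape of `predict` (returning a policy mapping and a value) are all hypothetical. The notes leave the branch score unspecified, so the sketch leaves it as `None`.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Search-tree node; every field name here is hypothetical."""
    action: Optional[int] = None
    is_blooddrawing: bool = False
    score: Optional[float] = None
    policy: float = 0.0
    children: list = field(default_factory=list)


def expand(node, predict, actions, branch_count=2, rng=random):
    """Expand `node` according to the normal / BLOODDRAWING cases.

    Assumption: `predict(node)` returns (policy, value), where `policy`
    maps each action to its prior.
    """
    policy, value = predict(node)
    if node.is_blooddrawing:
        # The policy is ignored; a random subset of branches is expanded.
        chosen = rng.sample(list(actions), min(branch_count, len(actions)))
        for action in chosen:
            # Branch score is not specified in the notes, so it stays None.
            node.children.append(Node(action=action, score=None, policy=0.0))
    else:
        for action in actions:
            # Child score is the predicted value; the child keeps its prior.
            node.children.append(
                Node(action=action, score=value, policy=policy[action])
            )
    return node.children
```

Passing a seeded `random.Random` instance as `rng` makes the random branch selection reproducible when replaying a simulation file.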