JamesCarr1 / Chess_AI


Optimise Move Evaluation #3

Closed JamesCarr1 closed 1 year ago

JamesCarr1 commented 1 year ago

Move evaluation takes unreasonably long above a depth of 1. This is partly due to a lack of tree pruning, but could also be affected by the speed of commonly used functions. Find what is causing the most slowdown and try to optimise it; there are a few possible culprits.
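The hotspots below were found with cProfile. A minimal sketch of capturing a profile sorted by cumulative time, as in the tables below (the workload function here is a stand-in, not the real choose_move):

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Stand-in workload; the real target would be choose_move in data_generator.py
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(100_000)
profiler.disable()

# Sort by cumulative time and keep only the top entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```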

JamesCarr1 commented 1 year ago

On initial profile, get the current runtimes:

```
ncalls       tottime   percall  cumtime   percall   filename:lineno(function)
1660/1       0.023     0.000    3084.970  3084.970  {built-in method builtins.exec}
1            0.006     0.006    3084.970  3084.970  data_generator.py:1(<module>)
1            0.001     0.001    3084.010  3084.010  data_generator.py:157(choose_move)
9323/1       0.516     0.000    3078.112  3078.112  data_generator.py:97(get_best_evals)
9323         0.279     0.000    3035.230  0.326     data_generator.py:137(<listcomp>)
377076       1591.109  0.004    2887.103  0.008     {built-in method builtins.max}
26051947997  1362.365  0.000    1362.365  0.000     data_generator.py:137(<lambda>)
357644       81.482    0.000    147.855   0.000     {built-in method builtins.min}
```

The profile appears to be dominated by the list-comprehension and lambda calls. Assuming the list-comprehension cost is mostly .append calls, make node addition more efficient.
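A minimal sketch of what cheap node addition can look like. The class name MoveTree comes from the comments below, but its fields and methods here are assumptions:

```python
class MoveTree:
    """Hypothetical minimal move-tree node; the real class lives in data_generator.py."""

    __slots__ = ("move", "eval", "children")  # avoids a per-instance __dict__

    def __init__(self, move=None, evaluation=None):
        self.move = move
        self.eval = evaluation
        self.children = []

    def add_child(self, move, evaluation=None):
        # Appending to an existing list is O(1) amortised; avoid rebuilding
        # the child list (e.g. children = children + [node]) on every addition.
        node = MoveTree(move, evaluation)
        self.children.append(node)
        return node

root = MoveTree()
child = root.add_child("e2e4")
child.add_child("e7e5")
```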

JamesCarr1 commented 1 year ago

After improving the MoveTree code:

```
ncalls       tottime   percall  cumtime   percall   filename:lineno(function)
1660/1       0.023     0.000    3080.024  3080.024  {built-in method builtins.exec}
1            0.005     0.005    3080.024  3080.024  data_generator.py:1(<module>)
1            0.000     0.000    3079.067  3079.067  data_generator.py:160(choose_move)
9323/1       0.535     0.000    3073.221  3073.221  data_generator.py:100(get_best_evals)
9323         0.300     0.000    3029.279  0.325     data_generator.py:140(<listcomp>)
377076       1600.541  0.004    2880.532  0.008     {built-in method builtins.max}
26051947997  1346.191  0.000    1346.191  0.000     data_generator.py:140(<lambda>)
357644       82.252    0.000    148.454   0.000     {built-in method builtins.min}
```

The main time sink appears to be line 140. First, let's compare map() against list comprehensions.

Rule of thumb: map() is worthwhile with a predefined function; for an inline expression, use a list comprehension.
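The rule of thumb can be illustrated with a small example (abs stands in for a predefined function):

```python
values = [3, -1, -4, 1, -5, 9]

# map() with a predefined function avoids creating (and repeatedly calling
# into) a Python-level lambda frame:
via_map = list(map(abs, values))

# For an inline expression, a list comprehension is usually faster than
# map() with a lambda, because it skips the extra function call per element:
via_comp = [v * v for v in values]

print(via_map)   # [3, 1, 4, 1, 5, 9]
print(via_comp)  # [9, 1, 16, 1, 25, 81]
```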

JamesCarr1 commented 1 year ago

Significantly improved execution time by replacing the lambda function with a single named function, min_max_eval, at ~line 140 of data_generator.py.

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.020    0.000    49.574   49.574   {built-in method builtins.exec}
1         0.005    0.005    49.574   49.574   data_generator.py:1(<module>)
1         0.000    0.000    48.624   48.624   data_generator.py:161(choose_move)
9323/1    0.510    0.000    42.805   42.805   data_generator.py:100(get_best_evals)
197281    2.551    0.000    26.926   0.000    data_generator.py:187(as_tensor)
197281    0.129    0.000    19.575   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/chess/__init__.py:2413(fen)
197281    0.238    0.000    19.427   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/chess/__init__.py:2681(epd)
197281    5.143    0.000    16.241   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/chess/__init__.py:979(board_fen)
197281    0.194    0.000    12.841   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    3.149    0.000    12.601   0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
12625984  3.952    0.000    7.074    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/chess/__init__.py:725(piece_at)
1         0.000    0.000    5.819    5.819    data_generator.py:76(update_moves_tree)
206596/1  0.210    0.000    5.819    5.819    data_generator.py:59(find_possible_moves)
197281    0.136    0.000    4.511    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.311    0.000    4.311    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```
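A sketch of the kind of change described above; min_max_eval's signature and body here are assumptions, since only its name appears in the issue:

```python
# Before (hypothetical shape of the old line ~140): a fresh lambda is built
# as the key for every max() call over the child nodes.
# best = max(children, key=lambda node: node.eval)

def min_max_eval(node):
    # Named, module-level key function: defined once, no closure rebuilt
    # per call.
    return node.eval

class Node:
    """Hypothetical stand-in for a move-tree node with a cached evaluation."""
    def __init__(self, eval_):
        self.eval = eval_

children = [Node(0.2), Node(-0.7), Node(1.3)]
best = max(children, key=min_max_eval)
print(best.eval)  # 1.3
```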

JamesCarr1 commented 1 year ago

Computation currently appears to be dominated by the conversion from FEN to a matrix. Rewriting the current code as a series of if statements does not improve efficiency (it actually decreases it). Could consider rewriting this step in a more efficient language?
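One way to avoid an if-chain is a lookup table keyed on the FEN piece letters. This is an illustrative sketch only: the actual encoding used by as_tensor in data_generator.py is not shown in the issue, so the single-channel values below are assumptions:

```python
# Map each FEN piece letter to a signed value (white positive, black negative).
PIECE_VALUES = {
    "P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
    "p": -1, "n": -2, "b": -3, "r": -4, "q": -5, "k": -6,
}

def board_matrix_from_fen(fen):
    """Convert the board field of a FEN string to an 8x8 list of ints."""
    rows = []
    for rank in fen.split()[0].split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend([0] * int(ch))  # a digit is a run of empty squares
            else:
                row.append(PIECE_VALUES[ch])  # single dict lookup, no if-chain
        rows.append(row)
    return rows

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
matrix = board_matrix_from_fen(start)
print(matrix[0])  # [-4, -2, -3, -5, -6, -3, -2, -4]
```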

JamesCarr1 commented 1 year ago

The as_tensor conversion has been significantly optimised, reducing its cumulative time from 26 s down to 9 s:

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.026    0.000    32.195   32.195   {built-in method builtins.exec}
1         0.005    0.005    32.195   32.195   data_generator.py:1(<module>)
1         0.001    0.001    31.199   31.199   data_generator.py:161(choose_move)
9323/1    0.461    0.000    25.191   25.191   data_generator.py:100(get_best_evals)
197281    0.191    0.000    13.285   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    3.278    0.000    13.048   0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
197281    2.681    0.000    9.063    0.000    data_generator.py:187(as_tensor)
1         0.000    0.000    6.007    6.007    data_generator.py:76(update_moves_tree)
206596/1  0.222    0.000    6.007    6.007    data_generator.py:59(find_possible_moves)
197281    0.131    0.000    4.666    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.474    0.000    4.474    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```

It would now be ideal to improve the forward() call in the model.

JamesCarr1 commented 1 year ago

forward() is currently (and unnecessarily) converting a map to a list to a tensor, then reducing it with torch.sum.

Try a plain list and the built-in sum() instead:

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.021    0.000    29.584   29.584   {built-in method builtins.exec}
1         0.005    0.005    29.584   29.584   data_generator.py:1(<module>)
1         0.000    0.000    28.633   28.633   data_generator.py:161(choose_move)
9323/1    0.444    0.000    22.798   22.798   data_generator.py:100(get_best_evals)
197281    0.137    0.000    11.102   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    2.880    0.000    10.929   0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
197281    2.671    0.000    8.943    0.000    data_generator.py:187(as_tensor)
1         0.000    0.000    5.834    5.834    data_generator.py:76(update_moves_tree)
206596/1  0.213    0.000    5.834    5.834    data_generator.py:59(find_possible_moves)
197281    0.132    0.000    4.409    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.218    0.000    4.218    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```

This improves performance by about 3 seconds.
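The shape of the change, with a plain-Python stand-in for the per-position evaluation (the real code uses torch, which is omitted here):

```python
def score(x):
    return x * 0.5  # stand-in for a per-piece evaluation term

pieces = [1, 4, 2, 6, 3]

# Before (shape of the old code): materialise map -> list, wrap in a tensor,
# then reduce with a library sum. Each conversion walks the data again.
as_list = list(map(score, pieces))
total_before = sum(as_list)

# After: feed the map object straight to the built-in sum(), with no
# intermediate list or tensor.
total_after = sum(map(score, pieces))

assert total_before == total_after
print(total_after)  # 8.0
```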

JamesCarr1 commented 1 year ago

Now going to compare map() against building the list in a for loop:

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.020    0.000    29.758   29.758   {built-in method builtins.exec}
1         0.005    0.005    29.758   29.758   data_generator.py:1(<module>)
1         0.000    0.000    28.765   28.765   data_generator.py:161(choose_move)
9323/1    0.446    0.000    22.925   22.925   data_generator.py:100(get_best_evals)
197281    0.164    0.000    11.260   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    4.560    0.000    11.061   0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
197281    2.657    0.000    8.908    0.000    data_generator.py:187(as_tensor)
1         0.000    0.000    5.840    5.840    data_generator.py:76(update_moves_tree)
206596/1  0.214    0.000    5.840    5.840    data_generator.py:59(find_possible_moves)
197281    0.133    0.000    4.864    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.672    0.000    4.672    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```

This is slightly slower with a for loop. Try a list comprehension.

JamesCarr1 commented 1 year ago

With list comprehension:

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.023    0.000    28.593   28.593   {built-in method builtins.exec}
1         0.005    0.005    28.593   28.593   data_generator.py:1(<module>)
1         0.000    0.000    27.636   27.636   data_generator.py:161(choose_move)
9323/1    0.457    0.000    21.809   21.809   data_generator.py:100(get_best_evals)
197281    0.141    0.000    9.960    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    0.385    0.000    9.779    0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
197281    2.681    0.000    9.030    0.000    data_generator.py:187(as_tensor)
1         0.000    0.000    5.827    5.827    data_generator.py:76(update_moves_tree)
206596/1  0.212    0.000    5.827    5.827    data_generator.py:59(find_possible_moves)
197281    3.344    0.000    4.785    0.000    /Users/jamie/git/Chess_AI/model_builder.py:20(<listcomp>)
197281    0.136    0.000    4.536    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.340    0.000    4.340    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```

Which is appreciably faster!

JamesCarr1 commented 1 year ago

Finally, try just summing in a for loop:

```
ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
1660/1    0.024    0.000    28.981   28.981   {built-in method builtins.exec}
1         0.005    0.005    28.981   28.981   data_generator.py:1(<module>)
1         0.000    0.000    27.977   27.977   data_generator.py:161(choose_move)
9323/1    0.444    0.000    22.131   22.131   data_generator.py:100(get_best_evals)
197281    0.164    0.000    10.461   0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1494(_call_impl)
197281    3.988    0.000    10.262   0.000    /Users/jamie/git/Chess_AI/model_builder.py:15(forward)
197281    2.645    0.000    8.904    0.000    data_generator.py:187(as_tensor)
1         0.000    0.000    5.845    5.845    data_generator.py:76(update_moves_tree)
206596/1  0.212    0.000    5.845    5.845    data_generator.py:59(find_possible_moves)
197281    0.125    0.000    4.700    0.000    /Users/jamie/anaconda3/lib/python3.11/site-packages/torch/_tensor.py:920(__iter__)
197281    4.518    0.000    4.518    0.000    {method 'unbind' of 'torch._C._TensorBase' objects}
```

Slower, so use the list comprehension.

JamesCarr1 commented 1 year ago

Actually seems to be a fluke - summing in a for loop is faster.
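Single profiling runs can easily swap the apparent winner, which would explain the fluke. A sketch of a less noisy comparison using timeit.repeat, taking the minimum of several samples:

```python
import timeit

data = list(range(1000))

def via_comprehension():
    return sum([x * x for x in data])

def via_for_loop():
    total = 0
    for x in data:
        total += x * x
    return total

# repeat() gives several samples; taking the minimum of each reduces noise,
# since a single run can easily be skewed by background load.
comp_time = min(timeit.repeat(via_comprehension, number=200, repeat=5))
loop_time = min(timeit.repeat(via_for_loop, number=200, repeat=5))
print(f"list comprehension: {comp_time:.4f}s  for loop: {loop_time:.4f}s")
```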

JamesCarr1 commented 1 year ago

Currently dominated by forward() and as_tensor(), so focus on those.

JamesCarr1 commented 1 year ago

Actually, taking a quick look at find_possible_moves first. It is dominated by these lines:

```python
if self.current_position.outcome() is None:  # If game is over, don't try to find any more moves
    self.find_possible_moves(depth - 0.5, new_tree)  # find valid moves. One move is 'half a depth' so subtract 0.5
```

Of this, naively removing the if statement approximately halves the computation time (2.753 s vs. 5.845 s).

JamesCarr1 commented 1 year ago

Turns out the if statement is completely unnecessary: if the game is over, self.current_position.legal_moves simply yields no moves, so nothing is added to the tree.
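The underlying principle, in a minimal stand-in (the real code iterates python-chess's legal_moves): iterating an empty iterable runs the loop body zero times, so the guard is redundant.

```python
def expand(legal_moves, tree):
    # No game-over guard needed: iterating an empty sequence simply runs
    # the loop body zero times, so nothing is added to the tree.
    for move in legal_moves:
        tree.append(move)
    return tree

print(expand([], []))                # []  (finished game: nothing appended)
print(expand(["e2e4", "d2d4"], []))  # ['e2e4', 'd2d4']
```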

JamesCarr1 commented 1 year ago

Closing this issue for now - am going to look at alternative optimisation methods.