Closed Debilski closed 2 weeks ago
well yes, the CI engine needs a serious revamp... should I work on it or are you already doing stuff? I'd like for the thing to at least read the same conf file as the pelita-server, so that we don't have to specify the list of players twice.
Yeah, I’ll do some minor refactorings later to make it more useful. Scores so far:
# name matches score (1/0/-1)
aspp2021_4 219 0.88
aspp2023_2 217 0.75
aspp2022_2 218 0.64
aspp2021_3 218 0.61
aspp2021_1 217 0.45
aspp2019_3 218 0.37
aspp2022_0 217 0.31
aspp2023_4 217 0.18
aspp2021_0 217 -0.02
aspp2021_2 218 -0.32
aspp2019_1 218 -0.37
aspp2022_1 217 -0.41
aspp2019_4 217 -0.47
aspp2022_4 218 -0.49
aspp2022_3 217 -0.58
aspp2019_2 217 -0.59
aspp2019_0 218 -0.94
oh, and the TU players are not performing at all? Wouldn't it be easier to interpret the results if instead of score one would show percent-win? Otherwise it is difficult to distinguish a bot who draws all the time versus a bot who either wins or loses with 50% probability. Also percent-win will then be more independent from the number of matches than score is...
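To make the draw-versus-coin-flip point concrete, here is a minimal sketch. It assumes the score is the mean result per match with win = 1, draw = 0, loss = -1 (as the table header suggests); the function names and the two example records are made up for illustration.

```python
def score(wins, draws, losses):
    """Mean result per match, counting win=+1, draw=0, loss=-1."""
    matches = wins + draws + losses
    return (wins - losses) / matches


def win_rate(wins, draws, losses):
    """Fraction of matches won; draws and losses both count as 'not won'."""
    matches = wins + draws + losses
    return wins / matches


# Two hypothetical bots as (wins, draws, losses): one always draws,
# the other wins or loses with equal frequency.
always_draws = (0, 100, 0)
coin_flip = (50, 0, 50)

for name, record in [("always_draws", always_draws), ("coin_flip", coin_flip)]:
    print(name, "score:", score(*record), "win rate:", win_rate(*record))
```

Both records get the same score, and only the win rate tells them apart. Showing wins, draws and losses next to the score gives the same information.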
(I forgot to add the TU players to the config)
Yeah, the output has a much bigger table with all this info (more or less). But it is 2d and needs to be shrunk :)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ # Matches ┃ # Wins ┃ # Draws ┃ # Losses ┃ Score ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ aspp2021_4 │ 262 │ 233 │ 6 │ 23 │ 0.8015267175572519 │
│ tube2024_0 │ 139 │ 118 │ 8 │ 13 │ 0.7553956834532374 │
│ bayes_avengers │ 139 │ 116 │ 8 │ 15 │ 0.7266187050359713 │
│ aspp2023_2 │ 248 │ 201 │ 8 │ 39 │ 0.6532258064516129 │
│ aspp2021_3 │ 251 │ 191 │ 3 │ 57 │ 0.5338645418326693 │
│ aspp2022_2 │ 266 │ 200 │ 1 │ 65 │ 0.5075187969924813 │
│ tube2024_1 │ 139 │ 99 │ 4 │ 36 │ 0.45323741007194246 │
│ shake_dat_botty │ 139 │ 95 │ 3 │ 41 │ 0.38848920863309355 │
│ aspp2021_1 │ 256 │ 174 │ 6 │ 76 │ 0.3828125 │
│ aspp2019_3 │ 257 │ 169 │ 3 │ 85 │ 0.32684824902723736 │
│ trilobots │ 138 │ 80 │ 7 │ 51 │ 0.21014492753623187 │
│ aspp2022_0 │ 251 │ 146 │ 8 │ 97 │ 0.1952191235059761 │
│ tube2024_3 │ 139 │ 80 │ 3 │ 56 │ 0.17266187050359713 │
│ aspp2023_4 │ 258 │ 131 │ 14 │ 113 │ 0.06976744186046512 │
│ too_bot_to_handle │ 140 │ 72 │ 2 │ 66 │ 0.04285714285714286 │
│ aspp2021_0 │ 242 │ 97 │ 10 │ 135 │ -0.15702479338842976 │
│ drbabydangers │ 138 │ 48 │ 18 │ 72 │ -0.17391304347826086 │
│ group4_2022_this_time_moving │ 138 │ 53 │ 6 │ 79 │ -0.18840579710144928 │
│ dogues_de_bordeaux │ 138 │ 43 │ 4 │ 91 │ -0.34782608695652173 │
│ aspp2021_2 │ 256 │ 68 │ 29 │ 159 │ -0.35546875 │
│ tube2024_2 │ 139 │ 41 │ 6 │ 92 │ -0.3669064748201439 │
│ aspp2019_1 │ 243 │ 35 │ 67 │ 141 │ -0.43621399176954734 │
│ aspp2022_1 │ 266 │ 53 │ 32 │ 181 │ -0.48120300751879697 │
│ aspp2022_4 │ 244 │ 29 │ 43 │ 172 │ -0.5860655737704918 │
│ aspp2019_4 │ 245 │ 48 │ 1 │ 196 │ -0.6040816326530613 │
│ aspp2022_3 │ 251 │ 44 │ 6 │ 201 │ -0.6254980079681275 │
│ aspp2019_2 │ 256 │ 35 │ 2 │ 219 │ -0.71875 │
│ aspp2019_0 │ 139 │ 2 │ 8 │ 129 │ -0.9136690647482014 │
└──────────────────────────────┴───────────┴────────┴─────────┴──────────┴──────────────────────┘
aspp2019_0 is definitely a little underwhelming? (They remove nodes with enemy bots from the graph and simply stop when this means that the graph is disconnected. I’m close to helping them out a little to perform better. :) )
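A minimal sketch of that failure mode, not the team's actual code: the maze is a hypothetical four-cell corridor stored as an adjacency dict, and `shortest_path` and `without_nodes` are illustrative helpers. Removing the enemy's node cuts the only route to the food, so the search returns nothing and a bot that relies on it has no move to make.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def without_nodes(graph, blocked):
    """Copy of the graph with the blocked nodes and their edges removed."""
    return {
        node: [n for n in neighbours if n not in blocked]
        for node, neighbours in graph.items()
        if node not in blocked
    }


# Hypothetical corridor: (0,0) - (1,0) - (2,0) - (3,0)
corridor = {
    (0, 0): [(1, 0)],
    (1, 0): [(0, 0), (2, 0)],
    (2, 0): [(1, 0), (3, 0)],
    (3, 0): [(2, 0)],
}
bot, food, enemy = (0, 0), (3, 0), (2, 0)

open_path = shortest_path(corridor, bot, food)
blocked_path = shortest_path(without_nodes(corridor, {enemy}), bot, food)
print("path with enemy ignored:", open_path)
print("path with enemy node removed:", blocked_path)
```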
don't you think it would make sense to change the logic of the CI to try to even out the number of matches played? If a good team enters the CI later, then the ranking is skewed towards the good teams that had a chance to play more matches, and it will take a very long time to even out that effect. Or the ranking should be based on percent-win.
But the logic already does that. It just takes a while.
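The thread does not show how the CI engine picks its pairings, so the following is only a sketch of one way such balancing can work: always pair the two players with the fewest matches so far. `pick_pairing` is a hypothetical helper, and the starting counts are borrowed from the table above purely as sample input.

```python
import random


def pick_pairing(match_counts, rng=random):
    """Pick two distinct players, preferring those with the fewest matches."""
    players = list(match_counts)
    # Shuffle first so ties are broken randomly; the sort below is stable.
    rng.shuffle(players)
    players.sort(key=match_counts.get)
    return players[0], players[1]


counts = {
    "aspp2021_4": 262,
    "tube2024_0": 139,
    "trilobots": 138,
    "aspp2022_2": 266,
}

for _ in range(300):
    a, b = pick_pairing(counts)
    counts[a] += 1
    counts[b] += 1

print(counts)
```

With a rule like this the late joiners are scheduled almost exclusively until they catch up, which is why the counts converge but only after a fair number of rounds.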
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Name ┃ # Matches ┃ # Wins ┃ # Draws ┃ # Losses ┃ Score ┃ ELO ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ aspp2021_4 │ 669 │ 569 │ 25 │ 75 │ 0.738 │ 1965 │
│ tube2024_0 │ 669 │ 563 │ 35 │ 71 │ 0.735 │ 1993 │
│ bayes_avengers │ 669 │ 561 │ 37 │ 71 │ 0.732 │ 1920 │
│ aspp2023_2 │ 669 │ 524 │ 22 │ 123 │ 0.599 │ 1839 │
│ tube2024_1 │ 668 │ 470 │ 27 │ 171 │ 0.448 │ 1858 │
│ aspp2021_3 │ 670 │ 479 │ 11 │ 180 │ 0.446 │ 1722 │
│ shake_dat_botty │ 670 │ 464 │ 12 │ 194 │ 0.403 │ 1680 │
│ aspp2022_2 │ 670 │ 457 │ 11 │ 202 │ 0.381 │ 1767 │
│ aspp2021_1 │ 670 │ 422 │ 10 │ 238 │ 0.275 │ 1645 │
│ aspp2019_3 │ 669 │ 418 │ 9 │ 242 │ 0.263 │ 1616 │
│ trilobots │ 668 │ 401 │ 22 │ 245 │ 0.234 │ 1697 │
│ aspp2022_0 │ 670 │ 398 │ 21 │ 251 │ 0.219 │ 1528 │
│ tube2024_3 │ 671 │ 400 │ 10 │ 261 │ 0.207 │ 1633 │
│ too_bot_to_handle │ 668 │ 359 │ 6 │ 303 │ 0.084 │ 1566 │
│ aspp2023_4 │ 669 │ 321 │ 25 │ 323 │ -0.003 │ 1415 │
│ group4_2022_this_time_moving │ 669 │ 304 │ 22 │ 343 │ -0.058 │ 1492 │
│ aspp2021_0 │ 669 │ 262 │ 19 │ 388 │ -0.188 │ 1362 │
│ drbabydangers │ 671 │ 222 │ 84 │ 365 │ -0.213 │ 1381 │
│ dogues_de_bordeaux │ 669 │ 254 │ 12 │ 403 │ -0.223 │ 1454 │
│ aspp2021_2 │ 669 │ 198 │ 47 │ 424 │ -0.338 │ 1327 │
│ tube2024_2 │ 668 │ 187 │ 35 │ 446 │ -0.388 │ 1272 │
│ aspp2019_1 │ 669 │ 103 │ 179 │ 387 │ -0.425 │ 1262 │
│ aspp2022_1 │ 669 │ 123 │ 97 │ 449 │ -0.487 │ 1228 │
│ aspp2022_4 │ 669 │ 81 │ 104 │ 484 │ -0.602 │ 1182 │
│ aspp2022_3 │ 669 │ 121 │ 14 │ 534 │ -0.617 │ 1147 │
│ aspp2019_4 │ 669 │ 120 │ 8 │ 541 │ -0.629 │ 1146 │
│ aspp2019_2 │ 670 │ 109 │ 7 │ 554 │ -0.664 │ 1128 │
│ aspp2019_0 │ 671 │ 10 │ 29 │ 632 │ -0.927 │ 778 │
└──────────────────────────────┴───────────┴────────┴─────────┴──────────┴────────┴──────┘
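For readers unfamiliar with the ELO column, this is a sketch of the textbook Elo update. The K-factor of 32 is an assumption, and the CI engine's actual parameters and starting rating are not stated in this thread.

```python
def expected_score(rating_a, rating_b):
    """Expected result for player A (between 0 and 1) under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, result_a, k=32):
    """Return the new ratings after one match.

    result_a is 1.0 for a win of A, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed K-factor, not a value taken from the CI engine.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (result_a - expected_a)
    return rating_a + delta, rating_b - delta


# Ratings taken from the table above; the match result is hypothetical.
favourite, underdog = 1993, 778
print("expected score of the favourite:", expected_score(favourite, underdog))
print("ratings after an upset:", update_elo(favourite, underdog, 0.0))
```

Unlike the plain score, the rating change depends on the opponent's strength, which is why the Score and ELO columns do not sort the players identically.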
(Parallelisation is by the way a relative non-issue. Thanks to having a proper database we can just run a bunch of ci_engines at the same time.)
For reference, it is now possible to extract all games with errors from the database:
select * from games where json_extract(final_state, '$.num_errors') != '[0,0]' ;
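A small self-contained sketch of running that query from Python with the standard `sqlite3` module. The in-memory database and the two-column schema are assumptions for illustration; only the `games` table, its `final_state` column and the `num_errors` key come from the query above. It also assumes the SQLite build has the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real table has more columns.
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, final_state TEXT)")

states = [
    {"num_errors": [0, 0]},  # clean game
    {"num_errors": [2, 0]},  # first team had errors
    {"num_errors": [0, 1]},  # second team had errors
]
conn.executemany(
    "INSERT INTO games (final_state) VALUES (?)",
    [(json.dumps(state),) for state in states],
)

QUERY = """
    SELECT id, json_extract(final_state, '$.num_errors')
    FROM games
    WHERE json_extract(final_state, '$.num_errors') != '[0,0]'
    ORDER BY id
"""
games_with_errors = conn.execute(QUERY).fetchall()
for game_id, num_errors in games_with_errors:
    print(game_id, json.loads(num_errors))
```

The comparison against the string `'[0,0]'` works because `json_extract` returns arrays as minified JSON text, regardless of the whitespace in the stored document.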
The CI engine should take note when a bot fails with a fatal error to help with debugging. (Also the seed should be stored with the game info.)