Open GoogleCodeExporter opened 8 years ago
I had a similar problem with my Lua submission.
I uploaded a C++ Lua interpreter and a Lua bot.
It worked fine with the TCP server, but failed on the first turn of every game
without any error message.
The first time it turned out to be my bug: I called "MyBot.lua" from the C++
code, though the actual file name was "mybot.lua", so my C++ interpreter failed
to find the script and terminated.
The second time all games worked fine except on the worker=25 computer, where
it failed every time I played a game on that server.
So I re-uploaded my submission, and after that it worked fine.
My main remaining problem with bot games is still there, though: unexpected
losses without an error message even though my bot seemed to be winning, like
this:
http://ai-contest.com/profile.php?user_id=9336
Original comment by buratin....@gmail.com
on 15 Oct 2010 at 5:56
I have the same problem with my latest submission (in C#): every game is lost
on the first turn, and no error message is displayed in the game window.
My bot has been tested locally on dozens of maps, and I don't get any errors on
the first turn or later.
http://www.ai-contest.com/visualizer.php?game_id=5900455
http://www.ai-contest.com/visualizer.php?game_id=5899939
http://www.ai-contest.com/visualizer.php?game_id=5896566
Original comment by parvuval...@gmail.com
on 15 Oct 2010 at 10:25
The issue happens even with the starter package (csharp_starter_package.zip):
http://ai-contest.com/visualizer.php?game_id=5960264
My guess: environment startup is too slow, so it cannot always start the
program within 3 seconds.
Possible solution: give the bot some time after startup, before sending it
commands. This doesn't require a change to the rules, because the time limit
for the first turn will still be 3 seconds.
Original comment by dmitrisc...@gmail.com
on 19 Oct 2010 at 3:35
Could it be a "not enough memory" issue?
Original comment by buratin....@gmail.com
on 19 Oct 2010 at 9:51
I think the problem is a simple CPU scheduling problem. It affects all bots
regardless of language. If I understand correctly, the game server works like
this:
1. Start bot 1 as a new process
2. Start bot 2 as a new process
3. Send the game state to both bots using a blocking write
4. Until the time limit is reached, poll each bot using a non-blocking read
Now how much actual CPU time does each bot get? It depends entirely on process
scheduling, which is not under the control of the game engine. I think
everything is running inside a virtual server instance, so if for some reason
both bots end up on the same "virtual CPU", they could end up competing for
time slices. If there are other processes besides the game engine and the bots
running, the situation is even worse.
I suggest the game engine should be modified so that only one bot is active at
a time. That gives the active bot the best possible chance that it will
actually get its fair share of CPU time. It would work like this:
1. Start bot 1 as a new process
2. Start bot 2 as a new process
3. Send the game state to bot 1
4. Poll bot 1 for its moves. Save them, but do not modify the game state yet
5. Send the game state to bot 2
6. Poll bot 2 for its moves. Save them, but do not modify the game state yet
7. Update the game state with the moves read during the turn
Even though bot 1 moves first, bot 2 doesn't see what bot 1 did, so there is no
gameplay advantage to the turn scheduling. Assuming that bot 2 is blocked
waiting for the game state while bot 1 is moving, the bots will not end up
competing for CPU time.
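The one-bot-at-a-time scheme above could be sketched roughly as follows. This is only an illustration, not the contest engine's code: `FakeBot`, `play_turn`, and `apply_moves` are made-up names, and the bots are in-process stand-ins rather than real subprocesses. The point it shows is that both bots see the same unmodified state, and moves are applied only after both have answered.

```python
import copy


class FakeBot:
    """Hypothetical in-process stand-in for a bot subprocess."""

    def __init__(self, name, strategy):
        self.name = name
        self.strategy = strategy  # function: state -> list of moves
        self.seen = None

    def send_state(self, state):
        # A real engine would write the state to the bot's stdin here.
        self.seen = copy.deepcopy(state)

    def read_moves(self):
        # A real engine would wait on the bot's stdout until the time limit.
        return self.strategy(self.seen)


def play_turn(state, bots, apply_moves):
    """Run one turn with only one bot active at a time."""
    pending = []
    for bot in bots:
        bot.send_state(state)                         # steps 3 and 5
        pending.append((bot.name, bot.read_moves()))  # steps 4 and 6: save only
    for name, moves in pending:                       # step 7: apply together
        apply_moves(state, name, moves)
    return state


def apply_moves(state, name, moves):
    # Toy game rule (an assumption): just record who ordered what.
    state["log"].extend((name, move) for move in moves)


# Each toy bot reports how many log entries it could see when it moved.
report = lambda s: ["saw %d" % len(s["log"])]
state = {"log": []}
play_turn(state, [FakeBot("bot1", report), FakeBot("bot2", report)], apply_moves)
print(state["log"])
```

Because bot 2 is handed the state before bot 1's saved moves are applied, both toy bots report seeing an empty log, which is the "no gameplay advantage" property described above.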
A disadvantage to this arrangement is that spare CPU cycles may go unused, and
games may take longer on average to play. However, consider that people may
start deciding to re-submit their bot each time they lose due to bad scheduling
luck. That too will create churn on the game server. I think it is better to
play the game more fairly, even if that means playing it a little more slowly.
Original comment by jklan...@gmail.com
on 22 Oct 2010 at 9:35
I am troubleshooting the user sandbox code, and it looks like there is a
serious problem with SSH input buffering. The tournament manager communicates
with the bots over SSH, but SSH is holding on to the input in its buffer. It
never reaches the bot.
Original comment by jklan...@gmail.com
on 24 Oct 2010 at 8:19
Just in case you haven't noticed, ssh is only used on the main server. Of
course it's still a problem if it's not working correctly.
Original comment by janzert
on 24 Oct 2010 at 9:49
Sorry folks, what I thought was an SSH buffering issue was actually a buffering
issue elsewhere, and it was not in the production code but in a piece of code
that I had modified. To make a long story short, in Python you _must_ use
file.readline(), you _cannot_ use file.next(), if you want to read just one
line from a pipe.
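For anyone hitting the same thing, here is a minimal sketch of reading exactly one line at a time from a child process's pipe with readline(). The child script is a made-up stand-in for a bot. As far as I know, the read-ahead behaviour of next() is specific to Python 2 file objects; the sketch is written so that it also runs on Python 3, where readline() is still the explicit way to ask for one line.

```python
import subprocess
import sys

# Hypothetical child: prints one line, waits for a reply, prints another.
child_code = (
    "import sys\n"
    "sys.stdout.write('go\\n'); sys.stdout.flush()\n"
    "sys.stdin.readline()\n"
    "sys.stdout.write('done\\n'); sys.stdout.flush()\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # text mode pipes
)

# readline() returns as soon as one complete line is available,
# even though the child is still running and the pipe is still open.
first = proc.stdout.readline()

proc.stdin.write("ack\n")
proc.stdin.flush()
second = proc.stdout.readline()

proc.stdin.close()
proc.stdout.close()
proc.wait()
print(repr(first), repr(second))
```

Note that the child also has to flush its own output after each line, otherwise the line sits in the child's buffer and never reaches the pipe at all.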
Original comment by jklan...@gmail.com
on 25 Oct 2010 at 2:55
I can't be 100% sure, but after observing 22 games and looking at game_info.php
to see which worker each was played on, it seems that for my bot there is a
two-way implication: game played on worker 0 <-> game lost in first turn. That
is, all games played on worker 0 failed to start, and all games on other
workers played fine. http://ai-contest.com/profile.php?user_id=9786
buratin.barabanus also indicates that he did not receive any error message on
the failed games. This also points to worker 0 - which did not have error
reporting at that time.
Original comment by tjverw...@gmail.com
on 25 Oct 2010 at 5:02
For me and some other people on the forums, the bad worker was 55. Perhaps
it's not so much that there is a bad machine, but that a machine gets in a
certain state and then it starts a failing streak. So far I have run over a
hundred games locally using the exact same game engine as is used in the cloud,
and I have yet to reproduce the behavior. I even tried re-testing some of the
first-turn-failed maps using my bot playing against itself or other bots. No
timeouts.
I talked with janzert about server load the other day, and he said that load
averages on the game servers are not particularly high. Having thought about
it further, I think those averages are per minute. So even if the load average
for a minute is low, it is still possible that during the second in which a bot
gets to move, the CPU is loaded and the bot can't run. You can slam a core for
several seconds and still have the overall average load be very low.
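A back-of-the-envelope version of that argument, as a sketch: the numbers are invented, and it treats the one-minute figure as a plain mean of per-second samples, which is a simplification (real load averages are exponentially damped).

```python
# Hypothetical per-second load on one core over one minute:
# fully busy for 3 seconds, idle for the other 57.
samples = [1.0] * 3 + [0.0] * 57

one_minute_average = sum(samples) / len(samples)
worst_second = max(samples)

print("one-minute average: %.2f" % one_minute_average)
print("worst single second: %.2f" % worst_second)
```

Under these assumptions the minute looks almost idle on average, yet a bot whose turn landed inside those three busy seconds would have been competing for a fully loaded core.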
I am working on an experimental version of the Python engine which avoids
polling and therefore uses less CPU time than the one used in production. For
some reason it uses more overall wallclock time to run games. I don't
understand what is going on. Below are some example times for the same game.
Perhaps they don't mean what I think they mean.
Production Engine:
real 0m3.875s
user 0m3.280s
sys 0m0.310s
Experimental Engine:
real 0m4.549s
user 0m0.230s
sys 0m0.080s
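For comparison, here is a small sketch (not taken from either engine) of how polling and blocking differ in CPU time while the wall-clock wait is about the same. The helper names are made up, and it uses select() on a pipe, so it assumes a Unix-like system. It only illustrates why the `user` figure drops when polling is removed; it does not explain the longer `real` time.

```python
import os
import select
import threading
import time


def measure(wait_for_data, delay=0.3):
    """Wait for data that arrives after `delay` seconds; return (data, wall, cpu)."""
    r, w = os.pipe()
    writer = threading.Timer(delay, os.write, args=(w, b"go\n"))
    wall0, cpu0 = time.perf_counter(), time.process_time()
    writer.start()
    wait_for_data(r)
    data = os.read(r, 16)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    writer.join()
    os.close(r)
    os.close(w)
    return data, wall, cpu


def busy_poll(fd):
    # Non-blocking readiness check in a tight loop: burns CPU while waiting.
    while not select.select([fd], [], [], 0)[0]:
        pass


def blocking_wait(fd):
    # Sleep in the kernel until the descriptor becomes readable.
    select.select([fd], [], [])


poll = measure(busy_poll)
block = measure(blocking_wait)
print("polling : wall %.3fs cpu %.3fs" % (poll[1], poll[2]))
print("blocking: wall %.3fs cpu %.3fs" % (block[1], block[2]))
```

Both variants wait roughly the same wall-clock time for the data, but the polling loop spends that time on the CPU while the blocking call spends it asleep.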
Original comment by jklan...@gmail.com
on 25 Oct 2010 at 5:30
I hope that will improve things. Btw, it was my understanding that worker 0,
the main server, starts games / bots a bit differently from the others. Perhaps
it is just slower in starting everything up and my bot really does time out,
but only on worker 0?
Here's an update on my data:
My bot has now played 40 games, and 12 of them failed to start.
All games on worker 0 failed (11x), plus 1 game failed on worker 64.
Can't my bot unsubscribe from worker 0? :-P
Original comment by tjverw...@gmail.com
on 26 Oct 2010 at 8:42
Attached is an experimental new game engine. It offers:
* approximately double the overall game throughput
* correct detection of multiple different types of bot errors
* pluggable I/O multiplexors
* a full test suite covering failure cases as well as compatibility with the
existing engine
* a compatible API to the existing engine (play_game())
The README.txt file contains more information. The README file also discusses
two possible security issues in the existing code, so even if it is unlikely
that a new engine will be adopted at this time, at least those issues should be
considered.
Original comment by jklan...@gmail.com
on 28 Oct 2010 at 12:25
I discovered that the library I'm using tries to create a Java Thread. Since I
don't need multi threading, I refactored the code and removed that
dependency/requirement. My bot seems to be working on worker 0 now.
Original comment by tjverw...@gmail.com
on 28 Oct 2010 at 11:27
Original issue reported on code.google.com by
lord.loc...@gmail.com
on 13 Oct 2010 at 5:28