beyond-all-reason / spring

A powerful free cross-platform RTS game engine
https://beyond-all-reason.github.io/spring/

Problem when AI vs AI on high game speed #269

Closed neoedmund closed 2 years ago

neoedmund commented 2 years ago

I wrote an AI, and it works as expected at normal game speed. The game setup is my AI vs. Barb AI, with me as a spectator, so I can adjust the game speed while watching them play.

But when I switch to a higher speed, my AI acts differently.

See the logs from my AI when it tells an armlab to build one armck.

Normal speed:

 [f 2154 ]factory_build [ armlab(27239) -> armck ]
 [f 2169 ]unit created: team=0 id=28944 name=armck
 [f 3257 ]unit finished: team=0 id=28944 name=armck

Speed x20:

 [f 2979 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 2984 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 2989 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 2994 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 2999 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 3004 ] factory_build [ armlab(20797) -> armck ] cmd cnt 0
 [f 3020 ] unit created: team=0 id=30192 name=armck hp=0.100000,builder cmd cnt:2
 [f 4072 ] unit finished: team=0 id=30192 name=armck
 [f 4095 ] unit created: team=0 id=9555 name=armck hp=0.100000,builder cmd cnt:5
 [f 4458 ] unit finished: team=0 id=9555 name=armck
 [f 4481 ] unit created: team=0 id=30940 name=armck hp=0.100000,builder cmd cnt:4
 [f 5735 ] unit finished: team=0 id=30940 name=armck
 [f 5755 ] unit created: team=0 id=10280 name=armck hp=0.100000,builder cmd cnt:3
 [f 7160 ] unit finished: team=0 id=10280 name=armck
 [f 7182 ] unit created: team=0 id=31108 name=armck hp=0.100000,builder cmd cnt:2
 [f 8227 ] unit finished: team=0 id=31108 name=armck
 [f 8249 ] unit created: team=0 id=25940 name=armck hp=0.100000,builder cmd cnt:1
 [f 8822 ] unit finished: team=0 id=25940 name=armck

I use unit.currentCommands.size()==0 to test whether the builder is idle. If it is, I issue a build command (to build an armck, for example). But the state seems to lag: on every check my AI finds no armck and sees the armlab as idle, so it orders another build. The result is multiple armck, when I actually want just one.
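
The effect described above can be reproduced with a toy model. This is a minimal sketch, not Spring code: `count_orders` and `command_delay` are hypothetical names, the 5-frame poll interval is read off the spacing of the log lines, and the 30-frame delay is an assumption chosen because it reproduces the six orders seen in the x20 log.

```python
def count_orders(command_delay, poll_interval=5, horizon=200):
    """Count build orders issued by an AI that polls an 'is the queue empty?' test.

    command_delay is the assumed number of frames before an issued order
    becomes visible in the factory's command queue. Orders never leave the
    queue in this toy model, so only the start-up window is simulated.
    """
    visible_at = []  # frame at which each issued order becomes visible
    orders = 0
    for frame in range(0, horizon, poll_interval):
        queue_size = sum(1 for f in visible_at if f <= frame)
        if queue_size == 0:  # same test as unit.currentCommands.size()==0
            visible_at.append(frame + command_delay)
            orders += 1
    return orders


normal = count_orders(command_delay=1)   # order is visible before the next poll
fast = count_orders(command_delay=30)    # order stays invisible for six polls
print(normal, fast)  # 1 6
```

Whenever the delay is longer than the poll interval, the empty-queue test alone issues the same order repeatedly.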

Localconnect is used; no UDP stuff is initialized.

How can this happen? And how can I keep things synchronized?

neoedmund commented 2 years ago

My AI's update() calls all run in sequence, with no overlap in time. The engine takes more frames to make things happen at high speed than at normal speed, so this sounds like an engine bug.

Beherith commented 2 years ago

Update() is not guaranteed to be called every simframe for widgets (I think it's only called on drawframes). If this is a widget AI, then use GameFrame().

See sim-draw frame order with /debugview

neoedmund commented 2 years ago

I'm using a Java AI.

Beherith commented 2 years ago

Ah, I see. I'm unfortunately not familiar with the AI interfaces outside of Lua. However, can you print-verify the gameframe on each Update() call?

neoedmund commented 2 years ago

log.zip: this log file shows the frame number and the timestamps of each frame's begin and end, along with the armck build events mentioned above.

Beherith commented 2 years ago

@rlcevg is the master of the AI interface, he might have an idea as to what is going on.

Also, I may have misunderstood your error. Is it that you told the armlab to build a ck, and the respective unit.currentCommands.size() still returned 0 instead of showing a non-empty build queue, as you expected?

neoedmund commented 2 years ago

Wow, I fixed the problem.

Like this:

https://github.com/neoedmund/spring/commit/920a52964fd55353b824b7f4571a3fa184a191fb

(Someone please make a real commit based on it)

Explanation:

Clients update the world according to event messages from the server. When the server creates multiple server frames at once, the clients get no events for the following frames, which simply pass. So the AI finds the world "frozen": commands it already knows about are still processed (a missile in flight takes time, for example), but no new events come in even though they should, which is a bug. If the AI keeps issuing commands based on that stale state, it makes things worse.

Such a speedup, produced by numNewFrames>1, is just fake game speed. If you set the game speed to x1000 but the maximum speed the hardware can achieve is x50, then most frames are wrongly skipped.

There is no benefit to doing this in any case.

The patch also fixes the bug where a demo recorded under these conditions does not replay correctly.

With this patch, AI vs AI can correctly run at the maximum speed your PC can offer.

Headless mode is broken regardless of this patch, though. Maybe I'll look into that headless problem later.

lhog commented 2 years ago

@marcushutchings can you have a look at the proposed fix please?

Beherith commented 2 years ago

Headless BAR was fixed recently (opengl gadget error)

neoedmund commented 2 years ago

> Headless BAR was fixed recently (opengl gadget error)

The recent change should be unrelated: the AI's getResourceMapSpotsPositions() returns null.

marcushutchings commented 2 years ago

> @marcushutchings can you have a look at the proposed fix please?

Sure, I'll look into this proposal.

marcushutchings commented 2 years ago

Okay, let me explain. When you issue an order to a unit, it has to be submitted to the server. The server adds the command to whatever it considers the current sim frame, and when the server broadcasts the new frame, the command is sent out for that specific frame. The client can only act on unit orders once the server has scheduled them, otherwise we would be getting desyncs. If the client has fallen behind the server, then, as far as the client can tell, the command is delayed in being processed. The server has dynamic code to moderate the frequency of sim frames. The generation of extra frames was put in place to allow network games to run smoothly.

Now, changing server code like this may create the appearance that the problem is solved when testing on a local game, but all it has done is hide the problem, which will come back when the AI gets involved in a network game. Fundamentally, the problem here is that you expect the command to be processed immediately and are consulting a value which can, for very valid reasons, be delayed. I would suggest trying an alternative approach to the problem with the AI. Perhaps evaluate why it is so important that the factory has nothing queued, especially given that queues are available and easily used. It may be worth researching other AI implementations to get some ideas. We have three other AIs in BAR that may provide some ideas.

marcushutchings commented 2 years ago

I believe an event is raised when a unit is built. Maybe you can catch that event instead of polling the order queue count?
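
One way to combine that suggestion with the existing polling is to remember orders that are still in flight. The following is a minimal sketch under stated assumptions: `BuildOrderTracker`, its methods, and the 300-frame timeout are hypothetical and not part of any Spring AI interface. It assumes the AI receives some "unit created" notification that identifies the builder, as the log lines in this issue suggest.

```python
class BuildOrderTracker:
    """Remembers build orders that were issued but not yet confirmed."""

    def __init__(self, timeout_frames=300):
        self.timeout_frames = timeout_frames  # assumed value, tune per game
        self.pending = {}  # (builder_id, unit_name) -> frame the order was issued

    def should_order(self, builder_id, unit_name, frame):
        """Return True and record the order unless one is already in flight."""
        key = (builder_id, unit_name)
        issued = self.pending.get(key)
        if issued is not None and frame - issued < self.timeout_frames:
            return False  # an earlier order is still waiting for confirmation
        self.pending[key] = frame  # first order, or a retry after the timeout
        return True

    def on_unit_created(self, builder_id, unit_name):
        """Call from the AI's unit-created handler to clear the pending order."""
        self.pending.pop((builder_id, unit_name), None)


tracker = BuildOrderTracker(timeout_frames=300)
# Poll at the same frames as the x20 log: only the first poll issues an order.
issued = [f for f in range(2979, 3005, 5) if tracker.should_order(20797, "armck", f)]
print(issued)  # [2979]
```

The timeout covers the case where an order is dropped or invalidated and no confirming event ever arrives.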

neoedmund commented 2 years ago

I cannot follow everything you explained, and I cannot counter it either, because some details are still unclear to me.

But let's examine a case we cannot get around.

Say:

At frame N, missile A is flying at tank B, which is at position B1, so B needs to evacuate. B runs away at frame N by issuing command M1. After 10 frames, regardless of what the server did, the missile will be resolved on every client as a hit at B1. But what if the server does not schedule M1 at frame N, or even N+1, but at N+10 because of the inserted empty frames? How can an AI do micro? What's the point of it? I don't understand how it solves network latency, but I can see it makes the mathematical model fail.

> I believe an event is raised when a unit is built. Maybe you can catch that event instead of polling the order queue count?

Of course the AI issued the command, so it knows about it without an event. But the AI cannot tell whether the command was accepted, discarded, or invalidated. Also, if you mean an event from the server, there is no such event. In this case the event would be "unit created", no?

Also, network multiplayer scenarios may be different from local play, but local play needs to be fixed; with the old mechanism I cannot see how it can do things right. We also need deterministic logic that makes the AI act the same at different game speeds. E.g. the AI asks to make A at frame N1 and B at frame N2. The server should do what is asked, independent of latency or game speed. Instead the server decides to make A at frame N1+x and B at frame N2+y, which makes everything unpredictable.

marcushutchings commented 2 years ago

I appreciate it is frustrating. I know what you want to achieve. Ultimately you would have to make a modified version of the Spring engine that would either run only single player or use a completely different network model (which would not be able to handle hundreds of units in real time). That would be the only way to guarantee the behaviour you're looking for.

> how can AI do micro?

The game's base sim rate is 30 Hz, so a 10-frame delay is about 333 ms. Not great, but not a huge issue either: players can micro with similar delays, so an AI can too. A few frames isn't an issue, but I agree that it does become an issue with big delays. This is the same issue for real players; I don't think players can play exactly the same if the game speed is increased or if their game lags badly. Anyhow, the point is that in a network game you cannot avoid some delay, and what happens if the client running your AI falls behind the other clients? It will experience increased lag as it falls behind in processing frames. Expecting 1-sim-frame response times for commands in every scenario is not going to end well for the AI.
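
The conversion between frames and wall-clock time can be worked through in a few lines. This is a sketch: the 30 Hz base rate comes from the comment above, while the helper names and the 50 ms delay are assumptions made purely for illustration.

```python
SIM_RATE_HZ = 30  # base sim rate stated above


def frames_to_ms(frames, speed=1.0):
    """Wall-clock milliseconds that `frames` sim frames take at a speed multiplier."""
    return frames * 1000.0 / (SIM_RATE_HZ * speed)


def ms_to_frames(ms, speed=1.0):
    """Sim frames that elapse during `ms` wall-clock milliseconds."""
    return ms * SIM_RATE_HZ * speed / 1000.0


print(round(frames_to_ms(10)))     # 333 ms for a 10-frame delay at x1
print(ms_to_frames(50, speed=1))   # 1.5 frames pass during an assumed 50 ms delay at x1
print(ms_to_frames(50, speed=20))  # 30.0 frames pass during the same 50 ms at x20
```

The same wall-clock delay spans proportionally more sim frames as the speed multiplier rises, which is why a delay that is invisible at x1 shows up at x20.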

An example of a good microing AI is BarB. It has to deal with the exact same uncertainties as your AI.

> What's the point of it? I don't understand how it solves network latency, but I can see it makes the mathematical model fail.

Network smoothing isn't necessarily about low latency; in fact, quite the opposite. It is about ensuring that gameplay continues to tick over at a steady rate despite variations in network traffic latency.

> Also, network multiplayer scenarios may be different from local play, but local play needs to be fixed; with the old mechanism I cannot see how it can do things right.

The engine is designed around a single mode (i.e. local play is a variation of networked play) to reduce complexity and maintenance effort. Making local play behave fundamentally differently will increase the effort needed for engine, AI and game developers to maintain their projects. In short, having local and network games behave differently will result in an increase in bugs. It is best to think of Spring as a network-first engine.

The best person to talk with about AI in Spring would be @rlcevg

neoedmund commented 2 years ago

Okay, you described your understanding of the methodology, and I'm not against it at all. But that is some distance from the source: https://github.com/beyond-all-reason/spring/blame/BAR105/rts/Net/GameServer.cpp#L2621. Consider how numNewFrames works in a normal-speed game. My guess: over 100 played games of 20000 frames each, the case numNewFrames>1 may occur fewer than 100 times (a 1/20000 chance). But at fast game speed, and by fast I don't mean x1.2 or x1.5 but x100 or even x1000, what happens to numNewFrames? I think no one has debugged into this code like I have, since it was last modified at least 9 years ago. In the x1000 scenario, of course we don't want the game to play at exactly x1000, letting a 20-minute game finish in 1.2 seconds. The requirement changes to "play at a maximum of x1000, and let's see how fast you can get". And here comes my fix. I'm also sure my fix won't hurt normal human play at speeds below x2.
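
The two figures in this comment can be checked with simple arithmetic. This sketch only restates the guessed numbers from the comment; the variable names are hypothetical and nothing here is measured.

```python
games = 100
frames_per_game = 20000
multi_frame_updates = 100  # guessed upper bound on numNewFrames>1 occurrences

chance = multi_frame_updates / (games * frames_per_game)
print(chance)  # 5e-05, i.e. 1 in 20000

game_seconds = 20 * 60         # a 20-minute game
wall_seconds = game_seconds / 1000  # played at exactly x1000
print(wall_seconds)  # 1.2
```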

I didn't expect persuading people to be more difficult than fixing the bug, but I really hope you understand my point.

neoedmund commented 2 years ago

The requirement "play at a maximum of x1000, and let's see how fast you can get" is essential for AI development; machine-learning AIs especially need to be trained on lots of games as quickly as possible. I'm curious how you tested Barb AI while developing it. Did you speed up the game when testing it? I also found some old posts saying that fast speed makes some AIs act differently. Or does Barb employ some special mechanism, perhaps even with native engine support, to cover the timing problem in high-speed games?

neoedmund commented 2 years ago

> Headless BAR was fixed recently (opengl gadget error)

Oh, I realize you are talking about BA, not https://github.com/beyond-all-reason/spring
Headless was fixed by BA.

marcushutchings commented 2 years ago

> I dont suppose persuading people being more difficult than bugfix, but I really wish you understand my point.

I have been framing my responses around the original issue: the false assumption that commands can always be processed within one simulation frame.

However, there is a separate issue, where you get a different outcome from the AI if the match speed is changed. That is the issue it seems you are focusing your attention on. The challenge that needs to be addressed with the proposed fix is the possible negative impact on network games. If there is sufficient evidence that network games will not be negatively impacted by this change then we can look to getting it implemented in the engine.

neoedmund commented 2 years ago

Well, maybe you don't see any need for this change, because you have Barb AI, which rocks.
As for evidence of negative impact: if you really care (probably not), you can add a counter for numNewFrames and print it to the log file after every match in the next release, then see whether the value is significant. Anyone with a similar need can always come back to this hint. Thanks for the open source. I'd like to keep the change on my own branch for now. And thank you all for looking into the problem for me.

neoedmund commented 2 years ago

discussion can move to https://github.com/beyond-all-reason/spring/discussions/274

neoedmund commented 2 years ago

numNewFrames>1 only happens when the server lags. But when the server lags, it's better to accept it. numNewFrames>1 makes clients "motion sick", and such logic should never exist.

neoedmund commented 2 years ago

Other facts: if an AI uses too much time, the server waits, which slows the game down. The server itself is not lagging, because it only processes packets and is unlikely to lag, so numNewFrames does not exceed 1.

Only when I set the speed to x1000 does the server find itself lagging. Maybe the author of the numNewFrames>1 code wrote it for the wrong purpose.

marcushutchings commented 2 years ago

The original change makes sense. If the host has a cpu load spike, then this can smooth out the experience for other connected clients. If the clients wait on a server that stutters, then the experience is poor for all the clients. However, if the host is the only player and there are no spectators, then adding extra frames makes little sense - so in that specific case we could keep new frames to 1.
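
The special case described here could look roughly like the following. This is a toy policy sketch only, not the logic in GameServer.cpp: the function name and its parameters are hypothetical, and how the engine actually counts players and spectators is not shown in this thread.

```python
def frames_to_create(frames_due, other_players, spectators):
    """Toy policy for how many sim frames the server creates in one update.

    frames_due is the number of frames the server believes it is behind,
    which plays the role of numNewFrames in this discussion.
    """
    if frames_due <= 0:
        return 0
    if other_players == 0 and spectators == 0:
        return 1  # nobody else to smooth the experience for: one frame at a time
    return frames_due  # catch up in one go so connected clients see a steady rate


print(frames_to_create(5, other_players=0, spectators=0))  # 1
print(frames_to_create(5, other_players=3, spectators=0))  # 5
```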

neoedmund commented 2 years ago

The server's job is to relay packets. If it lags under CPU load, nothing can improve the client experience. There is no client-to-client packet transmission (without a server relay), am I right?

rlcevg commented 2 years ago

Noticed my nick was mentioned a few times, so a few words:

BARbarIAn started as my "Hello world!" in learning C++. Hence people investigating it shouldn't expect state-of-the-art code. I didn't dive deeply into engine internals, except when fixing obvious bugs. It's an event-driven AI and doesn't expect its commands to be 100% executed (polling for error states and timeouts included). A way to multi-thread an AI is to queue engine events inside the AI, but you'll probably have to re-implement many engine functions like blockmaps, pathing, etc., as the native AI API (Java, C++) is not thread-safe.

As an unsynced AI by definition, I suspect BARb also suffers from issues with micro at x1000 speed: its threaded pathfinder, threat manager and terrain analyzer won't run faster, but events will come with greater frequency. Yet BARb is stable enough for me at x10 to not worry about it.

If we are trying here to make unsynced AIs run synced at x1000, then I'm not much help. Funny that we have another legit movement to make synced luaAIs unsynced. Some experience from web+python development: instead of running 1 instance at x1000, run 1000 instances at x1 (just kidding).

My opinion: making spring faster and better is very much welcome, just make sure new changes won't desync in multiplayer and won't introduce performance regressions.

neoedmund commented 2 years ago

> native AI API (Java, C++) is not thread-safe.

Thank you for the information. One question, please: given that the API is not thread-safe, are different AI agents thread-safe with respect to each other? E.g. in a local game of AI (barb) vs AI (simpleAI), both run on the same host and their commands all go to the same game instance. I guess they run in different threads, so will they interfere with each other? Maybe their update() methods are called sequentially, so if they do not call the engine outside update(), they should be safe?

rlcevg commented 2 years ago

I'll skip the BARb vs SimpleAI example, as it's more complex than BARb1 vs BARb2: BARb is an unsynced native AI that uses an interface designed for C-based engine-side AIs, while SimpleAI is a synced luaAI, basically a gadget and part of the Lua game (which probably has even more restrictions on threading).

> Maybe there are called update() sequentially , so if they do not call engine outside update(), they should be safe. ?

Yes, but there is no way to ensure it (unless the spring functions are rewritten to be thread-safe). What can be done: store engine events in a thread-safe queue in SkirmishAI.dll and run your own update() in a separate thread without touching the engine API, then store the intermediate results of the threaded update() in another queue, to be executed through the single-threaded spring functions in spring's update(). Though I think creating threads in SkirmishAI.dll outside of the engine could be seen as bad practice: the engine should provide an interface to re-use its worker threads. But the described solution will work now without rewriting spring.
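
The two-queue arrangement described above can be sketched as follows. This is a minimal illustration in Python rather than a native SkirmishAI library: `worker`, `engine_update`, the event tuples and the `engine_call` callback are all hypothetical stand-ins, and no real Spring API is called.

```python
import queue
import threading

events = queue.Queue()    # engine thread -> worker thread
commands = queue.Queue()  # worker thread -> engine thread
STOP = object()           # sentinel that tells the worker to exit


def worker():
    """Runs in its own thread and never touches the engine API."""
    while True:
        event = events.get()
        if event is STOP:
            break
        kind, unit_id = event
        if kind == "unit_idle":
            commands.put(("build", unit_id, "armck"))


def engine_update(engine_call):
    """Called on the engine thread; the only place engine functions are invoked."""
    executed = 0
    while True:
        try:
            cmd = commands.get_nowait()
        except queue.Empty:
            return executed
        engine_call(*cmd)
        executed += 1


thread = threading.Thread(target=worker)
thread.start()
events.put(("unit_idle", 20797))  # the engine thread forwards an event
events.put(STOP)
thread.join()

log = []
engine_update(lambda *cmd: log.append(cmd))  # stand-in for a real engine call
print(log)  # [('build', 20797, 'armck')]
```

Only `engine_update` runs on the engine's thread, so the non-thread-safe functions are still called from a single thread while the decision logic runs elsewhere.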

lhog commented 2 years ago

As @marcushutchings explained the proposed change is not going to happen.