Closed ghost closed 7 years ago
I can't reproduce this, whether compiling in Release mode or Debug.
I think it's more likely agoto()
bug.
If I could reproduce it, I'd edit mission 4.3 scene.txt to make the rogue bot selectable
and check if the program gets stuck on thegoto()
command.
This seems to reproduce it on any winged bot.
extern void object::New()
{
point pos = this.position;
goto(pos, pos.z, 1, 1);
}
Indeed it stuck on the goto() command:
Happens every time for me, and it's clearly goto() bug, it stuck on first checkpoint. It looks like changing
goto(circuit[i], circuit[i].z, 1,1);
to
goto(circuit[i], circuit[i].z, 0,1);
which cause bot to stop at every checkpoint fixes goto behavior. edit: I assume that this and issue #845 are both caused by broken checking if bot already reached destination point.
This and #845 are caused by goto doesn't reporting that it already reached destination. It looks like it occur only on 32-bit compilations, as every 64-bit compiler (Windows & Linux, Release & Debug) works fine.
I've done some investigation in there. It fails in this function, as long as dist > 10 everything is ok, but when we come close enough m_lastDistance variable suddenly reset itself to 1000 every frame, making one of conditions of if statement false, thus not aborting it. When robot finally gets far enough, everything goes back to normal, m_lastDistance is calculated correctly.. And again, when robot is close enough it suddenly reset. This variable only resets itself at line 745, but as far as I understand function Start should be called only at start of goto() function, but it's called every time we're close to goal. There is one more thing, it fix itself when we print dist variable to console. Other variables doesn't fix that. Does it prevent some optimizations by compiler?
I forgot I had this on my old laptop, seems to work ok.
--- a/src/object/task/taskgoto.cpp
+++ b/src/object/task/taskgoto.cpp
@@ -916,7 +916,10 @@ Error CTaskGoto::IsEnded()
if ( m_goalMode == TGG_EXPRESS )
{
dist = Math::DistanceProjected(m_goal, pos);
- if ( dist < 10.0f && dist > m_lastDistance )
+ float margin = 10.0f;
+ if ( m_object->Implements(ObjectInterfaceType::Flying) ) margin = 20.0f;
+
+ if ( dist < margin && dist > m_lastDistance )
{
return ERR_STOP;
}
Maybemargin
could be adjusted depending on current speed of the bot.
Well, that doesn't work for me. Maybe it fix other compilers behavior.
I've figured out marking dist variable as volatile seems to fix that, at least in my case.
I've applied both fixes, it shouldn't happen anymore.
Sounds like a memory corruption bug. I wouldn't trust this fix.
Technical info backfill: I was able to reproduce this issue when running the mission 4.3 in colobot-gold-0.1.10 downloaded from https://colobot.info/files/releases/0.1.10-alpha/colobot-gold-0.1.10-alpha-win32.zip under wine
Linux fedora 6.9.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jul 27 15:56:15 UTC 2024 x86_64 GNU/Linux
TODO: find the real root cause of the bug
Progress: I figured out how to compile the game in a way that reproduces this bug. It turned out that I needed to compile it in release mode
CXXFLAGS='-w' cmake -DCMAKE_BUILD_TYPE=Release
#include <limits>
to be able to compile thisHave you reproduced on more recent versions of the game?
More recent releases contain the "fix" that hides the problem by adding volatile
. I want to figure out the actual root cause of this because the bug may still be present. Ultimately I want to be able to remove the unnecessary volatile
There is a clue about the root cause in the diff I posted above (7 years ago). It was usually more reproducible on a very slow computer because the bot would pass the destination before checking if it was near the destination.
There is a clue about the root cause in the diff I posted above (7 years ago). It was usually more reproducible on a very slow computer because the bot would pass the destination before checking if it was near the destination.
Slow computer? It looks like there were two bugs.
I am troubleshooting the second bug
I've done some investigation in there. It fails in this function, as long as dist > 10 everything is ok, but when we come close enough m_lastDistance variable suddenly reset itself to 1000 every frame, making one of conditions of if statement false, thus not aborting it. When robot finally gets far enough, everything goes back to normal, m_lastDistance is calculated correctly.. And again, when robot is close enough it suddenly reset. This variable only resets itself at line 745, but as far as I understand function Start should be called only at start of goto() function, but it's called every time we're close to goal. There is one more thing, it fix itself when we print dist variable to console. Other variables doesn't fix that. Does it prevent some optimizations by compiler?
CTaskGoto::Start()
. Once I get close enough to the destination I start hitting the breakpoint every couple of frames. After each hit m_lastDistance
becomes 1000. It appears CTaskGoto::Start()
is re-executed for some reasonWaitForForegroundTask()
. When we get close to the destination, CTaskExecutorObjectImpl::EndedTask()
calls CTaskGoto::IsEnded()
and gets ERR_STOP
. This causes the task to end before CScriptFunctions::rGoto()
detects that it ended. CScriptFunctions::rGoto()
gets confused and starts the task again.I have stared into assembly and assembly stared back into me :squid: X__X :octopus:
What is supposed to happen:
CScriptFunctions::rGoto()
calls WaitForForegroundTask()
which calls CTaskGoto::IsEnded()
CTaskGoto::IsEnded()
returns ERR_STOP, we remove the foreground task and return from goto()CTaskExecutorObjectImpl::EndedTask()
calls CTaskGoto::IsEnded()
and gets ERR_CONTINUE
because the position of the bot stayed the same since the step 2What actually happens:
CScriptFunctions::rGoto()
calls WaitForForegroundTask()
which calls CTaskGoto::IsEnded()
dist > m_lastDistance
is false because we moved closer to the destinationCTaskGoto::IsEnded()
stores m_lastDistance = dist
CTaskGoto::IsEnded()
returns ERR_CONTINUECTaskExecutorObjectImpl::EndedTask()
calls CTaskGoto::IsEnded()
dist > m_lastDistance
to evaluate to false
because we expect m_lastDistance == dist
m_lastDistance
is a 32 bit float
, but dist
is an 80 bit long double
due to compiler optimizations:
For legacy x86 processors without SSE2 support, and for m68080 processors, GCC is only able to fully comply with IEEE 754 semantics for the IEEE double extended (long double) type. Operations on IEEE double precision and IEEE single precision values are performed using double extended precision.
dist > m_lastDistance
to evaluates to true because dist
is very slightly bigger than m_lastDistance
CTaskGoto::IsEnded()
returns ERR_STOP
CTaskExecutorObjectImpl::EndedTask()
ends the taskCScriptFunctions::rGoto()
thinks the task has not been started yet and starts a new taskHere is a godbolt link that demonstrates that sometimes a * a > a * a
can evaluate to true,l:'5',n:'1',o:'C%2B%2B+source+%231',t:'0')),k:35.758060860248435,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:executor,i:(argsPanelShown:'1',compilationPanelShown:'0',compiler:g141,compilerName:'',compilerOutShown:'0',execArgs:'',execStdin:'',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O2+-m32',overrides:!(),runtimeTools:!(),source:1,stdinPanelShown:'1',tree:0,wrap:'1'),l:'5',n:'0',o:'Executor+x86-64+gcc+14.1+(C%2B%2B,+Editor+%231)',t:'0')),header:(),k:31.30793615582513,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g141,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'1',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O2+-m32',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+gcc+14.1+(Editor+%231)',t:'0')),header:(),k:32.93400298392647,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)
This only affects 32 bit targets because on 64 bit targets gcc defaults to -mfpmath=sse
It was usually more reproducible on a very slow computer because the bot would pass the destination before checking if it was near the destination.
@melex750 Do you recall what observations led you to believe this? I don't think a slow computer can explain the problem.
In my observations increasing the margin to 20 meters hides the bug rather than fixing it: CTaskGoto
is stopped and restarted over and over until it gets lucky and survives long enough for CScriptFunctions::rGoto()
to detect that the task has ended
extern void object::SpeedDisplay()
{
object o = radar(WingedGrabber);
float maxv = 0;
while(true)
{
point p1 = o.position;
float t1 = abstime();
wait(1);
point p2 = o.position;
float t2 = abstime();
float v = distance2d(p1, p2) / (t2 - t1);
if(v > maxv) maxv = v;
message(maxv);
}
}
extern void object::WingedBot()
{
point p = position;
p.x += 300;
goto(p);
}
What is the closest B can be to C? I.e. what is the smallest distance that the bot has to fly in a single tick to overshoot the margin?
Let's consider three scenarios
1) B <= 0; C >= 10; the shortest possible distance between B and C is 10 meters. 2) A <= -10; C >= 10; given a constant-ish speed, B is about half way between the two; the shortest possible distance between B and C is about 10 meters. 3) A is inside the margin and B is too
abs(B)
were greater than abs(A)
, we would have detected that we are moving away and stopped before overshooting. Therefore abs(B) <= abs(A)
and A < 0<half the distance the bot flies in a single frame>
<the distance the bot flies in a single frame>
; C >= 10 therefore 1.5 * <the distance the bot flies in a single frame>
>= 10<the distance the bot flies in a single frame>
>= 10/1.5 == 6.6666...6.6666... < 10 therefore the answer is 6.6666...
I recall believing it might two different bugs like you mentioned previously. I actually found the overshooting behavior on a different level and reproduced it with a minimal CBot program (see second comment near the top from 8 years ago). I found when goto tested the distance to destination when it was just over the 10 meter margin it would overshoot seemingly before it checked again. I didn't really believe it happened because of a slow computer, but I couldn't reproduce it on a faster one.
Was it running at 2 frames per second? If not, I do not see how it could have possibly overshot the 10 meter margin
I'm not sure about the FPS, it was almost unplayable except with the graphics settings set to minimum and 800x600 resolution. It was my nephew's first computer and we were creating a userlevel together. I have it still in storage, maybe I'll dust it off and check it out.
Once I write a numerically stable fix you can test it.
Looking forward to it.
Math
I was confused. The margin was not 10 meters. It was 2.5 meters. I forgot that we use g_unit=4 to convert between in-game coordinates and in-game coordinates.
This seems to reproduce it on any winged bot.
extern void object::New() { point pos = this.position; goto(pos, pos.z, 1, 1); }
This is actually a different bug entirely and it was never fixed at all (even though this specific test case no longer reproduces it). The bug is caused by goto() telling the winged bot to fly forward and turn (instead of just turning) when it is facing more than 90 degrees away from the destination. This results in the bot orbiting the destination.
This test case reproduces the orbiting in gold-0.2.1:
extern void object::WingedBot()
{
point pos = position;
pos.x += 10;
turn(direction(pos));
pos = position;
pos.x -= 6;
message("goto(");
goto(pos, pos.z, 1, 1);
message(")");
}
The reason your test case no longer reproduces the bug is subtle:
In my observations increasing the margin to 20 meters hides the bug rather than fixing it:
CTaskGoto
is stopped and restarted over and over until it gets lucky and survives long enough forCScriptFunctions::rGoto()
to detect that the task has ended
This is incorrect. Here is the real cause and effect chain:
m_lastDistance = dist
rounds to nearest). 50% of the time it restarts CTaskGoto.1) The use of volatile
hides the bug by forcing dist
to be a 32 bit float instead of 80 bit long double
2) Increasing the margin to 5 meters hides the bug because 2.675 < 5
3) 64 bit platforms are immune because there gcc defaults to -mfpmath=sse
Possibly recent changes in CBOT engine broke something. The bot just grabs the BlackBox and fly to first checkpoint. Then just flies in circle instead flying through next checkpoints until his PowerCell run out of energy.
His script (data/ai/run1.txt) has never changed through GOLD development.