Open GoogleCodeExporter opened 9 years ago
r446 prevents the specific race condition mentioned here.
When a new line comes in before the login process has completed, the new line
is held for a bit, then sent back through again.
I'm not going to close this issue yet, as I suspect other race conditions may
be lurking. I want to see if I can find an example.
I'm not sure it will be possible to trigger the problem by simply pasting long
strings of commands. I think this fix will stop that from being a problem.
However, I think that with certain delays between commands there may be
problems.
I may need to create a test client that can send strings of commands at
specified intervals to trigger the problem.
(A test client that can send scripts of commands and check for expected output
would be good anyhow, so maybe testing for race conditions could be part of a
more general test setup).
Original comment by miss...@hotmail.com
on 31 Dec 2010 at 5:39
Thanks, the revision seems to stop all the issues I was having.
Here's an example where you see something that should be atomic but isn't: even
if everything works quite right, the output doesn't appear in order...
------ paste this, quickly: ------
@dig a to templo
.a
@destroy door
..................................
Expected output:
------
You dig a to templo.
templo
You speak a word of power and ...
------
What we get:
------
You dig a to templo.
templo
You speak a word of power and ...
A new exit door appears.
------
The message "A new exit door appears" is being sent to those who are in room
"templo" which is correct, but it should appear when the @dig is done. Since
I've already used that path to go to templo (.a), when I arrive there the door
should be there already, so I shouldn't arrive in that room in time to see the
message...
Original comment by marcos.m...@optimus.pt
on 31 Dec 2010 at 10:52
OK, I looked at the code and found out who's the culprit. This is here "by
design", but I don't understand why, and since this code exists since r2, I
don't have a way to get a clue about your intentions... Hoping you might help
me here ;-)
At rooms.py, the action function waits (asyncronously) 0.1 before actually
doing its work. This is on purpose, and documented:
"
Uses a callLater to make sure that reactions to the action occur
after the action has been reported.
"
What I don't understand is how can you achieve this by having a callLater, on
the contrary: if you want to have reactions to the action occur after the
action has been reported, then the operation you're doing should react first,
report after, right? In this case, the action is "report an action" (say to
everyone near this room that a new exit door appeared), but if you only say it
sometime *after* the door creation, instead of right away, then you aren't
making sure that reactions only occur after the the action was reported, in
fact you're opening window to let reactions actually occuring before the action
being done...
Anyway, my temp fix is taking that async timer out of the way, but I'm sure
that you have a good reason to have that timer there, so maybe the correct fix
is to have the possibility of flagging room.action to tell if we want it sync
or async...
Original comment by mindboos...@gmail.com
on 8 Nov 2011 at 7:53
Another action that isn't as atomic as it should: purge.
1) Create a user "a" with password "a";
2) login a a
3) in another terminal, connect to the MUD and paste this in the terminal:
purge a a
login a a
When "purge a a" ends executing the user "a" should be completely purged, but
it isn't, so when "login a a" happens, we still have an "a"... the result:
"
Connection purged.
Log in with "login <name> <password>"
You feel yourself being ripped away from where you are...
You have been teleported.
void
You do not see that here.
"
After this, you won't be able to type anything (since in the meantime the purge
really ends its effects).
From the log:
2011-11-09 19:53:36+0000 [TZ,1,127.0.0.1] player a logged in
2011-11-09 19:53:36+0000 [TZ,0,127.0.0.1] Lost a client!
As you can see, the "Lost a client!" only happens *after* the "player a logged
in"...
Original comment by mindboos...@gmail.com
on 9 Nov 2011 at 7:57
[deleted comment]
Here's a patch fixing this:
https://github.com/marado/tzmud/commit/4825c6620675ffecdc3dba599384ae3d4152de23
Twisted won't let us have a sync _purge, so I'm not sure if there any other
things that depend on the purge actually being complete other than login, but
if there are...
A safer solution would be to actually have a isLogged() function that doesn't
only check the value of the is_logged boolean... But, well, this works.
Original comment by mindboos...@gmail.com
on 9 Nov 2011 at 9:17
Probably I was just coding defensively. Other than follow, I can't find any
actions that require a delay, and follow has its own separate delay built in.
It looks like an action can specify a delay other than the default 0.1 seconds
if it wants to. It's possible that dig should specify no delay.
Maybe what we should do is have the default be no delay, but still keep the
code there that will allow an action to specify a delay if needed.
Original comment by miss...@hotmail.com
on 11 Nov 2011 at 6:17
Well... If we don't have any delay there (see my patch here:
https://github.com/marado/tzmud/commit/5bfe1644da16303fd053826a6c32ad01be77d7cc
), we can just delay the action call itself in the cases we need to, right?
Original comment by mindboos...@gmail.com
on 11 Nov 2011 at 8:05
Hmm...
Looking closer, I see that teleport is using the delay values:
room.action(dict(act='teleport_character_away',
delay=0.2,
actor=None,
character=self))
reactor.callLater(0.4, self.move, destination)
destination.action(dict(act='teleport_character_in',
delay=0.6,
actor=None,
character=self))
So, it is trying to order those events in to the proper sequence.
How is that holding up under the no delay patch?
Original comment by miss...@hotmail.com
on 11 Nov 2011 at 8:53
You're right, yes. But we can just reactor.callLater(0.4, room.action, dict)
and reactor.callLater(0.6, destination.action, dict) instead of calling those
actions now and telling them to only execute in 0.x ...
Original comment by mindboos...@gmail.com
on 12 Nov 2011 at 2:42
Right. I think the code path works out exactly the same.
I guess it is not as clear that that is an asynchronous call, but to my eye, it
looks cleaner with the delay= value in the dict.
Either way, I don't think the behavior in the face of a bunch of rapid calls
involving teleport is what would (or should) be expected ...
@teleport to house
@teleport
@teleport to house
@teleport
You feel yourself being ripped away from where you are...
You feel yourself being ripped away from where you are...
You feel yourself being ripped away from where you are...
You feel yourself being ripped away from where you are...
You have been teleported.
void
You have been teleported.
void
While someone watching sees:
lee waves his hands around mysteriously.
lee waves his hands around mysteriously.
lee waves his hands around mysteriously.
lee waves his hands around mysteriously.
lee disappears.
lee disappears.
lee disappears.
lee disappears.
lee appears.
lee appears.
Or even just:
lee appears.
lee appears.
In your work with other MUDS and talkers, have you seen any kind of testing
framework for this kind of thing?
I think before any major changes begin, it would be good to set up tests that
specify exactly what should happen given a particular sequence (and timing) of
commands.
Maybe like:
=w1 # logs in character wizard 1
=p1 # logs in character player 1
=p2
+p2 house # input from player 2
-w1 p2 leaves to the light. # output seen by w1
-p1 p2 leaves to the light.
-p2 house
1.5 # wait here for 1.5 seconds before sending next line
+w1 @teleport house
-p1 waves his hands around mysteriously.
-w1 You feel yourself being ripped away from where you are...
-p1 disappears.
-w1 house
-p2 w1 appears.
Original comment by miss...@hotmail.com
on 12 Nov 2011 at 2:19
> I guess it is not as clear that that is an asynchronous call, but to my eye,
it
> looks cleaner with the delay= value in the dict.
Either way works for me.
> Either way, I don't think the behavior in the face of a bunch of rapid calls
> involving teleport is what would (or should) be expected ...
You're right, it isn't. The problem is that in lots of stuff, including
@teleport, we shouldn't have anything asyncronous, at all. @teleport should:
1) on origin place, say that player "waved his hands"
2) remove character from origin place
3) tell player that he feels being ripped away
4) tell people on origin place that player has disappeared
5) put player on new place (I assume that this function triggers the player
seeing the place he now is)
6) say in new place "player x appears"
Each of this items must be done in this order, and must be syncronous - meaning
that each function will only return when all it is meant to do is done.
> In your work with other MUDS and talkers, have you seen any kind of testing
> framework for this kind of thing?
Nothing, sorry, most of the MUDs and talkers I've seen only work 100%
syncronously, some being only one non-threaded process...
Original comment by mindboos...@gmail.com
on 7 Dec 2011 at 7:11
OK, I restarted looking at this, and decided that I should investigate the
teleport issue in particular. Threading through the code, I realized this:
==========
@teleport house
You feel yourself being ripped away from where you are...
@teleport
You feel yourself being ripped away from where you are...
@teleport
You feel yourself being ripped away from where you are...
You have been teleported.
void
==========
Which means that:
1) When you're teleported to somewhere else, you have say "You feel yourself
being ripped away from where you are...";
2) When you're teleported to the same place you already were, you also see
(afterwards), "You have been teleported." and the place where you were
teleported to (which is always where you were already).
I have a couple of considerations here already:
a) do we really want that it is possible to teleport something to where it
already is? I'm OK with either choice, but it makes more sense to me if you
simply can't teleport to where you already are...;
b) We obviously shouldn't have a different behaviour, so the question is:
should we display "You have been teleported." and where to... always or never?
Once again, I'm OK with either choice, but it makes more sense to me if we
always display that.
Now, I went to investigate *why* we were showing that info sometimes, and some
other times we weren't. Seems that the difference is in the function
"near_teleport_character_in", which is meant to be a function to warn those who
are in some place that they suddenly started seeing something because it was
teleported next to the player. IMHO someone who is teleported to where it was
an instant ago doesn't see anything, even they feel something. So, I think we
should say "You feel yourself being ripped away from where you are..." but it
doesn't make sense saying "You have been teleported" (at least not here in the
code, where the sentence actually means "you see yourself appearing here
because you have been teleported", which makes no sense) - and since you're in
the same place it also doesn't make sense that you decide to look around.
Here's a patch fixing this, please push it to your own code if you agree with
my reasoning:
https://github.com/marado/tzmud/commit/a952f0da1318fd22750b4fd3f5cdab067909daba
Original comment by mindboos...@gmail.com
on 9 Mar 2012 at 3:00
Back - sorry for the delay in giving you a complete reply. To answer your
question:
"How is that holding up under the no delay patch?"
The answer is: applying the no delay patch *and* the one in c13, the no delay
patch still wasn't enough. A quick look was enough to find out why: between
those two actions with delay (that the "no delay" patch makes synchronous)
there's this call:
reactor.callLater(0.4, self.move, destination)
which is obviously an async call.
Replacing it with "self.move(destination)" fixes the @teleport issues.
So - as far as I am concerned - the issues on *this* bug report are fixed with
the two aforementioned patches *and* this replacement.
Now, if we do that, we should create another issue named "validate all
callLater calls to see if we really want them async". A quick grep shows things
that surely are meant to be async (database doing pack_regularly, !shutdown
with delay), but others are probably like teleport, so if we go this way
there's that extra work needed.
If you're OK with this solution, I am offering to do the remaining work (create
that issue, validate each callLater, produce needed patches to fix this once
for all).
Original comment by mindboos...@gmail.com
on 11 Apr 2012 at 7:02
Original issue reported on code.google.com by
miss...@hotmail.com
on 30 Dec 2010 at 4:03