Closed ctm closed 1 week ago
I've been working on this issue since I got up. I can't reproduce it locally, nor is there anything obvious in the code I've inspected. I have referred back to the image that I never took the time to copy into this issue and there's enough info there for me to be able to point the row in private_table_messages
that came from his reconnection:
select private_table_messages.id, received_at at time zone 'america/denver', substr(message::text, 0, 100), table_id, hand_id from private_table_messages join tables on table_id = tables.id where event_id = 5350 and player_id = 6 and hand_id >= 409764 and hand_id <= 409769 order by received_at;
id | timezone | substr | table_id | hand_id
----------+----------------------------+-----------------------------------------------------------------------------------------------------+----------+---------
13227618 | 2024-06-16 18:37:40.954996 | {"Preference": {"sound": 59, "seating": {"seats": [null, null, null, null, null, null, null, null, | 5894 | 409767
I don't, however, see anything in private_table_messages
out of the ordinary. I've removed the easy
label from this issue but will continue to poke around.
It looks like there are two issues. One is why did the window go away, the other is why did popping-in crash. My earlier exploration of this bug was focused on the former as well as what happens specifically when popping-in is done. I do not yet know what caused the window to go away and I don't see anything in the pop-in code that would cause a crash, however…
There is code inside log_line.rs
that assumes that the table knows the nick of each player who has chatted. However, I don't see an Identities
or DeerredIdentities
message being sent, so that made me look into whether there were any chat messages from a player who was no longer at the table, but there weren't:
mb2=> select public_table_messages.id, received_at at time zone 'america/denver', message, table_id, hand_id from public_table_messages join tables on table_id = tables.id where variant = 44 and event_id = 5350 and player_id is null and hand_id < 409767 order by received_at;
id | timezone | message | table_id | hand_id
----------+----------------------------+-------------------------------------------------------------------------------------------+----------+---------
11026618 | 2024-06-16 18:11:11.845335 | {"Chat": [13, "nh"]} | 5894 | 409737
11026934 | 2024-06-16 18:16:24.25937 | {"Chat": [13, "gg"]} | 5894 | 409746
11026935 | 2024-06-16 18:16:25.658577 | {"Chat": [10, "ggfn"]} | 5894 | 409746
11026963 | 2024-06-16 18:16:45.172101 | {"Chat": [10, "btw, you can re-enter twice in this one, just like yesterday evening's."]} | 5894 | 409747
11026980 | 2024-06-16 18:17:19.528546 | {"Chat": [10, "re-entries vary a fair amount between the WSOP-Style tournaments..."]} | 5894 | 409747
11027031 | 2024-06-16 18:18:41.826747 | {"Chat": [10, "ggfn"]} | 5894 | 409748
11027038 | 2024-06-16 18:18:45.15717 | {"Chat": [13, "gg"]} | 5894 | 409749
11027410 | 2024-06-16 18:26:45.510553 | {"Chat": [10, "ggfn"]} | 5894 | 409755
11027411 | 2024-06-16 18:26:46.261878 | {"Chat": [6, "gg"]} | 5894 | 409755
11027420 | 2024-06-16 18:26:52.364812 | {"Chat": [13, "gg]"]} | 5894 | 409756
11027557 | 2024-06-16 18:29:21.719148 | {"Chat": [13, "nh"]} | 5894 | 409758
11027570 | 2024-06-16 18:29:58.171624 | {"Chat": [20, "t igl"]} | 5894 | 409759
(12 rows)
mb2=> select id, nick from players where id in (13, 10, 6, 20) order by id;
id | nick
----+--------------
6 | smalltalkdan
10 | deadhead
13 | 🐭GamboMouse
20 | jrx
(4 rows)
FWIW, it's exceedingly poor form for the code to panic if a nick isn't known, so I will change that, even though it should never be hit, but I'll keep looking for a more likely cause of the panic.
I think I've found the errant code / data structure, although I haven't yet figured out how to trigger it:
...
if let Some(undocked_chat) = &mut self.undocked_chat {
undocked_chat.send_to_child(&Response::All(
self.log.as_ref().unwrap().data().to_vec(),
allow_chat,
on_me,
));
}
...
This code assumes that if we have an undocked chat that we have a log, which should be the case, but we can probably rejigger the data structures by either putting the log and undocked_chat inside the same Option
or putting the undocked chat's Option
inside the log's Option
. However, if for some reason we can't do that, calling unwrap()
is still unacceptable, especially since if there's no log, we can send an empty Vec
.
This is related to the fact that if a parent window sends a message to a child window before the child window has a listener set up, that message is dropped. So, when the table wants to transfer ownership of the log to the pop-up window, it has to hold onto the log until it receives a message from the pop-up letting it know that it's been received.
The errant code-snippet in my comment above is from the processing of SendAll
, which is only sent to the parent when the Chat
component is created. However, yew can create a component more than once and more than one window can send messages to the parent. In practice, I don't think yew does create the component more than once, nor should there be more than one child window, but something about the computer going to sleep probably broke one of those preconditions.
So, assuming that the unwrap()
is indeed the source of the panic, something caused the Table
component to receive a SendAll
after a MessagesDisplayed
. If that is due to more than one child, it might be possible to change the way we pass messages from "having the child send to the parent" to "having the child send to itself but have the parent listen". I don't know if that's actually allowed, so I'll probably have to write some test code to figure out.
I don't think having the child send messages to itself (and having the parent listen for them) is viable, even if it could be implemented, because the child would still need to send a message to the parent saying it's ready to receive. So, if we do anything, it makes more sense for the parent to send some sort of unique id down to the child in the first message and then for then to have the child send that id up with every message. Then we can detect, log and ignore messages from errant children, however, we only have a communication pathway to one child at a time, so I am not sure that there's any penalty, per-se, allowing messages from errant children as long as we get rid of the unwrap.
Fixed. Deploying now.
Fix whatever caused smalltalkdan's crash.
He sent me a screenshot, which has "Uncaught RuntimeError: unreachable in https://craftpoker.com/mb2-web.asm at line 1".
My guess is a combination of code inspection and sending my laptop to sleep will be sufficient to find and fix it.