ctm / mb2-doc

Mb2, poker software
https://devctm.com
7 stars 2 forks source link

Popping-in a window can panic #1443

Closed ctm closed 1 week ago

ctm commented 2 weeks ago

Fix whatever caused smalltalkdan's crash.

During yesterday’s tournament, my computer went to sleep. When I woke it up, the detached chat window was gone. When I attempted to pop it back in I got the following.

He sent me a screenshot, which has "Uncaught RuntimeError: unreachable in https://craftpoker.com/mb2-web.asm at line 1".

My guess is a combination of code inspection and sending my laptop to sleep will be sufficient to find and fix it.

image

ctm commented 1 week ago

I've been working on this issue since I got up. I can't reproduce it locally, nor is there anything obvious in the code I've inspected. I have referred back to the image that I never took the time to copy into this issue and there's enough info there for me to be able to point the row in private_table_messages that came from his reconnection:

select private_table_messages.id, received_at at time zone 'america/denver', substr(message::text, 0, 100), table_id, hand_id from private_table_messages join tables on table_id = tables.id where event_id = 5350 and player_id = 6 and hand_id >= 409764 and hand_id <= 409769 order by received_at;
    id    |          timezone          |                                               substr                                                | table_id | hand_id 
----------+----------------------------+-----------------------------------------------------------------------------------------------------+----------+---------
 13227618 | 2024-06-16 18:37:40.954996 | {"Preference": {"sound": 59, "seating": {"seats": [null, null, null, null, null, null, null, null,  |     5894 |  409767

I don't, however, see anything in private_table_messages out of the ordinary. I've removed the easy label from this issue but will continue to poke around.

ctm commented 1 week ago

It looks like there are two issues. One is why did the window go away, the other is why did popping-in crash. My earlier exploration of this bug was focused on the former as well as what happens specifically when popping-in is done. I do not yet know what caused the window to go away and I don't see anything in the pop-in code that would cause a crash, however…

There is code inside log_line.rs that assumes that the table knows the nick of each player who has chatted. However, I don't see an Identities or DeerredIdentities message being sent, so that made me look into whether there were any chat messages from a player who was no longer at the table, but there weren't:

mb2=> select public_table_messages.id, received_at at time zone 'america/denver', message, table_id, hand_id from public_table_messages join tables on table_id = tables.id where variant = 44 and event_id = 5350 and player_id is null and hand_id < 409767 order by received_at;
    id    |          timezone          |                                          message                                          | table_id | hand_id 
----------+----------------------------+-------------------------------------------------------------------------------------------+----------+---------
 11026618 | 2024-06-16 18:11:11.845335 | {"Chat": [13, "nh"]}                                                                      |     5894 |  409737
 11026934 | 2024-06-16 18:16:24.25937  | {"Chat": [13, "gg"]}                                                                      |     5894 |  409746
 11026935 | 2024-06-16 18:16:25.658577 | {"Chat": [10, "ggfn"]}                                                                    |     5894 |  409746
 11026963 | 2024-06-16 18:16:45.172101 | {"Chat": [10, "btw, you can re-enter twice in this one, just like yesterday evening's."]} |     5894 |  409747
 11026980 | 2024-06-16 18:17:19.528546 | {"Chat": [10, "re-entries vary a fair amount between the WSOP-Style tournaments..."]}     |     5894 |  409747
 11027031 | 2024-06-16 18:18:41.826747 | {"Chat": [10, "ggfn"]}                                                                    |     5894 |  409748
 11027038 | 2024-06-16 18:18:45.15717  | {"Chat": [13, "gg"]}                                                                      |     5894 |  409749
 11027410 | 2024-06-16 18:26:45.510553 | {"Chat": [10, "ggfn"]}                                                                    |     5894 |  409755
 11027411 | 2024-06-16 18:26:46.261878 | {"Chat": [6, "gg"]}                                                                       |     5894 |  409755
 11027420 | 2024-06-16 18:26:52.364812 | {"Chat": [13, "gg]"]}                                                                     |     5894 |  409756
 11027557 | 2024-06-16 18:29:21.719148 | {"Chat": [13, "nh"]}                                                                      |     5894 |  409758
 11027570 | 2024-06-16 18:29:58.171624 | {"Chat": [20, "t igl"]}                                                                   |     5894 |  409759
(12 rows)

mb2=> select id, nick from players where id in (13, 10, 6, 20) order by id;
 id |     nick     
----+--------------
  6 | smalltalkdan
 10 | deadhead
 13 | 🐭GamboMouse
 20 | jrx
(4 rows)

FWIW, it's exceedingly poor form for the code to panic if a nick isn't known, so I will change that, even though it should never be hit, but I'll keep looking for a more likely cause of the panic.

ctm commented 1 week ago

I think I've found the errant code / data structure, although I haven't yet figured out how to trigger it:

...
                if let Some(undocked_chat) = &mut self.undocked_chat {
                    undocked_chat.send_to_child(&Response::All(
                        self.log.as_ref().unwrap().data().to_vec(),
                        allow_chat,
                        on_me,
                    ));
                }
...

This code assumes that if we have an undocked chat that we have a log, which should be the case, but we can probably rejigger the data structures by either putting the log and undocked_chat inside the same Option or putting the undocked chat's Option inside the log's Option. However, if for some reason we can't do that, calling unwrap() is still unacceptable, especially since if there's no log, we can send an empty Vec.

ctm commented 1 week ago

This is related to the fact that if a parent window sends a message to a child window before the child window has a listener set up, that message is dropped. So, when the table wants to transfer ownership of the log to the pop-up window, it has to hold onto the log until it receives a message from the pop-up letting it know that it's been received.

The errant code-snippet in my comment above is from the processing of SendAll, which is only sent to the parent when the Chat component is created. However, yew can create a component more than once and more than one window can send messages to the parent. In practice, I don't think yew does create the component more than once, nor should there be more than one child window, but something about the computer going to sleep probably broke one of those preconditions.

So, assuming that the unwrap() is indeed the source of the panic, something caused the Table component to receive a SendAll after a MessagesDisplayed. If that is due to more than one child, it might be possible to change the way we pass messages from "having the child send to the parent" to "having the child send to itself but have the parent listen". I don't know if that's actually allowed, so I'll probably have to write some test code to figure out.

ctm commented 1 week ago

I don't think having the child send messages to itself (and having the parent listen for them) is viable, even if it could be implemented, because the child would still need to send a message to the parent saying it's ready to receive. So, if we do anything, it makes more sense for the parent to send some sort of unique id down to the child in the first message and then for then to have the child send that id up with every message. Then we can detect, log and ignore messages from errant children, however, we only have a communication pathway to one child at a time, so I am not sure that there's any penalty, per-se, allowing messages from errant children as long as we get rid of the unwrap.

ctm commented 1 week ago

Fixed. Deploying now.