HughJeffner / FFRK-LabMem

Highly customizable FFRK Labyrinth Bot
MIT License
10 stars 4 forks source link

Restart loop following FFRK crash on lab completion #131

Closed slartifer closed 2 years ago

slartifer commented 2 years ago

It's pretty common for FFRK to crash upon lab completion. This is not LabMem related, but the bot seems to struggle to get a lab started again after that happens. I've seen this a few times; in each case, I've returned to find the game sitting on the labyrinth map screen after the bot disables itself.

This is different from #122 as I've never seen a connection timeout involved, and the bot stops doing anything long before timeout. Highlight of initial restart attempt:

02:24:18 [Notifcation] Processing event LAB_COMPLETED
02:25:38 [Watchdog] FFRK state: Not Running
02:25:38 [Watchdog] Fault: Crash
02:25:38 Crash detected!
02:25:38 Restarting FFRK
02:25:38 [Watchdog] Paused
02:25:38 [Lab] Cancelling any running tasks
02:25:43 [Lab] Kill ffrk process...
02:25:49 [Lab] Launching app
02:25:49 [Lab] Entering state: Unknown
02:25:49 [Lab] Button finding loop for 172.349s
02:26:09 [Adb] matches: 1, closest: 0.9838035
02:26:26 [Adb] matches: 1, closest: 0.9180371
02:26:35 [Adb] matches: 1, closest: 0.9178148
02:26:38 [Lab] Fatigue values WRITE: Signaled: True, Waiters: 0
02:26:52 [Adb] matches: 1, closest: 0.957491
02:28:48 FFRK restart timed out

After 4 apparently successful button pushes ('Play', 'Labyrinth Dungeons', Labyrinth Dungeons', 'Enter') it appears to sit for 2 minutes until timeout. But it would need to deal with two more 'OK' buttons at that point, to confirm equipment and then to spend stamina (assuming no pots needed).

In all 15 subsequent restart attempts, there were only 3 successful button pushes, which would explain why it was left on the map screen. The odds of the app having an issue on that screen (after a fresh start) 15 out of 15 times seem low. Full logs attached.

011522 restart loop.txt

slartifer commented 2 years ago

This happened again... the output looks pretty much identical. If any further logs would be helpful, let me know, as I seem to be able to generate them pretty easily.

HughJeffner commented 2 years ago

From what I can tell, it appears to be 2 different issues:

  1. FFRK Restart timing out
  2. If the FFRK Restart succeeded then a hang is detected during the lab restart

First, it appears the FFRK Restart process is partially entering the lab? I can't really tell, but as soon as it goes into the swirling thing on the outpost screen it should have claimed the restart succeeded and switched to the restart lab process. The only clue in the log is when the restart succeeded the last button found had a 0.88 match and when it failed it had a 0.91 match, like it's finding a different button or something.

Second, do you have the 'restart ffrk if x minutes pass with no activity' set to 1 minute? There are 2 checks at the start of restarting a lab run for the 'dungeon complete' and 'mission complete' dialogs and if those had already been dismissed it's taking longer than 1 minute thus the watchdog kicks in. I think if it had been set to 2 minutes or higher it might have worked. I need to add a watchdog kick between those two checks to fix this.

slartifer commented 2 years ago

Second, do you have the 'restart ffrk if x minutes pass with no activity' set to 1 minute?

I do. I initially left it set at 3, but I was running into issues where the bot wouldn't detect a crash soon enough, and the device would revert to the lock screen. I don't understand the timing on this, as none of the device settings on that are anywhere near 3 minutes. But when I lowered it to 2, this happened less frequently, and when I lowered it to 1, it stopped happening altogether.

HughJeffner commented 2 years ago

Strange, I've never had any of the emulators I used lock the screen on me. The only time I ever see the lock screen is when it first boots up.

slartifer commented 2 years ago

First, it appears the FFRK Restart process is partially entering the lab?

It seems like it's getting stuck looking for the Enter button. I might be off base here, but this button does have tinier text than most of the others seem to, and has a bit of the Labyrinth Party icon in front of it.

image

HughJeffner commented 2 years ago

The FFRK restart process should never touch that button. Like I said earlier, once it enters the swirly thing (one screen back) it should set the ffrk restart as complete and start the Lab restart process.

HughJeffner commented 2 years ago

https://raw.githubusercontent.com/HughJeffner/FFRK-LabMem/master/FFRK-LabMem/Resources/lab_outpost.bmp

This is the specific 'button' it uses to determine if it's entering a fresh lab. If it taps on this then the restart should complete. The default restart timeout is 3 minutes, but i didn't see any problem with that in the log. Just different 'closest' match percentage

HughJeffner commented 2 years ago

Is it possible it's hitting the 'announcements' thing right underneath it sometimes? Not sure if you actually caught it in action.

Can you post your screen resolution + offsets?

slartifer commented 2 years ago

Haven't caught that part of it in action, no. I'll try to do that next chance I get.

Device resolution - 1080x1920 LabMem offsets - top 0, bottom 2 (set by LabMem)

HughJeffner commented 2 years ago

I just checked and it's possible to test this by making sure you are outside a lab run then press Alt+R on the bot.

I don't see any issue with that screen+offset combo

slartifer commented 2 years ago

The only clue in the log is when the restart succeeded the last button found had a 0.88 match and when it failed it had a 0.91 match, like it's finding a different button or something.

I've just run a manual restart 5 times and I can confirm that 0.88 is the correct match ("Labyrinth Dungeons" swirl). Of course, now that I'm watching, it's the only match it's making. Since it was making the incorrect match 15 times in a row, something else must be affecting it.

From manual clicking, the "Announcements" hitbox may extend up as far as the bottom pixel of the "Labyrinth Dungeons" graphic you linked, if my cursor is accurate, though I could easily be off by one.

There weren't any new announcements yesterday afternoon, so I don't think a notification count could be causing the issue.

To rule out a display issue, here's a screenshot of the same target area:

image

The thing is, in the original case, at the point when the restart attempts terminated, the device was left sitting on the map screen -- which means it had successfully pressed 'Labyrinth Dungeons' (despite the 0.91) and got stuck afterwards. Maybe something in the hand-off to the restart lab process?

HughJeffner commented 2 years ago

My guess at this point is that it is finding a different 'button' where it's finding the 'labyrinth dungeons' graphic. It looks for these 'buttons' in the resources folder in order:

button_blue_play button_brown_ok lab_segment lab_outpost

So it's actually clicking on the graphic but thinks it's one of the other buttons then lab_outpost

slartifer commented 2 years ago

Ah, I see, so then it keeps looking for buttons on the map screen (where it won't find matches) rather than passing off to restart lab. It looks like there is some data passed through the proxy when you reach the map screen; not sure if this could be a useful point of reference.

(I've just attempted to simulate the common crash circumstances by hitting a device reset with the same timing as common post-lab crashes, i.e., after hitting OK on the mastery/drops screen from the last fight, and before anything else has loaded. Unfortunatly, the problem didn't duplicate.)

HughJeffner commented 2 years ago

I think what was happening is between the time it takes to get a screencap and analyze the screen for matches, it allowed enough time for it to find the lab_segment and tap it even though the lab outpost screen had loaded. Since they overlap the tap registers. The timing has to be just right, but it does happen quite frequently.

I solved this in 7.0 by looking for the web request and setting the state that way That plus adding a few watchdog kicks in the lab restart process to keep the 1 minutes from elapsing should fix both these issues.