Hou5e opened this issue 1 month ago
Those show the clock icon so they are waiting to retry the run.
The status text is confusing. I intentionally set up the status text so that, when waiting, it shows the state it's waiting on rather than just "Waiting". It might be nice if it said something like "Waiting to run", but then the text gets too long. Alternatively, it could just say "Waiting", but then you don't know what it's waiting on. It could be waiting to rerun the core, waiting to download the core or WU, or waiting to retry uploading the results.
It is supposed to show wait_progress, which I would expect to be non-zero. Also, the ETA should probably be non-zero.
If you drop all the gerunds and verb phrases, it could be very short, like "Wait: Run" and "Wait: Assign".
Maybe label it "State" instead of "Status Text", which seems ugly to me.
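The short-label idea can be sketched as a small formatter: when a unit is waiting, it names the state being waited on without gerunds; otherwise it shows the active state. This is a minimal sketch only; the state keys ("RUN", "ASSIGN", and so on) and both mapping tables are hypothetical and not taken from the FAH client.

```python
# Hypothetical state keys; the real client may use different names.
WAIT_NAMES = {"RUN": "Run", "ASSIGN": "Assign", "DOWNLOAD": "Download", "UPLOAD": "Upload"}
ACTIVE_NAMES = {"RUN": "Running", "ASSIGN": "Assigning", "DOWNLOAD": "Downloading", "UPLOAD": "Uploading"}


def status_label(state: str, waiting: bool) -> str:
    """Return a compact label such as "Wait: Run" or "Running"."""
    if waiting:
        # Terse form: "Wait: <state>", falling back to the title-cased key.
        return "Wait: " + WAIT_NAMES.get(state, state.title())
    # Not waiting: show the active state in its longer form.
    return ACTIVE_NAMES.get(state, state.title())


print(status_label("RUN", waiting=True))      # Wait: Run
print(status_label("ASSIGN", waiting=True))   # Wait: Assign
print(status_label("RUN", waiting=False))     # Running
```

Keeping the waiting form to one short word keeps the column narrow while still saying what the unit is waiting on.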
Those show the clock icon so they are waiting to retry the run.
Nope. I think this is where FAH literally missed the initial status message for the resource group and doesn't know what the state is until the WU finishes and a new WU starts. Alternatively, if you refresh the page, it loads all the data and fixes itself, showing the correct 'Running' status icon and ETA times. Those resource groups with the waiting icon and 0 ETA really are 'Running', but the remote view is missing the information needed to display them correctly.
The title of this issue should be changed back to the original one...
Why do you think those WUs are "running" and not "waiting"? The status text says "Running", but only because that is the state it is waiting to retry. The clock icon tells us that the WU is waiting.
Unless I'm still missing something, I think you're misinterpreting the bug in this case.
Yes, you are still mistaking the state: they are definitely not "waiting". You can see the true state from other browser pages viewing the remotes (from the same PC or other PCs). If you refresh a page showing those false ETA and icon states, it fixes itself and displays the correct running icon and actual ETA. You can also wait for the WU to complete (since it is not "waiting"), and the displayed information will correct itself when that WU uploads and the next WU starts.
Basically, this issue needs better error handling for when a state packet is missed or corrupted and the incremental updates don't fix the state until the state changes. One option is to check the "Running" text against the Running icon: if those two disagree, request a full state update (instead of a partial update) or refresh the web page to force one.
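A minimal sketch of such a check, assuming a hypothetical GroupView model and a hypothetical request_full_state callback (neither is part of the actual Web Control code). Because a "Running" label with a clock icon can legitimately mean "waiting to retry the run", text versus icon alone cannot separate the two cases; the sketch therefore also treats a zero wait_progress together with a zero ETA, which the maintainer said should not happen for a genuine wait, as the sign of a missed update.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GroupView:
    """What the page currently displays for one resource group (hypothetical model)."""
    status_text: str      # e.g. "Running"
    icon: str             # e.g. "running" or "clock"
    wait_progress: float  # 0.0 to 1.0
    eta_seconds: int      # 0 when unknown


def looks_stale(view: GroupView) -> bool:
    """Heuristic: a waiting icon with no wait progress and no ETA suggests a missed update."""
    return view.icon == "clock" and view.wait_progress == 0 and view.eta_seconds == 0


def reconcile(views: Dict[str, GroupView], request_full_state: Callable[[], None]) -> List[str]:
    """Return the names of suspicious groups; ask once for a full state if any exist."""
    stale = [name for name, view in views.items() if looks_stale(view)]
    if stale:
        request_full_state()  # hypothetical: a full snapshot instead of partial updates
    return stale
```

Requesting a single full snapshot, rather than one per group, mirrors what a page refresh does while avoiding a reload.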
Ok, if two different instances of Web Control are disagreeing then that's a problem. How often does this occur?
It's very unlikely that there are missing or corrupted updates. The protocol prevents this. It is possible that something is causing an exception to be thrown which can cause an update to be discarded. If this is the case then the thrown exception would show up in the browser's developer console. The developer console must be open when it happens though.
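To illustrate the failure mode described above, here is a minimal sketch of a client applying incremental updates, where an exception while applying one update means that update is dropped. The ClientState class, the path-based update format, and the needs_resync flag are all hypothetical; the sketch only shows how such a failure could be logged (as the developer console would show it) and turned into a request for a full state instead of leaving the view stale.

```python
import logging
from typing import Any, Dict, List

log = logging.getLogger("webcontrol-sketch")


class ClientState:
    """Hypothetical client-side copy of remote state, patched by incremental updates."""

    def __init__(self, full_state: Dict[str, Any]) -> None:
        self.data = dict(full_state)
        self.needs_resync = False

    def apply_update(self, path: List[str], value: Any) -> bool:
        """Set the value at path (e.g. ["groups", "g1", "state"]); return False if dropped."""
        try:
            node = self.data
            for key in path[:-1]:
                node = node[key]  # KeyError or TypeError if the path no longer matches
            node[path[-1]] = value  # IndexError on an empty path
            return True
        except (KeyError, TypeError, IndexError):
            # The update is lost either way; log it and flag that a full state is needed.
            log.exception("discarded update at %s", path)
            self.needs_resync = True
            return False
```

The key point is that a dropped update leaves the displayed state wrong until something else overwrites it, which matches the symptom of the view fixing itself when the next WU starts.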
Over the past 2 months, I have seen it fewer than 6 times. I've only seen it happen mid-day on weekdays (with almost all FAH clients running, during the hotter part of the day, and with more internet traffic, when internet or router capacity is more likely to be exceeded and stop working arbitrarily for 1-5 minutes). I've seen it most often when pausing FAH, updating to the latest FAH, and resuming, as if somewhere in the shutdown / restart / run process the status gets lost to one FAH instance and not another. I have also seen it happen when a PC starts up for the day (I'm typically not watching the PCs then, so I would miss it most of the time). The affected resource groups seem arbitrary; as in the original issue image, the 1-2 resource groups that missed an update message are on separate PCs. I'll try to leave a browser developer console open for this.
I need a way to reproduce this.
In FAH v8.4.3, when viewing Remotes (on Windows), the resource groups sometimes lose contact, miss running data updates, and are shown like this:
I think I've seen it on Linux as well. I've seen this about 2-3 times in the last day. It mostly happened while FAH was installing the update from v8.4.2 to v8.4.3, or when a PC came online and started folding again. Refreshing the browser fixes the issue.
It seems to happen when the FAH Web Control is opened again, either on the same PC or on a separate PC (the second instance causes the first instance to have the issue); local network slowdowns might also be causing it.