Ageless93 opened this issue 4 years ago
What I want to propose is that instead of the "Communicating with BOINC client, please wait" we put "Loading tasks into memory, please wait" in a window here. That is what's actually happening. As it is, people may think something has hung. I thought so myself for a while; it didn't register that 14GB of memory was being loaded, and that this cannot be done instantaneously.
Fair comment, but I think we need to step carefully and thoughtfully here. "Communicating with BOINC client" is factually correct - although perhaps "Awaiting reply from BOINC client" is closer to reality.
"Loading tasks into memory" might well be a valid reason, but is it true in this case? Might there be other reasons for slow client initialisation - parsing an exceptionally complex set of attached projects, perhaps, or verifying a huge number of project files on slow storage? We should avoid using definitive statements as to causes, before enumerating and eliminating every possible alternative explanation.
> "Loading tasks into memory" might well be a valid reason, but is it true in this case? Might there be other reasons for slow client initialisation - parsing an exceptionally complex set of attached projects, perhaps, or verifying a huge number of project files on slow storage? We should avoid using definitive statements as to causes, before enumerating and eliminating every possible alternative explanation.
It's to simplify what's happening. BOINC has been in development for almost 18 years, and still a lot of people don't know that the program has separate parts. How many people post only about BOINC Manager, because that's the only thing they see? They don't know, and many don't care, that there is a client running as well.
In my case it is true that loading the data is slow, because it comes from a 6TB hard drive with a raw read speed of 122MB/s. Perhaps BOINC should index files in its data directory. But a quick look at Windows Task Manager shows that both boinc.exe and boincmgr.exe are running, and have been for some time, so the slow communication between them deserves a better explanation. Of course exiting BOINC Manager doesn't necessarily exit the client, but when it does it should also always exit the running tasks, and not leave lots of them in memory in a running state.
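For a sense of scale, a rough lower bound: if the full 14GB mentioned above had to come off that drive at its raw 122MB/s, the read alone would take about two minutes. This is a minimal Python sketch under that assumption (decimal units, one sequential read); the helper name is made up for illustration, and real task start-up reads may be smaller or more scattered.

```python
def min_load_seconds(total_bytes: float, read_bytes_per_sec: float) -> float:
    """Lower bound on the time to read `total_bytes` sequentially.

    Assumes the drive sustains its raw sequential rate for the whole read;
    many scattered reads of separate files will usually be slower, not faster.
    """
    if read_bytes_per_sec <= 0:
        raise ValueError("read rate must be positive")
    return total_bytes / read_bytes_per_sec


GB = 1_000_000_000  # decimal units, as drive vendors quote them
MB = 1_000_000

if __name__ == "__main__":
    seconds = min_load_seconds(14 * GB, 122 * MB)
    print(f"{seconds:.0f} s")  # about 115 s
```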
@Ageless93,
> What I want to propose is that instead of the "Communicating with BOINC client, please wait" we put "Loading tasks into memory, please wait" in a window here.
I'm not sure it's technically possible to determine that state from the Manager without significant architectural changes.
> Perhaps BOINC should index files in its data directory.
I'm not sure indexing could ever help, because I don't see how we could speed up reading from the hard drive; that is actually handled by the OS.
> Of course exiting BOINC Manager doesn't necessarily exit the client, but when it does it should also always exit the running tasks.
That bit I certainly concur with - interrupting or cancelling the client initialisation process should always undo the project initialisations that follow from it. Having said that, project applications should close themselves if they discover they are running without BOINC. Has Rosetta correctly implemented the API calls described at https://boinc.berkeley.edu/trac/wiki/AppIntro?
I just turned SMT off, so now only 11 tasks have to load. It still takes 39 seconds from client start to fully loaded. I understand that it may be difficult or impossible to determine what's happening without rewriting the architecture. Perhaps for the 20th anniversary.
And while indexing may not help, BOINC does check for the presence of a lot of project files in the directory. Depending on how many projects someone has attached, this could be substantial. What checking does BOINC actually do, though? Just count the files, prod them, test their data for sanity? What does "Checking presence of 783 project files" do? And how long will that take?
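How expensive a "presence" check is depends on what it actually does, and the thread doesn't say. As a hypothetical illustration in Python (not BOINC's code; the function names are invented), here are three levels of checking: an existence test and a size comparison only touch file metadata, while a checksum reads every byte, so only the last would scale with the total data size on a slow drive.

```python
import hashlib
import tempfile
from pathlib import Path


def is_present(path: Path) -> bool:
    """Cheapest check: one metadata lookup; no file contents are read."""
    return path.is_file()


def size_matches(path: Path, expected_size: int) -> bool:
    """Slightly stronger, still metadata only: a single stat() call."""
    try:
        return path.stat().st_size == expected_size
    except FileNotFoundError:
        return False


def checksum_matches(path: Path, expected_md5: str, chunk: int = 1 << 20) -> bool:
    """Most expensive: reads every byte, so the cost grows with file size."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest() == expected_md5


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "project_file.bin"  # hypothetical project file
        p.write_bytes(b"x" * 1000)
        print(is_present(p), size_matches(p, 1000),
              checksum_matches(p, hashlib.md5(b"x" * 1000).hexdigest()))
```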
I understand we'll always be limited by the speed of the slowest piece of hardware. But I still think a more user-friendly message would go a long way toward getting users to wait patiently until things have loaded. Didn't we want to make BOINC more user-friendly, with simpler messages?
IMHO, the guiding principle should be that, above all else, a message should be accurate.
I much prefer an accurate but vague "BOINC is waiting for an answer" to a precise but false "BOINC is loading projects", unless you know for certain what is holding it up this time.
I wasn't suggesting "BOINC is loading projects"; that's as vague as "Communicating with BOINC client". How about a different message, such as "Finalizing initialization, please wait"?
> During this time a "Communicating with BOINC client, please wait" window sits on top of BOINC Manager, with "Exit BOINC Manager" and "Cancel" buttons. The Cancel button doesn't do anything: it just closes that window, which then reopens. Hitting "Exit BOINC Manager" at this point exits the Manager and the client but leaves all tasks running. They don't seem to get the boinc_exit() signal, or they ignore it.
The Manager has code to issue RPCs asynchronously, but the client does not. So when the client is busy, it can't respond to an RPC. The Manager can issue some RPCs and continue with other work while waiting for a reply, and it just adds these to a queue as needed. But for some RPCs it has to wait for a response before it can continue; these trigger the "Communicating with client" dialog if no response is received after a certain delay (1.5 seconds in most cases).
The "Cancel" button only cancels the one RPC which triggered the dialog. The dialog keeps reappearing because the Manager has issued another RPC to the client and is again waiting for a client response. The "Exit BOINC Manager" button is an emergency exit out of this loop.
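The blocking pattern described above can be modelled in a few lines. This is a Python sketch, not the Manager's C++ code: `blocking_rpc` and `on_busy` are hypothetical names, a thread pool stands in for the RPC channel, and the 1.5 s default is the delay quoted above. Each blocking RPC gets its own timeout, which is why dismissing the dialog for one RPC doesn't stop it reappearing for the next.

```python
import concurrent.futures as cf
import time
from typing import Callable, TypeVar

T = TypeVar("T")

BUSY_DIALOG_DELAY = 1.5  # seconds: the delay quoted for most blocking RPCs


def blocking_rpc(pool: cf.Executor, rpc: Callable[[], T],
                 on_busy: Callable[[], None],
                 delay: float = BUSY_DIALOG_DELAY) -> T:
    """Send `rpc` and wait for its reply. If no reply arrives within
    `delay` seconds, call `on_busy` (i.e. show the "Communicating with
    BOINC client" dialog) and then keep waiting for the real reply."""
    future = pool.submit(rpc)
    try:
        return future.result(timeout=delay)
    except cf.TimeoutError:
        on_busy()
        return future.result()


if __name__ == "__main__":
    dialogs = []

    def busy_client_reply() -> str:
        time.sleep(0.2)  # stands in for a client still loading tasks
        return "ok"

    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(3):  # each blocking RPC is a fresh chance to time out
            blocking_rpc(pool, busy_client_reply,
                         lambda: dialogs.append("shown"), delay=0.05)
    print(len(dialogs))  # one dialog per slow RPC
```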
The Manager sends a quit RPC to the client, which normally would then shut down all tasks before exiting as part of its shutdown sequence. The Manager waits 10 seconds for the client to shut itself down, after which it forcefully kills the client. But since the client is busy and non-responsive, it does not act on the quit RPC before it is forcefully killed, so the shutdown sequence never happens, leaving the tasks running.
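The quit-then-kill sequence can be sketched with Python's `subprocess` module. This is only a model under stated assumptions: `send_quit` is a hypothetical stand-in for the quit RPC, and the 10 s default is the grace period described above. It shows why a client that is too busy to act on the request ends up killed without ever running its own shutdown sequence.

```python
import subprocess
import sys
from typing import Callable

QUIT_GRACE_SECONDS = 10.0  # how long the Manager waits before a forced kill


def stop_client(proc: subprocess.Popen, send_quit: Callable[[], None],
                grace: float = QUIT_GRACE_SECONDS) -> str:
    """Ask the client to quit; force-kill it if it hasn't exited in time.

    Returns "exited" if the process shut itself down (so its own shutdown
    sequence, including stopping tasks, could run) or "killed" if it had
    to be terminated.
    """
    send_quit()  # hypothetical stand-in for the quit RPC
    try:
        proc.wait(timeout=grace)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # kills only this process; its child processes are not signalled
        proc.wait()  # reap it
        return "killed"


if __name__ == "__main__":
    # A "client" too busy to act on the quit request: it just keeps sleeping.
    busy = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(stop_client(busy, send_quit=lambda: None, grace=0.5))  # killed
```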
Ideally, the code used in tasks should check for a dead parent process and exit if the parent has died. I don't remember whether or not that is the case. @davidpanderson should be able to answer that.
BOINC apps (i.e., apps that use the BOINC API) in theory check every 10 seconds to see if the client has died, and exit if so. However, I sometimes see cases where this doesn't happen.
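A minimal sketch of that kind of check, assuming a POSIX system and that the app knows the client's PID. The BOINC API's real mechanism may differ; `pid_alive` and `watch_client` are hypothetical names. The watcher polls every `interval` seconds, defaulting to the 10 s mentioned above, and calls an exit handler once the client process is gone.

```python
import os
import subprocess
import sys
import threading
from typing import Callable


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks that the process exists
    without delivering anything. Not valid on Windows, where os.kill()
    does not work this way."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def watch_client(client_pid: int, on_client_dead: Callable[[], None],
                 stop: threading.Event, interval: float = 10.0) -> None:
    """Every `interval` seconds, check the client; call `on_client_dead`
    and return once it is gone. Setting `stop` ends the watch early."""
    while not stop.wait(interval):
        if not pid_alive(client_pid):
            on_client_dead()
            return


if __name__ == "__main__":
    client = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    gone = threading.Event()
    watcher = threading.Thread(target=watch_client, daemon=True,
                               args=(client.pid, gone.set, threading.Event(), 0.1))
    watcher.start()
    client.kill()
    client.wait()  # reap it; an unreaped zombie would still pass pid_alive()
    print("app would exit now" if gone.wait(5) else "watchdog missed it")
```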
Well, the Rosetta apps were the new 4.20 ones; I don't know which API version they were built with. Their server status page doesn't show the server version, only database version 27016. But that's more up to date than what they were running for a while.
So I tested something. I ran BOINC Manager while watching the Details tab in Task Manager, waited for some Rosetta apps to show there, then exited BOINC Manager. I waited a minute. All those Rosetta tasks were still in memory. Then I restarted BOINC Manager. BOINC then loads a second copy of the tasks that are already running. Something then kills off the duplicate processes until only one process per task remains. So where I had 15 Rosetta tasks showing in Task Manager's Details tab, I now have only 9, exactly the number of tasks shown running in BOINC Manager.
But... drum roll... BOINC runs only 4 of them, and the other 5 sit "waiting to acquire slot directory lock. Another instance may be running." I had to kill those 5 manually.
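The "slot directory lock" message suggests each slot is guarded by an exclusive lock, so a duplicate task process started in an occupied slot can't proceed. A minimal POSIX sketch of that idea using `fcntl.flock`; the lock-file name and helper are hypothetical, and BOINC's actual lock implementation (especially on Windows) may differ.

```python
import fcntl
import os
import tempfile
from pathlib import Path
from typing import Optional

LOCK_NAME = "slot.lock"  # hypothetical name, not necessarily BOINC's


def try_lock_slot(slot_dir: Path, name: str = LOCK_NAME) -> Optional[int]:
    """Try to take an exclusive, non-blocking lock on a file in `slot_dir`.

    Returns the open file descriptor (keep it open to keep the lock) or
    None if another holder already has the lock.
    """
    fd = os.open(slot_dir / name, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as slot:
        first = try_lock_slot(Path(slot))   # the original task instance
        second = try_lock_slot(Path(slot))  # a duplicate started later
        print(first is not None, second is None)  # True True
        os.close(first)  # closing the descriptor releases the lock
```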
(Edit: I posted about this on the Rosetta forums)
I'm running Rosetta tasks on my AMD Ryzen 3900X CPU (12 cores, 24 threads). I've set "Use at most N% of the CPUs" to 99%, which leaves one thread free for Windows and the GPU. I have 23 Rosetta tasks from various research projects, with widely varying memory use: the smallest uses 390MB, the largest 715MB.
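The numbers in this post can be checked with a little arithmetic. Assuming the client applies the "Use at most N% of the CPUs" percentage and rounds down (an assumption, not confirmed here), 24 threads at 99% gives 23.76, so 23 tasks; with SMT off, 12 threads at 99% gives 11.88, so 11, which matches the 11 tasks mentioned earlier. Twenty-three tasks at 390 to 715MB each also bracket the total between roughly 9.0GB and 16.4GB, consistent with the 14GB figure. The helper names are hypothetical.

```python
from typing import Tuple


def usable_cpus(logical_cpus: int, max_ncpus_pct: int) -> int:
    """CPUs available to tasks under "Use at most N% of the CPUs",
    ASSUMING the percentage is applied and rounded down, minimum 1."""
    return max(1, logical_cpus * max_ncpus_pct // 100)


def total_memory_bounds_mb(n_tasks: int, smallest_mb: int,
                           largest_mb: int) -> Tuple[int, int]:
    """Lower/upper bound on combined task memory: every task as small
    as the smallest one, or as large as the largest one."""
    return n_tasks * smallest_mb, n_tasks * largest_mb


if __name__ == "__main__":
    print(usable_cpus(24, 99))                   # 23 (SMT on: 24 threads)
    print(usable_cpus(12, 99))                   # 11 (SMT off: 12 threads)
    print(total_memory_bounds_mb(23, 390, 715))  # (8970, 16445)
```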
It takes several seconds for all the Rosetta tasks to load into memory, and during this time the Manager is unresponsive. I have seen this behaviour before; it can take over a minute before the Manager populates its windows.
During this time a "Communicating with BOINC client, please wait" window sits on top of BOINC Manager, with "Exit BOINC Manager" and "Cancel" buttons. The Cancel button doesn't do anything: it just closes that window, which then reopens. Hitting "Exit BOINC Manager" at this point exits the Manager and the client but leaves all tasks running. They don't seem to get the boinc_exit() signal, or they ignore it.