BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

boincgui.sh disconnected ... from what? #3532

Open ghost opened 4 years ago

ghost commented 4 years ago

Describe the bug

When I start boincgui.sh, only the "File" and "Help" menus are active, and the diagnostic message at the bottom-right corner says "Disconnected". So what is disconnected from what?

Is it boincgui.sh disconnected from the boinc daemon? Is it the boinc daemon disconnected from the boinc server? Or does it have something to do with something called "account manager"?

Steps To Reproduce

  1. start boinc daemon
  2. start boincgui.sh

Expected behavior

Normal functioning, or at least some amount of diagnostic messages saying what's wrong.

Screenshots

https://imgur.com/7tFsQdH.png

System Information

Additional context

The additional context is that:

  1. I started boinc in a directory that was not the one defined as BOINC_DATA at compile time, and got zero feedback from boinc about that.
  2. Even when I restarted it in $BOINC_DATA, the directory was not user-writable, and BOINC actually told me "another instance of boinc running".

So boinc diagnostic messages are either unhelpful or misleading. I would humbly ask to (at least) change the "Disconnected" message into "Cannot connect boinc gui to the boinc instance, which you are claiming is running on $chosen_hostname_or_ip".

AenBleidd commented 4 years ago

So what is disconnected from what?

Is it boincgui.sh disconnected from the boinc daemon? Is it the boinc daemon disconnected from the boinc server? Or does it have something to do with something called "account manager"?

BOINC Manager is a tool for managing the BOINC client. To manage the client from the Manager, you need to connect the Manager to the client. The status "Disconnected" means that the Manager is disconnected from the client. Otherwise the status shows information about the client the Manager is connected to, e.g. [image]. A BOINC client can connect to multiple servers simultaneously. Moreover, there is no single BOINC server at all; it is a fully distributed system, and the client can connect to any number of projects (servers). It can also connect to an Account Manager, but that is a different story entirely.

So boinc diagnostic messages are either unhelpful or misleading. I would humbly ask to (at least) change the "Disconnected" message into "Cannot connect boinc gui to the boinc instance, which you are claiming is running on $chosen_hostname_or_ip".

By default, BOINC Manager tries to connect to the local client. If it finds no local client running, or the connection to it fails, it shows this error. Your message is too long and also misleading, because there may be no client running, or even installed, so there is no hostname or IP to show.

In your particular case, you should check the permissions of the running BOINC client. It could be that the user running BOINC Manager has no permission to read a process that runs under another user's credentials. You should also check the gui_rpc_auth.cfg file. It contains the password used to connect to the local client. Check that this file can be read by both the BOINC client and BOINC Manager.
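As a rough illustration of that file check, here is a minimal Python sketch. The function name `describe_auth_file` and the example path are assumptions made for illustration; they are not part of BOINC. It only reports whether the current user can read the file, so it would need to be run once as the user running the Manager and once as the user running the client.

```python
import os


def describe_auth_file(path):
    """Return a short diagnostic for the GUI RPC password file at `path`.

    The check is done for the user running this script only.
    """
    # os.path.exists() is also False when a parent directory cannot be
    # traversed, so "not found" does not strictly prove the file is absent.
    if not os.path.exists(path):
        return "not found (or its directory is not accessible)"
    if not os.access(path, os.R_OK):
        return "exists but is not readable by the current user"
    return "readable"
```

For example, `describe_auth_file("/var/lib/boinc/gui_rpc_auth.cfg")` would check a data directory at `/var/lib/boinc`; substitute whatever data directory your installation actually uses.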

ghost commented 4 years ago

Look, I am used to debugging this kind of stuff. But honestly, the single word "disconnected", applied to a "distributed computing" application, can be misinterpreted in a million ways.

AenBleidd commented 4 years ago

Is 'Disconnected from client' better for you?

ghost commented 4 years ago

"Disconnected from client" would be an order of magnitude better.

"Disconnected from client. Last connection attempt was to 127.0.0.1 at 23:04" would be two orders of magnitude better.

AenBleidd commented 4 years ago

How could this information help you? For me it's completely useless. Once the client has been connected, the Manager will constantly try to reconnect to it. If it was never connected, or the initial connection failed, then that definitely happened either at Manager start (when connecting to the local client) or at the user's request. Both of these moments are likely known to the user.

ghost commented 4 years ago

Well, this "I am constantly trying" is impossible to distinguish from "I tried once and gave up" on the initial connection. This "constantly trying" is the most misleading thing in the world, because who in the world actually knows what exactly it is trying? DNS poisoning? ARP poisoning? Broken routing table? Firewall? Mistyping a comma instead of a period in the address? Server not even running? Server running, but not accepting connections? I can imagine a dozen more cases that may make the gui honestly think it is trying to reconnect while it is not actually doing so. DNS poisoning is particularly nasty in this respect.

So I would like to have a diagnostic marker for "I am not stuck in a loop doing something meaningless".

AenBleidd commented 4 years ago

On the initial connection, the Manager tries to connect to the local client if no other host/IP is specified as a command-line parameter. In any other case it shows the following in the status: [image]. To me, this describes quite clearly what the Manager is trying to connect to. All the other cases you mentioned above are quite hard to identify from the application side.

ghost commented 4 years ago

The "connecting" message is fine.

The confusion comes when something is broken, but I have no clue what it is.

RichardHaselgrove commented 4 years ago

The particular problem is with the single term 'disconnected'. Such a report on a helpdesk or message board has, in the past, often prompted a flurry of suggestions about checking firewalls and opening ports. If there was a test which would distinguish that communications problem from the 'client not running' case, and report accordingly, it would be a great help.

Could we check the running process list in some way?
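One way to approach such a test without reading the process list is to classify the outcome of a single TCP connection attempt. The Python sketch below is only an illustration, not how BOINC Manager works internally; the function name `probe_client` and the returned labels are assumptions. Port 31416 is BOINC's default GUI RPC port. A "connection refused" result usually means nothing is listening on that port (for instance, the client is not running), while a timeout more often points to a firewall or routing problem, although a firewall that actively rejects packets can also produce a refusal.

```python
import socket

GUI_RPC_PORT = 31416  # BOINC's default GUI RPC port


def probe_client(host="127.0.0.1", port=GUI_RPC_PORT, timeout=3.0):
    """Classify one TCP connection attempt to a client's GUI RPC port."""
    try:
        # The connection is closed again immediately; no RPC is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        # Name resolution failed: mistyped hostname or DNS trouble.
        return "cannot resolve host"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "connection refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timed out"
    except OSError as exc:
        # Anything else, e.g. network unreachable.
        return "failed: {}".format(exc.strerror or exc)
```

Calling `probe_client()` with no arguments tries the local client on the default port; pass a host name or address to test a remote one.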

AenBleidd commented 4 years ago

It depends on the OS and permissions. If the client is running under different credentials, the Manager may not have access to the list of processes run by another user. So there is no 100% guarantee that we can determine whether the client is running or not.

ghost commented 4 years ago

Well, the point of this bug report was not to suggest making an ultimate issue resolution tool.

I merely wanted the gui to give a clear message to the user, unambiguously saying that "something is wrong, and you need to debug".

The "disconnected from the client" message essentially clarifies that the disconnect is a user-debuggable problem (i.e. that it's not some unknown remote server not accepting connections).

The "last attempt failed at 23:05" part conveys that "connecting" is not something ordinarily slow, as is very common in distributed computing applications such as Tor or BitTorrent (which is what users are most familiar with nowadays). It should work within seconds.

"The last attempt was made to localhost" gives a hint that the user may have mistyped the address. Writing a complete verification function for resource locations is hard and error-prone, and just showing the address to the user is the quickest option.

truboxl commented 4 years ago

I think better terms to use are "connection refused" when the manager fails to connect, "disconnected" when the manager successfully disconnects, and "offline" for, well, offline... All 3 are based on the previous state of the manager's status. If all 3 are covered by the same "disconnected" term, users have to guess when troubleshooting...
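To make the three cases concrete, here is a small Python sketch of picking the label from the last thing that happened to the connection. All names (`LastEvent`, `status_label`) are hypothetical, and reading "offline" as "no connection has been attempted yet" is an assumption; the real Manager is C++/wxWidgets and may track state differently.

```python
from enum import Enum


class LastEvent(Enum):
    """The most recent connection-related event seen by the manager."""
    NEVER_CONNECTED = "never connected"      # assumed meaning of "offline"
    CONNECT_FAILED = "connect failed"        # an attempt was made and failed
    USER_DISCONNECTED = "user disconnected"  # a clean, requested disconnect


# One distinct status label per case instead of a single "Disconnected".
STATUS_LABEL = {
    LastEvent.CONNECT_FAILED: "Connection refused",
    LastEvent.USER_DISCONNECTED: "Disconnected",
    LastEvent.NEVER_CONNECTED: "Offline",
}


def status_label(last_event):
    """Return the status bar text for the given last event."""
    return STATUS_LABEL[last_event]
```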

And there's the lack of feedback on the manager side as well. If it fails to connect, there's no error or log from the manager, just blank... The Event Log option only shows info from the client, not from the manager...

The manager can work standalone without the client and can be decoupled. It can be used to connect to clients running on other computers. So integrating the manager with the client is not really an option here.

But I still think it's the package managers' fault for not making the BOINC install hassle-free on Linux. All these issues could have been avoided had BOINC installs truly worked OOTB on Linux. @lockywolf can you point out which distro you are using and how you installed it?

AenBleidd commented 4 years ago

@truboxl, when I installed BOINC on Ubuntu 19.04, everything worked OOTB for me. I configured nothing except Computing preferences. I also like your 3 cases. @lockywolf, is it possible for you to provide a solution for these 3 cases @truboxl described?

truboxl commented 4 years ago

@AenBleidd thanks! The analogy I am using is treating the manager like a web browser. Though it is not a perfect solution for identifying which particular connection problem the manager is having (like Richard's https://github.com/BOINC/boinc/issues/3532#issuecomment-605101660), it's a stopgap solution for now.

@lockywolf what exactly is boincgui.sh? I have been using BOINC on Linux for a long time and I have never encountered a boincgui.sh file. I always directly use boincmgr or from a .desktop shortcut to launch the manager. Looking up Google will show that boincgui.sh is created for use in Slackware. I am not sure Slackware's BOINC experience is on par with other distro's OOTB experience. Any Slackware devs or package manager here?

ghost commented 4 years ago

lockywolf@delllaptop:~/OfficialRepos$ cat $(which boincgui.sh) | tail -7
DATADIR="/var/lib/boinc"

# Sanity Check
mkdir -p $DATADIR

# run...
boincmgr --clientdir=/usr/bin --datadir=$DATADIR

Slackware's BOINC experience is on par with other distro's OOTB experience.

There is no Slackware's BOINC experience. Slackware's principle is that a UNIX system should work essentially like a Windows system. You take the official distribution medium from the official distribution channel, and it "just works". And if it doesn't, you report a bug in the original project (and potentially help fix it, as that is how Open Source works). There is a thing called "slackbuilds.org", but that is essentially just a collection of bash scripts for executing official build instructions from official websites. You can take a slackbuild and run it on AIX, FreeBSD, Solaris, or Cygwin. If your system is POSIX, it should work (minus the makepkg command, which is a tar in disguise).

Any Slackware devs

Me. I'm not the original build author, but it shouldn't matter, as the issue is exactly the same in git master.

is it possible for you to provide a solution for these 3 cases @truboxl described?

I'll have a look, but I'm not super familiar with wxWidgets.