FoldingAtHome / fah-issues

Unsupported GPUs: Show Error Message Instead of 192.0.2.1 Redirect #1309

Open bb30994 opened 4 years ago

bb30994 commented 4 years ago

The message series

Waiting on Working server assignment: Waiting on Working unit:.
Work Server 192.0.2.1 Collection server 0.0.0.0

The computer keeps on cycling through:

Assigning a work server 192.0.2.1:8080
02:29:49:WARNING:WU13:FS01:WorkServer connection failed on port 8080 trying 80
02:30:21:ERROR:WU13:FS01:Exception: Failed to connect to 192.0.2.1:80: Connection timed out

is misleading and factually incorrect. See https://foldingforum.org/viewtopic.php?f=61&t=32061

In fact 192.0.2.1 cannot be a Work Server and I'm not sure why the client code thinks it might be. Then, too, a system with

GPU: (Dual) Beavercreek [Integrated] Seymour [Radeon HD 6470M]

should not be configured with a FAH GPU. The integrated GPU (AMD:4) is not supported nor is the Seymour GPU (AMD:4) as they do not support Double Precision, so they cannot be assigned Mixed Precision WUs which is all we're currently supplying.

bb30994 commented 4 years ago

It should probably be pointed out that the message sequence

Connecting to 65.254.110.245:8080
Assigned to work server 192.0.2.1
Requesting new work unit for slot 01: READY gpu:0:GK107 [GeForce GT 650M] from 192.0.2.1
Connecting to 192.0.2.1:8080
WARNING::WorkServer connection failed on port 8080 trying 80

is a terrible example of meaningless statements that give the Donor no clue what to do.

It seems to occur when no supported GPU is detected (such as on a Mac).

bjhulst commented 4 years ago

Could you perhaps make a proper error message like "GPU type not supported", or one with an actual reason why it is not supported? I spent some time figuring out why my GPU was not supported either. I am sure a lot of people did/do... :-)

Yonseca commented 4 years ago

Hi,

This week I also tried to use my GPU with foldingathome on my fresh Debian 10. My graphics card (Radeon HD 6750) seems to be detected, and I think it's also whitelisted in the GPUs.txt file. However, no WUs are being assigned, and I also receive the error pointed out in this issue. My config file includes a passkey, and some WUs are currently being processed with the CPU.

Isn't it enough to have a GPU listed in GPUs.txt to start folding?

[Image: screenshot, 2020-03-26 00-58-30]

[Image: Screenshot_2020-03-26 Local Folding home Web Control - Version 7 5 1]

Thanks!

James-E-A commented 4 years ago

I am also experiencing this bug:

(This is absurd; why would permanently unassigned IPs ever be attempted as work servers? I don't see why "Assigned to work server 192.0.2.1" happened.)

*********************** Log Started 2020-03-27T02:07:07Z ***********************
02:07:07:************************* Folding@home Client *************************
02:07:07:        Website: https://foldingathome.org/
02:07:07:      Copyright: (c) 2009-2018 foldingathome.org
02:07:07:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:07:07:           Args: --child --lifeline 1094 /etc/fahclient/config.xml --run-as
02:07:07:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
02:07:07:         Config: /etc/fahclient/config.xml
02:07:07:******************************** Build ********************************
02:07:07:        Version: 7.5.1
02:07:07:           Date: May 12 2018
02:07:07:           Time: 22:51:07
02:07:07:     Repository: Git
02:07:07:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
02:07:07:         Branch: master
02:07:07:       Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-18)
02:07:07:        Options: -std=gnu++98 -O3 -funroll-loops
02:07:07:       Platform: linux2 4.14.0-3-amd64
02:07:07:           Bits: 64
02:07:07:           Mode: Release
02:07:07:******************************* System ********************************
02:07:07:            CPU: Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
02:07:07:         CPU ID: GenuineIntel Family 6 Model 23 Stepping 6
02:07:07:           CPUs: 8
02:07:07:         Memory: 31.36GiB
02:07:07:    Free Memory: 30.67GiB
02:07:07:        Threads: POSIX_THREADS
02:07:07:     OS Version: 5.5
02:07:07:    Has Battery: false
02:07:07:     On Battery: false
02:07:07:     UTC Offset: -5
02:07:07:            PID: 1096
02:07:07:            CWD: /var/lib/fahclient
02:07:07:             OS: Linux 5.5.10-200.fc31.x86_64 x86_64
02:07:07:        OS Arch: AMD64
02:07:07:           GPUs: 1
02:07:07:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:0 G84 [Quadro FX 1700]
02:07:07:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.1 Driver:6.5
02:07:07:OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.0 Driver:340.108
02:07:07:***********************************************************************
02:07:07:<config>
02:07:07:  <!-- User Information -->
02:07:07:  <passkey v='********************************'/>
02:07:07:  <team v='31885'/>
02:07:07:  <user v='James.Edington@uah.edu'/>
02:07:07:
02:07:07:  <!-- Folding Slots -->
02:07:07:  <slot id='0' type='CPU'/>
02:07:07:  <slot id='1' type='GPU'/>
02:07:07:</config>
02:07:07:Switching to user fahclient
02:07:07:Trying to access database...
02:07:07:Successfully acquired database lock
02:07:07:Enabled folding slot 00: READY cpu:6
02:07:07:Enabled folding slot 01: READY gpu:0:G84 [Quadro FX 1700]
02:07:07:WU01:FS00:Starting
02:07:07:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 705 -lifeline 1096 -checkpoint 15 -np 6
02:07:07:WU01:FS00:Started FahCore on PID 1113
02:07:07:WU01:FS00:Core PID:1117
02:07:07:WU01:FS00:FahCore 0xa7 started
02:07:07:WU01:FS00:0xa7:*********************** Log Started 2020-03-27T02:07:07Z ***********************
02:07:07:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:07:07:WU01:FS00:0xa7:       Type: 0xa7
02:07:07:WU01:FS00:0xa7:       Core: Gromacs
02:07:07:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 1113 -checkpoint 15 -np 6
02:07:07:WU01:FS00:0xa7:************************************ CBang *************************************
02:07:07:WU01:FS00:0xa7:       Date: Nov 5 2019
02:07:07:WU01:FS00:0xa7:       Time: 05:57:01
02:07:07:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
02:07:07:WU01:FS00:0xa7:     Branch: master
02:07:07:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:07:07:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
02:07:07:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:07:07:WU01:FS00:0xa7:       Bits: 64
02:07:07:WU01:FS00:0xa7:       Mode: Release
02:07:07:WU01:FS00:0xa7:************************************ System ************************************
02:07:07:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
02:07:07:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 23 Stepping 6
02:07:07:WU01:FS00:0xa7:       CPUs: 8
02:07:07:WU01:FS00:0xa7:     Memory: 31.36GiB
02:07:07:WU01:FS00:0xa7:Free Memory: 30.64GiB
02:07:07:WU01:FS00:0xa7:    Threads: POSIX_THREADS
02:07:07:WU01:FS00:0xa7: OS Version: 5.5
02:07:07:WU01:FS00:0xa7:Has Battery: false
02:07:07:WU01:FS00:0xa7: On Battery: false
02:07:07:WU01:FS00:0xa7: UTC Offset: -5
02:07:07:WU01:FS00:0xa7:        PID: 1117
02:07:07:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
02:07:07:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
02:07:07:WU01:FS00:0xa7:    Version: 0.0.18
02:07:07:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:07:07:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:07:07:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
02:07:07:WU01:FS00:0xa7:       Date: Nov 5 2019
02:07:07:WU01:FS00:0xa7:       Time: 06:13:26
02:07:07:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
02:07:07:WU01:FS00:0xa7:     Branch: master
02:07:07:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:07:07:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
02:07:07:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:07:07:WU01:FS00:0xa7:       Bits: 64
02:07:07:WU01:FS00:0xa7:       Mode: Release
02:07:07:WU01:FS00:0xa7:************************************ Build *************************************
02:07:07:WU01:FS00:0xa7:       SIMD: sse2
02:07:07:WU01:FS00:0xa7:********************************************************************************
02:07:07:WU01:FS00:0xa7:Project: 13850 (Run 0, Clone 37467, Gen 2)
02:07:07:WU01:FS00:0xa7:Unit: 0x00000004287234c95e788a97a6c3879a
02:07:07:WU01:FS00:0xa7:Digital signatures verified
02:07:07:WU01:FS00:0xa7:Calling: mdrun -s frame2.tpr -o frame2.trr -x frame2.xtc -e frame2.edr -cpi state.cpt -cpt 15 -nt 6
02:07:08:WU01:FS00:0xa7:Steps: first=1000000 total=500000
02:07:09:WU00:FS01:Connecting to 65.254.110.245:8080
02:07:11:WU00:FS01:Assigned to work server 192.0.2.1
02:07:11:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:G84 [Quadro FX 1700] from 192.0.2.1
02:07:11:WU00:FS01:Connecting to 192.0.2.1:8080
02:07:11:WU01:FS00:0xa7:Completed 337212 out of 500000 steps (67%)
James-E-A commented 4 years ago

I think I came up with a pretty good workaround:

sudo ip addr add 192.0.2.1/24 dev `cd /sys/class/net ; ls -d1t e* | head -n 1`
sudo ssh -v `whoami`@localhost -L 192.0.2.1:80:40.114.52.201:80\
 -L 192.0.2.1:8080:40.114.52.201:8080 tail -F /var/lib/fahclient/log.txt
JulianGro commented 4 years ago

Same issue here. Don't know if my graphics card is not supported or what. The FAQ does say 5xxx series or newer. I am running an AMD RADEON HD 6450 with radeon 19.0.1-1 on Debian Buster. log.txt: https://gist.github.com/jug007/61e748fdc048c8f8b8a5aa26ad89aba9

shorttack commented 4 years ago

One of the problems in the above thread is busy work servers, which are a big problem these days due to the prodigious growth of the project.

Note to FAHclient team: is 192.0.xxx.xxx a bogus reported IP address? Are obsolete graphics cards configured and accepted, only to not work?

PantherX commented 4 years ago

That IP address is reported when a macOS system adds a GPU slot. Since the GPUs.txt file is shared across all operating systems, a GPU that works in Windows and Linux will not work in macOS, as the OS doesn't support GPU folding.

MikeB2012 commented 4 years ago

Getting similar error to OP on Windows 10 machine:

16:18:47:WU02:FS00:0xa7:Completed 217500 out of 250000 steps (87%)
16:20:34:230:127.0.0.1:New Web connection
16:23:06:WU02:FS00:0xa7:Completed 220000 out of 250000 steps (88%)
16:24:06:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
16:24:22:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
16:26:41:WU01:FS01:Connecting to xx.xxx.xxx.xxx:8080
16:26:42:WU01:FS01:Assigned to work server 192.0.2.1
16:26:43:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:Barts XT [Radeon HD 6870] from 192.0.2.1
16:26:43:WU01:FS01:Connecting to 192.0.2.1:8080
16:27:04:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
16:27:04:WU01:FS01:Connecting to 192.0.2.1:80
16:27:25:ERROR:WU01:FS01:Exception: Failed to connect to 192.0.2.1:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:27:26:WU02:FS00:0xa7:Completed 222500 out of 250000 steps (89%)
16:31:46:WU02:FS00:0xa7:Completed 225000 out of 250000 steps (90%)
16:36:11:WU02:FS00:0xa7:Completed 227500 out of 250000 steps (91%)

Is there a workaround for Win10? Note that there appears to be a retry roughly every 5 hours.

There is also some chatter about this at foldingforum.org.
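
For anyone who wants to check their own log for this pattern, here is a minimal sketch in Python. It assumes the "Requesting new work unit" line has the same shape as in the logs quoted in this thread; other client versions may format it differently. The function name `find_placeholder_assignments` is made up for illustration and is not part of FAHClient.

```python
import re

# Pattern for the "Requesting new work unit" line as it appears in the logs
# quoted in this thread; this format is an assumption for other versions.
REQUEST_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Requesting new work unit for slot \d+: READY (?P<device>.+) "
    r"from (?P<server>\d{1,3}(?:\.\d{1,3}){3})$"
)

PLACEHOLDER_SERVER = "192.0.2.1"  # the address reported throughout this issue


def find_placeholder_assignments(log_text):
    """Return (time, slot, device) for each request sent to the placeholder address."""
    hits = []
    for line in log_text.splitlines():
        match = REQUEST_RE.match(line.strip())
        if match and match.group("server") == PLACEHOLDER_SERVER:
            hits.append(
                (match.group("time"), match.group("slot"), match.group("device"))
            )
    return hits


sample_log = """\
16:26:42:WU01:FS01:Assigned to work server 192.0.2.1
16:26:43:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:Barts XT [Radeon HD 6870] from 192.0.2.1
16:26:43:WU01:FS01:Connecting to 192.0.2.1:8080
"""

for time, slot, device in find_placeholder_assignments(sample_log):
    print(f"{time} {slot}: {device} was redirected to {PLACEHOLDER_SERVER}")
```

Running this over a full log.txt should list every slot and device that was pointed at 192.0.2.1, which is the information worth including in a forum post.
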

MikeB2012 commented 4 years ago

This was the comment in the folding forum. It looks like in my case the graphics card (Radeon HD 6870) does not meet the current FAH requirements (OpenCL 1.2 + double precision).

The 5690 no longer meets the requirements, the 5870 does. If you look up your card, current minimum is OpenCL 1.2 and support for double precision calculations. There are lists of AMD and nVidia GPU's on wikipedia that are useful for quick checks on whether a card is usable.

Because of the way the GPUs.txt file was originally created and cards added, and the use of older chip series in later card series by AMD, and to a lesser extent nVidia, some cards are still listed as "supported", but are not. As we identify them and someone has a chance to update the file they get marked as unsupported. But in the meantime the server software redirects WU requests to an IP in the reserved 192.0.2.n range. From what I understand, the next version of the client will get a message documenting the reason and request the GPU be disabled for folding.
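
The 192.0.2.n range mentioned in that forum comment is TEST-NET-1, one of the three IPv4 blocks that RFC 5737 reserves for documentation, so no real work server can live there. A rough sketch of how a client could recognize such an address and print something readable instead of retrying is below. The helper `describe_assignment` and its message text are hypothetical and are not taken from the FAHClient source.

```python
import ipaddress

# The three IPv4 blocks RFC 5737 reserves for documentation and examples.
DOCUMENTATION_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]


def is_documentation_address(host):
    """True if host is an IP literal inside an RFC 5737 documentation block."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal (for example, a hostname)
    return any(addr in net for net in DOCUMENTATION_NETS)


def describe_assignment(work_server, device):
    """Hypothetical helper: a donor-facing message, or None if the address looks real."""
    if is_documentation_address(work_server):
        return (
            f"No work is available for {device}: the assignment server returned "
            f"the placeholder address {work_server}. The GPU may be unsupported."
        )
    return None


print(describe_assignment("192.0.2.1", "gpu:0:Barts XT [Radeon HD 6870]"))
```

The wording of the message is only an example; the point is that the placeholder address can be detected before any connection attempt is made.
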
bb30994 commented 4 years ago

There are 3 issues discussed above.
1) the use of 192.0.. which is the (valid) issue of this ticket. Question: Where does FAHClient obtain this IP? (e.g., referencing uninitialized storage?)

2) Configuring unsupported GPUs. FAHClient references GPUs.txt to decide which GPUSpecies is supported. The Project on the WorkServer decides which Species and GPU brand is supported and excludes those which are incompatible with the project. Examples, not necessarily valid for any specific project: ([GPU=AMD GPUSpecies>=6] [GPU=NVidia GPUSpecies>4]). If the owner of the project incorrectly assigns his/her project to GPUs that only support FP32, the error should be reported in the science log by the FAHCore and directed to the project owner. The error reported in FAH.log to the Donor probably should say "contact the project owner..."

3) For many years, there has been no supported FAHCore for OS-X. When FAHClient is initialized, it does not create a GPU slot. When the beta client is released, we'll have to see what happens. In the past, there was nothing preventing the Donor from manually adding a GPU slot, which created one of the examples above. FAHClient should have an error message saying "Unable to load a FAHCore for slot xx." Let's see if we can get that in the beta client.

If the Mac only has an (unsupported) iGPU then GPUs.txt will exclude it from being configured or at least report that it can't find the GPU. That's another area where the client needs to be more helpful. At the present time, there are no Intel GPUs listed and there are very few Mac's with supported AMD or NVidia dGPUs.
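
To make point 2 concrete, here is a small sketch of how constraints written like the examples above could be matched against a detected GPU and turned into a readable reason. Everything in it is an assumption for illustration: the bracket syntax is copied from the examples in this comment, and the parser, function names and message wording are hypothetical, not the actual server or client logic.

```python
import operator
import re

# Comparison operators that may appear in a constraint (assumed set).
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

# Matches e.g. "[GPU=AMD GPUSpecies>=6]"; the syntax is taken from the examples above.
CONSTRAINT_RE = re.compile(
    r"\[GPU=(?P<vendor>\w+)\s+GPUSpecies(?P<op>>=|<=|>|<|=)(?P<species>\d+)\]"
)


def parse_constraints(text):
    """Turn constraint text into (vendor, comparison, limit) tuples."""
    return [
        (m.group("vendor").lower(), OPS[m.group("op")], int(m.group("species")))
        for m in CONSTRAINT_RE.finditer(text)
    ]


def explain_rejection(vendor, species, constraints):
    """Return None if any constraint accepts the GPU, else a readable reason."""
    for c_vendor, compare, limit in constraints:
        if c_vendor == vendor.lower() and compare(species, limit):
            return None
    return f"GPU type not supported: {vendor} species {species} matches no project constraint"


constraints = parse_constraints("[GPU=AMD GPUSpecies>=6] [GPU=NVidia GPUSpecies>4]")
print(explain_rejection("AMD", 4, constraints))
```

With the example constraints, an AMD:4 device like the ones in the original report is rejected with a reason the Donor can act on, rather than being sent to an unreachable address.
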

shorttack commented 4 years ago

New Issue #1390

Configuring unsupported GPUs. FAHClient references GPUs.txt to decide which GPUSpecies is supported. The Project on the WorkServer decides which Species and GPU brand is supported and excludes those which are incompatible with the project. Examples, not necessarily valid for any specific project: ([GPU=AMD GPUSpecies>=6] [GPU=NVidia GPUSpecies>4]). If the owner of the project incorrectly assigns his/her project to GPUs that only support FP32, the error should be reported in the science log by the FAHCore and directed to the project owner. The error reported in FAH.log to the Donor probably should say "contact the project owner..."

SoNickRND commented 4 years ago

Nvidia GeForce GTX 570. All worked fine at the end of April. I took one week off, as I needed more computing power for my own stuff. Tonight I started F@h and got these messages in the log.

*********************** Log Started 2020-05-06T23:11:11Z ***********************
23:11:11:WU01:FS01:Connecting to 65.254.110.245:8080
23:11:12:WU01:FS01:Assigned to work server 192.0.2.1
23:11:12:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF110 [GeForce GTX 570 HD] from 192.0.2.1
23:11:12:WU01:FS01:Connecting to 192.0.2.1:8080
23:11:33:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
23:11:33:WU01:FS01:Connecting to 192.0.2.1:80
23:11:54:ERROR:WU01:FS01:Exception: Failed to connect to 192.0.2.1:80

It can't be that my GPU became obsolete in one week.

pcnetworx1 commented 4 years ago

@SoNickRND NVIDIA GeForce GTX 580, same situation here except I did not take one week off. My graphics card has been crushing F@H units the past six weeks, and just started throwing the exact same error message earlier today. In fact, I found this post by copying/pasting that error message into Google.

shorttack commented 4 years ago

@pcnetworx1 @SoNickRND

  1. Please read what @bb30994 wrote above.
  2. The latest beta 7.6.13 at foldingathome.org/beta has changes to gpus.txt handling
  3. Can you please post your log file in the Official Forum: https://foldingforum.org/index.php for the issue to be diagnosed? Once the issue is identified and reproducible, an issue will be raised here. GitHub is not used for troubleshooting, the Forum is. -Issue Support-
SoNickRND commented 4 years ago

Can you please post your log file in the Official Forum

Sure, which thread should I post it in? Also, I have client 7.5.1, should I update?

ffissore commented 4 years ago

Yes @SoNickRND, please ensure you're running the latest version available (7.6.13 at the time of writing). Bugs may have been fixed in the meantime.

shorttack commented 4 years ago

@SoNickRND I did a quick search at the forum: make your choice https://foldingforum.org/search.php?keywords=Connecting+to+192.0.2.1%3A80

PantherX commented 4 years ago

@SoNickRND @pcnetworx1 Please note that both your GPUs are Fermi, which might explain it: a misconfiguration caused those GPUs to be blacklisted. However, it has been resolved AFAIK, but if not, please post here with the log files: https://foldingforum.org/viewtopic.php?f=83&t=35146

petesimon commented 4 years ago

i have the same problem with a Dell branded "Caicos [AMD RADEON HD 6450]" 1GB card running FAH standard version 7.6.9 in Windows 10 Pro x64. 😿

PantherX commented 4 years ago

@petesimon Please note that your GPU doesn't support Double Precision (FP64) thus can't be used for folding :(

Thermi commented 4 years ago

Problem occurs with NAVI 14 based GPU with 7.6.13. Note that NAVI 14 is listed in GPUS.txt.

tradej commented 4 years ago

Same with AMD Radeon RX 480 in Fedora with the ROCm OpenCL installation. Finished a unit today in the morning and didn't get another one.

Log: https://pastebin.com/raw/dHgbtuRQ

megs-rs commented 4 years ago

Hi there, I have the same problem here. I have a Radeon RX Vega gfx902 GPU and it worked once. The next time, the ASSIGN1 server assigned a wrong server IP. Look at this session log:

23:07:14:WU02:FS03:Connecting to assign1.foldingathome.org:80
23:07:16:WU02:FS03:Assigned to work server 192.0.2.1
23:07:16:WU02:FS03:Requesting new work unit for slot 03: READY gpu:0:raven [Radeon RX Vega gfx902] from 192.0.2.1
23:07:16:WU02:FS03:Connecting to 192.0.2.1:8080
23:09:28:WARNING:WU02:FS03:WorkServer connection failed on port 8080 trying 80

I think the problem is in the "assign1.foldingathome.org:80" server.

berndkuennen commented 4 years ago

Same problem with RX 570 and fahclient_7.6.13_amd64, freshly installed on a Linux 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux.

Can somebody please check assign1.foldingathome.org?

15:18:51:WU00:FS00:Connecting to assign1.foldingathome.org:80
15:18:52:WU00:FS00:Assigned to work server 192.0.2.1
15:18:52:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 192.0.2.1
15:18:52:WU00:FS00:Connecting to 192.0.2.1:8080
15:18:52:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
15:18:52:WU00:FS00:Connecting to 192.0.2.1:80
15:18:52:ERROR:WU00:FS00:Exception: Failed to connect to 192.0.2.1:80: Network is unreachable

Edit/2020-08-12: Tried an AMD 380 today, same result:

06:31:03:WU00:FS00:Connecting to assign1.foldingathome.org:80
06:31:05:WU00:FS00:Assigned to work server 192.0.2.1
06:31:05:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:Tonga [Radeon R9 200/300 Series] from 192.0.2.1
06:31:05:WU00:FS00:Connecting to 192.0.2.1:8080
06:31:05:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80

PantherX commented 4 years ago

@Thermi , @tradej , @megs-rs & @berndkuennen Please note that a while back, there was an issue with updating GPUs.txt and that accidentally caused an issue. This has been resolved since then.

Assuming your GPUs support OpenCL 1.2 and Double Precision, they will fold. If you're encountering issues, please head over to the Official Forum (https://foldingforum.org/index.php) for troubleshooting. There are many members there to help you investigate and hopefully, resolve your issue(s) 😃
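
As a quick self-check before posting, the two requirements can be tested against what the OpenCL driver reports. The sketch below assumes you already have the device's CL_DEVICE_VERSION and CL_DEVICE_EXTENSIONS strings (for example from a tool such as clinfo) and treats the `cl_khr_fp64` extension as the indicator of double precision support. The function name and the sample strings are illustrative only and are not how FAHClient itself decides.

```python
import re


def meets_requirements(device_version, extensions):
    """Check an OpenCL device against the OpenCL 1.2 + double precision requirement.

    device_version: the CL_DEVICE_VERSION string, which starts with "OpenCL <major>.<minor>"
    extensions: the space-separated CL_DEVICE_EXTENSIONS string
    Returns (ok, reason).
    """
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if not match:
        return False, "could not parse the OpenCL version string"
    version = (int(match.group(1)), int(match.group(2)))
    if version < (1, 2):
        return False, f"OpenCL {version[0]}.{version[1]} is older than 1.2"
    if "cl_khr_fp64" not in extensions.split():
        return False, "no double precision (cl_khr_fp64) support reported"
    return True, "ok"


# Illustrative inputs, not captured from real hardware.
print(meets_requirements("OpenCL 1.0 CUDA", "cl_khr_byte_addressable_store"))
print(meets_requirements("OpenCL 1.2 CUDA", "cl_khr_fp64 cl_khr_icd"))
```

If either check fails, that is a likely reason for the 192.0.2.1 assignment, and it is worth mentioning in the forum post.
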

Mithrandir2k18 commented 3 years ago

Having the same issue, since updating from the AUR with an RTX 2080. Always get directed to 192.0.2.1 for the GPU slot.

Deleting the GPU slot and setting it to auto-configure makes it work for the current session, after reboot it seems to break again.

James-E-A commented 3 years ago

Could someone edit the title of this issue to something along the lines of

Unsupported GPUs: show error message instead of 192.0.2.1 redirect

I believe it would help a lot of the people stumbling across this thread


And, @Mithrandir2k18, go to the New GPUs (whitelist) section of the Forum to report this.