BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0
Data on root of secondary drive causes Boinc to lose GPUs. #4629

Open hucker75 opened 2 years ago

hucker75 commented 2 years ago

Describe the bug

If BOINC is told to use, for example, D:\ as the data directory, it will run but say there are no usable GPUs. But it's fine in D:\somename

System Information

AenBleidd commented 2 years ago

Are you sure you didn't install BOINC to run as a service?

hucker75 commented 2 years ago

No, that would stop GPU work altogether, but it works OK when I change the data directory. It was installed with defaults; I only changed the data directory. It just doesn't like the root directory. It doesn't warn against it, and I would expect a file-not-found error, not a missing-GPU message.

CharlieFenton commented 2 years ago

Did BOINC write a file _coprocinfo.xml on the C: drive by any chance?

In the Options->Event Log options ... menu, please set the coprocessor_debug flag and look for items starting with [coproc] in the event log. Copy them and put them in a comment here.

photohac commented 2 years ago

I have the exact same problem. I was trying to install the data files on a M.2 in the root and BOINC gave me this same problem.

I got a second M.2 and the same thing happened. I traded that for a SATA SSD, had the same problem, then created a folder and it works.

The file you mention above does not show up when I have the data installed in a folder. Would it have shown up with the files written to the root?

computezrmle commented 2 years ago

Did you check ownership and access rights of the new drive's root directory? Ownership of the drive and its root dir may be SYSTEM while the ownership of a directory you created may be hucker75.
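One quick way to test the access-rights theory without reading ACLs by hand is to try creating a file in each directory as the same user that runs BOINC. The sketch below is only an illustration: the function name `can_create_files` and the two example paths are assumptions, not anything BOINC provides. On Windows, `icacls D:\` in a command prompt lists the actual permissions on the drive root.

```python
import os
import tempfile


def can_create_files(directory):
    """Return True if the current user can create (and delete) a file in `directory`."""
    try:
        # NamedTemporaryFile creates a real file and removes it again on close.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers permission errors as well as a missing directory.
        return False


if __name__ == "__main__":
    # Hypothetical paths based on this thread; adjust them to your own system.
    for path in ("D:\\", "D:\\BOINCdata"):
        verdict = "writable" if can_create_files(path) else "NOT writable (or missing)"
        print(path, "->", verdict)
```

If the root reports as not writable while the subfolder is fine, ownership or permissions are a plausible suspect. If both are writable, the cause is probably elsewhere.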

photohac commented 2 years ago

How do you do that? I'll check around 6pm CET when I'm done.

How many places do you hang out in? I've seen your name before. LHC or RAH?

CharlieFenton commented 2 years ago

@photohac: are you on a Windows 10 computer? My response to @hucker75 was regarding his Windows 10 system. Also, did you set the coproc_debug flag as I requested? I don't see the items starting with [coproc] from your event log here.

(I incorrectly referred to it as the coprocessor_debug flag in my earlier post; the correct item to check is coproc_debug.)

RichardHaselgrove commented 2 years ago

This problem is so simple that anybody with two drives and a GPU should be able to debug it. So I did.

08/02/2022 10:51:48 |  | [coproc] launching child process at D:\BOINC\boinc.exe
08/02/2022 10:51:48 |  | [coproc] with data directory "D:\"
08/02/2022 10:51:48 |  | GPU detection failed: process exited with status 0x1: Incorrect function. (0x1)
08/02/2022 10:51:48 |  | [coproc] read_coproc_info_file() returned error -108
08/02/2022 10:51:48 |  | No usable GPUs found
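For anyone collecting the same evidence, the log above can be filtered and decoded with a few lines of script. This is a rough sketch, not part of BOINC: the helper names are made up for illustration, and the mapping of -108 to `ERR_FOPEN` is my understanding of BOINC's `lib/error_numbers.h` and should be checked against the source. Windows system error 1 is `ERROR_INVALID_FUNCTION`, whose standard text is "Incorrect function.", so that phrase is probably just the child's exit code 1 being rendered as a system error message.

```python
import re

# Sample lines copied from the event log above (raw string keeps the backslashes).
LOG = r"""08/02/2022 10:51:48 |  | [coproc] launching child process at D:\BOINC\boinc.exe
08/02/2022 10:51:48 |  | [coproc] with data directory "D:\"
08/02/2022 10:51:48 |  | GPU detection failed: process exited with status 0x1: Incorrect function. (0x1)
08/02/2022 10:51:48 |  | [coproc] read_coproc_info_file() returned error -108
08/02/2022 10:51:48 |  | No usable GPUs found"""

# Assumption: -108 is ERR_FOPEN in BOINC's lib/error_numbers.h; verify in your tree.
BOINC_ERRORS = {-108: "ERR_FOPEN (could not open a file)"}


def parse_line(line):
    """Split 'timestamp | project | message' into its three fields."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def coproc_messages(log_text):
    """Return only the messages tagged [coproc]."""
    messages = (parse_line(line)[2] for line in log_text.splitlines() if line.strip())
    return [m for m in messages if m.startswith("[coproc]")]


def boinc_error_codes(log_text):
    """Find 'returned error N' codes and attach a name where one is known."""
    codes = [int(n) for n in re.findall(r"returned error (-?\d+)", log_text)]
    return [(code, BOINC_ERRORS.get(code, "unknown")) for code in codes]


for message in coproc_messages(LOG):
    print(message)
print(boinc_error_codes(LOG))
```

Read this way, the log says the child process exited with code 1 and the parent then failed to open the coproc info file, which fits the file never having been written.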

Boinc has created:

image

and

image

This test performed with BOINC v7.16.20 and Windows 7 Professional SP1.

photohac commented 2 years ago

Win10 yes. Flags, not yet, tonight when I get home. That will be in 5-6 hours

photohac commented 2 years ago

But the question is: why does it say no coprocessors found when the data is in the root, but not when it's in a folder?!

That's been my question all along! No one answers that directly.

It works fine when project data is in a folder, but not in the root.

So why does this happen? What makes the difference? The physical setup of the system has not changed, only the path of the project data!

Then it gives you this stupid non-specific error.

photohac commented 2 years ago

BOINC works fine as long as there is a folder that the project data is in.

If the data is NOT in the folder then it coughs up the no coprocessors found.

Why is this happening?

RichardHaselgrove commented 2 years ago

I think it's probably because of a (little known) security feature in Windows. Before running this test, I tried to make a backup of my working BOINC data folder (D:\BOINCdata). 7-zip failed to write to D:\, although I had an older (2018) backup in there already.

This probably requires a change to the BOINC Windows installer program, or failing that the documentation. New users need to be prevented from, or at least advised against, using the drive's root as a BOINC data folder.

photohac commented 2 years ago

Well, I figured since it's a drive dedicated to BOINC Data (because RAH python tasks consume so much space and memory) that I could install it in the root. As you said, the installer needs to be reprogrammed to not allow this or BOINC needs a code change to allow it.

BOINC program is on C and project data is on a dedicated D drive.

But why does the program allow CPU tasks to run from root and not GPU? This does not make sense to me.

CharlieFenton commented 2 years ago

@RichardHaselgrove Your output says with data directory "D:\", and you wrote:

Boinc has created:

But you didn't say where BOINC created these files. Is the root of the D drive "D:\" actually where BOINC is putting the files stdoutdae.txt, etc.? If so, that is also where it should be writing the coproc info file _coprocinfo.xml, so the obvious question is why it can write these other files but not that one.

RichardHaselgrove commented 2 years ago

In the root of D:

I was running the installer as an administrative user, but not as "the administrator". I'll try and explore why Windows has marked the files, and two of the four folders, with a 'padlock' symbol - implying some sort of security restriction.

CharlieFenton commented 2 years ago

@RichardHaselgrove The reason this issue caught my attention is that I recently encountered a situation, when running the Mac build under the Xcode debugger, where it was reporting no usable GPUs because the _coprocinfo.xml file had not been written. I traced that particular problem to a sigabort caused by a buffer overflow, which prompted my recent PR #4628. This is probably not due to the same cause, but clearly the file is not being written. I suppose it couldn't hurt to rebuild boinc.exe with that change and see if this problem still exists.

Unfortunately, my Windows PC died a while ago, so I have no way to investigate this specific issue on that platform. My recommendation is for someone who can run the BOINC client under the Windows debugger to set this situation up and run boinc.exe --detect_gpus --dir D:\ with a breakpoint in static void do_gpu_detection(int argc, char** argv) in client/main.cpp and then step through the code to find the point of failure.

RichardHaselgrove commented 2 years ago

I've tried this on two other machines - one Windows 10 Pro, and another Windows 7 Pro SP1. Exactly the same failure and messages. Neither of the other two machines shows a padlock symbol, so I think that's unrelated. Nothing showed up on a quick Google document search.

I have a working Windows build system (VS2013, also VS2019), but I've never tried to run a build under the debugger. I'll try, but it may need some spare time.

AenBleidd commented 2 years ago

I'll check this today or tomorrow

RichardHaselgrove commented 2 years ago

Thanks, but I may have moved us on a stage. Trying to get the debugger to run (I've used it before, but in an older, simpler, version of Visual Studio and an older, simpler, programming language), I spotted that Charlie's command line was parsing the directory as D:\\ - that doesn't look good.
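One Windows behaviour worth ruling out here: when an argument is wrapped in double quotes, the Microsoft C runtime treats a backslash immediately before a quote as escaping that quote. A directory that ends in a backslash and is quoted naively, as the log's "D:\" hints might be happening, would then lose its closing quote. Whether BOINC really builds the child command line this way is an assumption; the thread does not confirm it. The sketch below compares naive quoting with Python's `subprocess.list2cmdline`, which follows the Microsoft rules. The helpers `naive_quote` and `quote_always` are hypothetical names for illustration.

```python
import subprocess


def naive_quote(arg):
    """Wrap an argument in double quotes with no escaping (the risky way)."""
    return '"' + arg + '"'


def quote_always(arg):
    """Wrap in quotes, doubling trailing backslashes so the closing quote survives.

    Assumes `arg` contains no embedded double-quote characters.
    """
    trailing = len(arg) - len(arg.rstrip("\\"))
    return '"' + arg + "\\" * trailing + '"'


root = "D:\\"                   # drive root, with its trailing backslash
spaced = "D:\\BOINC data\\"     # hypothetical directory containing a space

# Naive quoting ends in \" which the parser reads as an escaped quote.
print(naive_quote(root))

# Doubling the trailing backslash keeps the final quote as a delimiter.
print(quote_always(root))

# list2cmdline only quotes when needed, and doubles backslashes before the quote.
print(subprocess.list2cmdline([root]))
print(subprocess.list2cmdline([spaced]))
```

A subfolder such as D:\BOINCdata has no trailing backslash, so it would never hit this case, which would be consistent with the symptom appearing only for a drive root.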

The registry entries for DATADIR and INSTALLDIR left behind by the installer have trailing backslashes, so I tried DATADIR=D: in the registry. That got me:

08/02/2022 16:43:25 |  | Data directory: D:\BOINC
08/02/2022 16:43:25 |  | Running under account Richard Haselgrove
08/02/2022 16:43:25 |  | [coproc] launching child process at D:\BOINC\boinc.exe
08/02/2022 16:43:25 |  | [coproc] with data directory "D:\BOINC"
08/02/2022 16:43:26 |  | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 442.74, CUDA version 10.2, compute capability 6.1, 4096MB, 3038MB available, 2138 GFLOPS peak)
08/02/2022 16:43:26 |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 442.74, device version OpenCL 1.2 CUDA, 4096MB, 3038MB available, 2138 GFLOPS peak)
08/02/2022 16:43:26 |  | OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.5103, device version OpenCL 2.0, 1298MB, 1298MB available, 202 GFLOPS peak)
08/02/2022 16:43:26 |  | OpenCL CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 6.8.0.2, device version OpenCL 2.0 (Build 2))
08/02/2022 16:43:26 |  | [coproc] NVIDIA library reports 1 GPU
08/02/2022 16:43:26 |  | [coproc] No ATI library found.

Which is wrong, but differently wrong. The coproc detect has run - accurately - but data has been written into my chosen program directory. I'll leave someone else to work out what happens if the programs are in the default, protected, location on C:.
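A possible explanation for the data landing in D:\BOINC: on Windows, a bare `D:` is not the root of the drive. It is a drive-relative path meaning "the current directory on drive D:". If the client's current directory on D: was the program folder, `DATADIR=D:` would resolve there. That part is a guess, not something the log proves. The sketch below uses Python's `ntpath` module, which applies Windows path rules on any OS, and the function name `classify` is just for illustration.

```python
import ntpath  # Windows path semantics, usable on any operating system


def classify(path):
    """Label a Windows path as absolute, drive-relative, or relative."""
    drive, _rest = ntpath.splitdrive(path)
    if ntpath.isabs(path):
        return "absolute"
    if drive:
        # A drive letter with no leading backslash: resolved against the
        # current directory of that drive, not against its root.
        return "drive-relative"
    return "relative"


for candidate in ("D:\\", "D:", "D:\\BOINCdata"):
    print(repr(candidate), "->", classify(candidate))
```

So stripping the trailing backslash from `D:\` changes the meaning of the path rather than just its spelling, which is why the result is "differently wrong".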