flathub / org.freedesktop.Platform.GL.nvidia

Corrupted repo files? #224

Closed csc-games closed 5 months ago

csc-games commented 5 months ago

I was running flatpak update which this time included org.freedesktop.Platform.GL.nvidia-545-23-08. The output gave me this:

ID                                                      Branch           Op           Remote            Download
 1. [✗] org.freedesktop.Platform.GL.nvidia-545-23-08            1.4              i            flathub           4.4 GB / 4.4 GB

Warning: Error pulling from repo: Error reading from file: Bad address
Installation complete.

The error is isolated to this repo. Other updates pulled fine.

s1nka commented 5 months ago

same error

alastair87 commented 5 months ago

Same error. This is getting very frustrating: 545.23.08 not being available for flatpak at all has prevented me from installing the latest CUDA drivers on my Debian system for a while. So it's disappointing that it was finally up, only to hit this issue instead.

Thank you for working on it. Hopefully it will be resolved soon!!

TingPing commented 5 months ago

No idea what would cause that, triggered a rebuild though: https://buildbot.flathub.org/#/builders/6/builds/93396

alastair87 commented 5 months ago

The same issue unfortunately:

~ 
▷ flatpak install https://dl.flathub.org/build-repo/76139/org.freedesktop.Platform.GL.nvidia-545-23-08.flatpakref

        ID                                                         Branch               Op              Remote                               Download
 1. [✗] org.freedesktop.Platform.GL.nvidia-545-23-08               1.4                  i               nvidia-545-23-08-origin              4.4 GB / 4.4 GB

Error: Error pulling from repo: Error reading from file: Bad address
error: Failed to install org.freedesktop.Platform.GL.nvidia-545-23-08: Error pulling from repo: Error reading from file: Bad address
~ 11m28s

I think it is likely related to the fact that it embeds the CUDA toolkit and is so large (4.4 GB):

https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/commit/2c351ff8e050e2770dc16a74615b8aaef6e8712f

For reference I'm using flatpak 1.14.4 on Debian 12.

csc-games commented 5 months ago

@TingPing I'm still getting the same error.

alastair87 commented 5 months ago

I tried it in an Arch Linux container to see if the installer issue might be related to my version of flatpak (I realise this would not install it on the host). It also fails but I get a slightly different error.

~ 
▷ lxc shell more-quetzal
[root@more-quetzal ~]# flatpak install nvidia-545
Looking for matches…
Similar refs found for ‘nvidia-545’ in remote ‘flathub’ (system):

   1) runtime/org.freedesktop.Platform.GL32.nvidia-545-23-06/x86_64/1.4
   2) runtime/org.freedesktop.Platform.GL32.nvidia-545-23-08/x86_64/1.4
   3) runtime/org.freedesktop.Platform.GL.nvidia-545-29-02/x86_64/1.4
   4) runtime/org.freedesktop.Platform.GL.nvidia-545-29-06/x86_64/1.4
   5) runtime/org.freedesktop.Platform.GL32.nvidia-545-29-02/x86_64/1.4
   6) runtime/org.freedesktop.Platform.GL32.nvidia-545-29-06/x86_64/1.4
   7) runtime/org.freedesktop.Platform.GL.nvidia-545-23-06/x86_64/1.4
   8) runtime/org.freedesktop.Platform.GL.nvidia-545-23-08/x86_64/1.4

Which do you want to use (0 to abort)? [0-8]: 8

        ID                                           Branch Op Remote  Download
 1. [✗] org.freedesktop.Platform.GL.nvidia-545-23-08 1.4    i  flathub 4.4 GB / 4.4 GB

Error: While trying to apply extra data: apply_extra script failed, exit status 256
error: Failed to install org.freedesktop.Platform.GL.nvidia-545-23-08: While trying to apply extra data: apply_extra script failed, exit status 256
[root@more-quetzal ~]# 

Arch is using flatpak 1.15.6.

alastair87 commented 5 months ago

I believe I've found the root of the problem, and it is what I suspected: the CUDA version of the installer has an additional prompt to accept the CUDA toolkit license agreement, which is not present in the normal driver installer. Presumably flatpak errors out because the installer appears to stall.

I downloaded the installer files for 545.29.02 and 545.23.08 manually.

When I do sudo bash cuda_12.3.1_545.23.08_linux.run I get

┌──────────────────────────────────────────────────────────────────────────────┐
│  End User License Agreement                                                  │
│  --------------------------                                                  │
│                                                                              │
│  NVIDIA Software License Agreement and CUDA Supplement to                    │
│  Software License Agreement. Last updated: October 8, 2021                   │
│                                                                              │
│  The CUDA Toolkit End User License Agreement applies to the                  │
│  NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA                    │
│  Display Driver, NVIDIA Nsight tools (Visual Studio Edition),                │
│  and the associated documentation on CUDA APIs, programming                  │
│  model and development tools. If you do not agree with the                   │
│  terms and conditions of the license agreement, then do not                  │
│  download or use the software.                                               │
│                                                                              │
│  Last updated: October 8, 2021.                                              │
│                                                                              │
│                                                                              │
│  Preface                                                                     │
│  -------                                                                     │
│                                                                              │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit):                         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Whereas for sudo bash NVIDIA-Linux-x86_64-545.29.02.run I get taken to the main body of the installer (which warns me to stop; I have the Debian packages installed, so this warning shouldn't apply when building the flatpak):

                       NVIDIA Accelerated Graphics Driver for Linux-x86_64 (545.29.02)

  WARNING: An NVIDIA kernel module 'nvidia-drm' appears to be already loaded in your kernel.  This may be    
           because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence      
           Daemon), but this may also happen if your kernel was configured without support for module        
           unloading.  Some of the sanity checks that nvidia-installer performs to detect potential          
           installation problems are not possible while an NVIDIA kernel module is running.

                                                      OK  

  NVIDIA Software Installer for Unix/Linux    

There is a --silent flag for the CUDA toolkit driver, but I'm not sure how I'd go about modifying the flatpak source to invoke it. Hopefully that would be a straightforward fix.

[Screenshots of the two installer screens shown above.]

Thanks again for your help with this.

alastair87 commented 5 months ago

@TingPing

I also noticed the --driver flag. Unfortunately it doesn't disable the EULA prompt, but it might make it possible to leave the CUDA Toolkit files out of the final output, even if they still have to be downloaded.
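To make the two flags mentioned in this thread concrete, here is a minimal Python sketch that only assembles the command line for a non-interactive, driver-only run of the CUDA runfile. It assumes that --silent and --driver can be combined and that --silent suppresses the EULA prompt; neither has been tested inside flatpak's apply_extra step, and the helper name `build_silent_driver_command` is made up for illustration.

```python
from pathlib import Path


def build_silent_driver_command(runfile: Path) -> list[str]:
    """Return an argv list for a non-interactive, driver-only installer run.

    Hypothetical helper: it only builds the command and executes nothing.
    Assumes --silent and --driver (both mentioned in this thread) may be combined.
    """
    return ["bash", str(runfile), "--silent", "--driver"]


cmd = build_silent_driver_command(Path("cuda_12.3.1_545.23.08_linux.run"))
print(" ".join(cmd))
```

Building the argv as a list rather than a shell string avoids quoting problems if the installer path ever contains spaces.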

alastair87 commented 5 months ago

Looking at this further, I think the issue may be simpler than that, see here:

https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/blob/2c351ff8e050e2770dc16a74615b8aaef6e8712f/nvidia-apply-extra.c#L190

It looks like the apply-extra function is hard-coded to expect a particular name structure for the driver installer, which the CUDA installer doesn't follow.
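As a rough illustration of the naming mismatch (not the actual logic in nvidia-apply-extra.c, which is C and is not reproduced here), the following Python sketch uses a hypothetical pattern for the standalone driver installer name and shows that the CUDA runfile name does not fit it. The regular expression and the function name `parse_driver_runfile` are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical pattern for names like NVIDIA-Linux-x86_64-545.29.02.run.
# The real apply-extra code may check the name differently.
DRIVER_RUN_PATTERN = re.compile(
    r"^NVIDIA-Linux-(?P<arch>\w+)-(?P<version>\d+(?:\.\d+)+)\.run$"
)


def parse_driver_runfile(name: str) -> Optional[tuple[str, str]]:
    """Return (arch, version) if the name looks like a driver installer, else None."""
    match = DRIVER_RUN_PATTERN.match(name)
    if match is None:
        return None
    return match.group("arch"), match.group("version")


print(parse_driver_runfile("NVIDIA-Linux-x86_64-545.29.02.run"))
print(parse_driver_runfile("cuda_12.3.1_545.23.08_linux.run"))
```

The first name parses into an architecture and a version; the second returns None, which is the kind of mismatch described above.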

TingPing commented 5 months ago

CC @aloisklink

aloisklink commented 5 months ago

I've also got no idea what might be causing this issue, although it's happening to me too. When I tried running the code in the ./build.sh script manually on my PC locally, it worked for me though, so maybe it's something to do with the CUDA download being over 4GiB in size and that's hitting a 32-bit limit? I don't know much about Flatpak however.

> It looks like the apply-extra function is hard-coded to expect a particular name structure for the driver installer, which the CUDA installer doesn't follow.

That's how my fix in PR https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/pull/214 was supposed to work. NVIDIA does not host this driver installer anywhere online as a simple download. In order to get it, you have to first download the 4 GiB CUDA .run file, then the normal ~300 MiB driver installer is embedded within this CUDA .run file.


@TingPing, is there any chance we can just upload the NVIDIA-Linux-x86_64-545.23.08.run file somewhere, either on a server owned by Flatpak, or maybe as a GitHub Release on this repo?

NVIDIA allows us to redistribute the Linux SDK as long as the only modifications we make are uncompressing files (see EULA§2.3), and GitHub Releases seems to allow unlimited downloads.

That way we can revert PR https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/pull/214, and all we'd need to do is make a:

It means we wouldn't have to change anything else.

You (or another Flatpak admin) would have to download:

DavidMacAneutronic commented 5 months ago

Full output, using flatpak update -y -v --noninteractive:

F: marking op install/update:runtime/org.freedesktop.Platform.GL.nvidia-545-23-08/x86_64/1.4 resolved to f272c9d68bdf57d108b4bb3da7c7fc8eb85dcce10b39a9f40628407f0722aa80
Installing runtime/org.freedesktop.Platform.GL.nvidia-545-23-08/x86_64/1.4
F: Calling system helper: GetRevokefsFd
F: Calling system helper: GetRevokefsFd
F: flatpak_dir_pull: Using commit f272c9d68bdf57d108b4bb3da7c7fc8eb85dcce10b39a9f40628407f0722aa80 for pull of ref runtime/org.freedesktop.Platform.GL.nvidia-545-23-08/x86_64/1.4 from remote flathub
F: Loading https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda_12.3.1_545.23.08_linux.run using libsoup
F: Received 4368526618 bytes
F: Calling system helper: Deploy
Warning: Failed to install org.freedesktop.Platform.GL.nvidia-545-23-08: Error pulling from repo: Error reading from file: Bad address
F: flathub:x86_64 appstream age 1927 is less than ttl 86400
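
The byte count in this log can be checked against the 32-bit limit hypothesis raised earlier in the thread. The Python sketch below only does the arithmetic: it does not show that flatpak actually uses a 32-bit size anywhere, just that the download is large enough for such a limit to matter if one exists.

```python
# Byte count taken from the "F: Received ... bytes" line in the log above.
RECEIVED_BYTES = 4368526618

UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit integer can hold


def wrapped_uint32(n: int) -> int:
    """Value that n would become if stored in an unsigned 32-bit integer."""
    return n % 2**32


print(RECEIVED_BYTES > INT32_MAX)        # True
print(RECEIVED_BYTES > UINT32_MAX)       # True
print(wrapped_uint32(RECEIVED_BYTES))    # 73559322
print(f"{RECEIVED_BYTES / 2**30:.2f} GiB")  # 4.07 GiB
```

So the file is about 4.07 GiB, just over the 4 GiB boundary, and a size truncated to 32 bits would read as roughly 70 MiB.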

TingPing commented 5 months ago

@aloisklink I don't love it, but here it is. If you could update the data and test it would be appreciated: https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/releases/tag/cuda
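
Assuming "update the data" here means recording the checksum and size of the re-hosted .run file (flatpak extra data is verified by SHA-256 and size; the exact data file format used by this repo is not shown in the thread), a minimal Python sketch for computing both values from a local download could look like this. The function name `sha256_and_size` is hypothetical, and the demo uses a tiny temporary file in place of the real installer.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_and_size(path: Path, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Stream a file in chunks and return (hex SHA-256 digest, size in bytes)."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        # Read fixed-size chunks so a large installer never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


# Demo on a small temporary file standing in for the downloaded .run file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.run"
    sample.write_bytes(b"abc")
    checksum, size = sha256_and_size(sample)
    print(checksum, size)
```

For the real file, the same function would be pointed at the downloaded NVIDIA-Linux-x86_64-545.23.08.run.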

aloisklink commented 5 months ago

> @aloisklink I don't love it, but here it is. If you could update the data and test it would be appreciated: https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/releases/tag/cuda

Thanks for the help @TingPing, especially since I know you're no longer using an NVIDIA GPU!

I've made a PR at #226! I tested it locally and it worked, but then again, my previous PR also worked when I tested locally.

I agree, hosting the .run files ourselves is a bit of a hack, and although GitHub currently gives unlimited bandwidth for GitHub Releases, I wouldn't be surprised if they add a limit in a few years.

#167 will be the long-term fix. If I ever get a free weekend, I'll have a look at it and see if I can figure out a way to fix it.

caminashell commented 4 months ago

I got the following error whilst updating via Discover, and now when I refresh updates, there are none. I think it was a release refresh. But the driver version I am currently using is 545.23.08 anyway.

Aborted due to failure (While downloading https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/releases/download/cuda/NVIDIA-Linux-x86_64-545.23.08.run: While fetching https://github.com/flathub/org.freedesktop.Platform.GL.nvidia/releases/download/cuda/NVIDIA-Linux-x86_64-545.23.08.run: [56] Failure when receiving data from the peer)

I wanted to get some advice on how to proceed after such an event, as right now my default is to leave things as they are until an update is prompted.

Thank you kindly.

rambo919 commented 3 months ago

It's back again, probably because 550-54-14 was updated to 550-54-15, the latter of which is nowhere in flatpak.

Aborted due to failure (While pulling runtime/org.freedesktop.Platform.GL.default/x86_64/22.08-extra from remote flathub: opcode set-read-source: Opening content object 562c552189780f3b6aaf71e2d249696d5de81a3e8cfb1b27516c79c812622354: Opening content object 562c552189780f3b6aaf71e2d249696d5de81a3e8cfb1b27516c79c812622354: Couldn't find file object '562c552189780f3b6aaf71e2d249696d5de81a3e8cfb1b27516c79c812622354')