flathub / com.valvesoftware.Steam.CompatibilityTool.Proton

https://flathub.org/apps/details/com.valvesoftware.Steam.CompatibilityTool.Proton

Speed up sources download #13

Open gasinvein opened 3 years ago

gasinvein commented 3 years ago

Proton is a git repository with numerous git submodules, some of which are huge (namely wine and gstreamer). And we have many flatpak-builder modules (two for each Proton component).

flatpak-builder fetches the whole Proton repo with all git submodules for each module, which results in heavy I/O and incredibly long download/checkout times (in fact, on Flathub checkouts take even more time than actual compilation).

We should do something about it. The only solution I see is splitting the single git source into multiple archive sources. Does anyone have other ideas?

fabianhjr commented 3 years ago

It might be possible to make shallow clones of submodules with --shallow-submodules. ( https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---no-shallow-submodules )
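For reference, a minimal sketch of what such a clone would look like if done by hand. The URL and destination are placeholders, and this is not how flatpak-builder itself invokes git:

```python
def shallow_clone_argv(url, dest, depth=1):
    """Build a git clone command that fetches the main repo and its
    submodules with truncated history."""
    return [
        "git", "clone",
        "--depth", str(depth),     # shallow history for the main repo
        "--recurse-submodules",    # also initialize and clone submodules
        "--shallow-submodules",    # clone each submodule with depth 1
        url, dest,
    ]


# Placeholder URL, for illustration only.
print(" ".join(shallow_clone_argv("https://example.org/proton.git", "proton")))
```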

It could be an upstream enhancement (relevant lines):

Though I am new to flatpak.

gasinvein commented 3 years ago

flatpak-builder should make shallow clones by default when possible (there is an option to disable it explicitly). Do you think it might not be using it for submodules?

fabianhjr commented 3 years ago

I think it isn't, but I am not familiar with the codebase, and shallow submodules are a separate setting from the --depth shallow fetch of the main repo.

gasinvein commented 3 years ago

As far as I understand, flatpak-builder doesn't clone git repos with submodules, but instead extracts the submodule list from the main repo and mirrors each submodule individually. So any recursion options should be irrelevant in this case. I could be wrong, though.

gasinvein commented 3 years ago

@barthalion My local tests suggest that running flatpak-builder with --disable-updates almost completely removes the issue. I'm guessing Flathub's build bot doesn't use this option? If so, maybe it should, given that it downloads sources prior to starting the build?

barthalion commented 3 years ago

Not really sure. Sources are pre-downloaded, but build machines also have a local cache – what happens if the requested commit is not available in the local clone?

gasinvein commented 3 years ago

I'm not sure what the local cache is in this context. Aren't sources downloaded anew on each build? If so, how could the requested commit be unavailable? I mean, if we run something like flatpak-builder --download-only, and then flatpak-builder --disable-updates, everything should be in place?

barthalion commented 3 years ago

I've looked at this again and yes, sources are being downloaded as a separate step but on a mirror node, not runners. So passing --disable-updates will just cause f-b to fail due to missing source code on actual builders.

gasinvein commented 2 years ago

This is getting worse over time as new components are added to Proton (increasing the number of modules in this flatpak). Basically we do git fetch m*s times, where m is the number of flatpak-builder modules and s is the number of git submodules in the source repo, so each addition to either increases build times significantly.
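As a rough illustration of how the m*s count grows, here is a small sketch. The module and submodule counts are made-up numbers, not measurements from this repo:

```python
def total_fetches(modules, submodules):
    """Number of git fetches if every flatpak-builder module
    fetches every git submodule of the source repo (m * s)."""
    return modules * submodules


# Hypothetical counts: adding one module or one submodule
# adds a whole row or column of fetches.
for m, s in [(10, 20), (11, 20), (11, 21)]:
    print(f"m={m} s={s} fetches={total_fetches(m, s)}")
```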

@barthalion Can we run f-b --download-only followed by f-b --disable-updates on runners as the build step?

barthalion commented 2 years ago

I know we talked about it, but I still fail to understand what exactly --download-only would solve here. We no longer have a sources worker, and so the only "build command" that is executed is this:

            command = ['flatpak-builder', '-v', '--force-clean', '--sandbox', '--delete-build-dirs',
                       '--user', fb_deps_args,
                       util.Property('extra_fb_args'),
                       '--mirror-screenshots-url=https://dl.flathub.org/repo/screenshots', '--repo', 'repo',
                       util.Interpolate('--extra-sources=%(prop:builddir)s/../downloads'),
                       '--default-branch', util.Property('flathub_default_branch'),
                       '--subject', util.Property('flathub_subject'),
                       '--add-tag=upstream-maintained' if builds.is_upstream_maintained(id) else '--remove-tag=upstream-maintained',
                       'builddir', util.Interpolate('%(prop:flathub_manifest)s')]

How is --download-only in a separate step going to help?

gasinvein commented 2 years ago

--download-only by itself isn't going to help; it's --disable-updates that makes the difference here. If the build is run with --disable-updates, flatpak-builder skips fetching git sources from remotes and just copies whatever is already cached.
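To make the idea concrete, here is a sketch of the two invocations as argument lists. The manifest name and build directory are placeholders, and the real Flathub build command carries many more options:

```python
def two_phase_commands(manifest, build_dir="builddir"):
    """Return (download, build) argv lists for a download-then-build run."""
    # Phase 1: populate the local source cache without building anything.
    download = ["flatpak-builder", "--download-only", build_dir, manifest]
    # Phase 2: build from the cache; --disable-updates only downloads
    # missing sources and does not update already-cached VCS checkouts.
    build = ["flatpak-builder", "--disable-updates", "--force-clean",
             build_dir, manifest]
    return download, build


# "manifest.yaml" is a placeholder file name.
for argv in two_phase_commands("manifest.yaml"):
    print(" ".join(argv))
```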

barthalion commented 2 years ago

But it's still going to take a significant amount of time to execute --download-only, isn't it?

gasinvein commented 2 years ago

Yeah, just re-checked that and it seems like it. So, my proposal to run --download-only followed by --disable-updates probably doesn't make sense.

gasinvein commented 2 years ago

But still, maybe we could run builds with --disable-updates on Flathub? If it's not an option to enable it for all builds, maybe it could be gated by some flathub.json option?