beeware / briefcase

Tools to support converting a Python project into a standalone native application.
https://briefcase.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
2.47k stars 350 forks source link

Perform a shallow clone of templates on first use #1855

Closed freakboy3742 closed 3 weeks ago

freakboy3742 commented 4 weeks ago

Cookiecutter's default behavior is to perform a full clone of any repository passed as a URL. This is proving problematic for the macOS and Windows app templates, because the full history contains a lot of old binaries. Unfortunately, cookiecutter doesn't appear to have any setting to perform a shallow clone. However, it can operate on a local file path, and we already have tools to manage branches and remotes in place.

This PR makes 3 changes:

This is a partial fix for #933; as it reduces the size of the template clone that is required. The full fix is #1849, as that will only download the stub for the Python version that is in use.

Refs #933.

PR Checklist:

rmartin16 commented 3 weeks ago

A separate comment that may warrant independent effort.

While this strategy gets us away from depending on the ~/.cookiecutter directory, the ~/.cookiecutter_replay directory is still utilized. While we don't use replay functionality and I don't see a way to turn it off, we may want to consider using a custom location to avoid this noise in the user's home directory.

The only configuration I can find for this is via a file in the file system; for instance:

❯ cat ~/.cache/briefcase/templates/cookiecutter.yml 
replay_dir: "/home/russell/.cache/briefcase/templates/cookiecutter-replay"

This file path can be passed to cookiecutter() via config_file.

freakboy3742 commented 3 weeks ago

I like this idea but I'm not sure I understand how it's going to work currently.

If we do a shallow clone of the repo, then all Briefcase can know about the repo is the last commit of the default branch. Therefore, it will not be able to checkout the necessary branch for the version of Briefcase....or an arbitrary branch from the CLI.

Dang... I was afraid I might have missed something in my testing. Back to the drawing board, I guess...

mhsmith commented 3 weeks ago

If we do a shallow clone of the repo, then all Briefcase can know about the repo is the last commit of the default branch. Therefore, it will not be able to checkout the necessary branch for the version of Briefcase....or an arbitrary branch from the CLI.

In a shallow clone, I think that could be done with a git fetch command specifically asking for the version branch, and maybe a --depth argument to the fetch command as well.

But if we're about to remove the binaries from the templates anyway, it might not be worth the effort of getting a shallow clone working. When we remove the binaries, we can force-push the main branch to a commit which either has no history, or a rewritten history with the binaries omitted. After that, as long as we're careful not to add anything large to the templates, a full clone should be no problem.

mhsmith commented 3 weeks ago

We also noted just now that git clone has --branch and --single-branch options which are independent of the shallow setting.

freakboy3742 commented 3 weeks ago

In a shallow clone, I think that could be done with a git fetch command specifically asking for the version branch, and maybe a --depth argument to the fetch command as well. ... But if we're about to remove the binaries from the templates anyway, it might not be worth the effort of getting a shallow clone working. When we remove the binaries, we can force-push the main branch to a commit which either has no history, or a rewritten history with the binaries omitted. After that, as long as we're careful not to add anything large to the templates, a full clone should be no problem.

Even if we do this, the full repo history will include the blobs for all the historical branches - so the repo will still be ~500MB, even though only 1MB is actually checked out on any recent branches.

Another option may be a blobless-clone - this would download all the hashes (so there's full branch history), but only download the blobs when they're needed. I'll investigate further.

freakboy3742 commented 3 weeks ago

Looks like replacing depth=1 with filter=[blob:none] does the job without any additional handling. Using the current state of the repos, an initial macOS template clone is 59 MB (236MB without blob filtering). Switching to the v0.3.15 template increases the repo size to 61MB, with the extra blob content being downloaded when the branch is checked out.

I've also explicitly disabled replay usage on the cookiecutter calls; that should disable the use of the replay folder, without the need to write a configuration file.

rmartin16 commented 3 weeks ago

Looks like replacing depth=1 with filter=[blob:none] does the job without any additional handling. Using the current state of the repos, an initial macOS template clone is 59 MB (236MB without blob filtering). Switching to the v0.3.15 template increases the repo size to 61MB, with the extra blob content being downloaded when the branch is checked out

So, since we're still requiring a non-trivial amount of data to be downloaded and we've taken over the Git clone process from Cookiecutter, do we want to consider implementing a progress bar for GitPython? There's an example here. Before I read your comment fully that there's still 59MB to download....I was surprised when the process seemed to freeze for a minute in my VM.

I'll be able to do more robust testing of this strategy tomorrow.

freakboy3742 commented 3 weeks ago

So, since we're still requiring a non-trivial amount of data to be downloaded and we've taken over the Git clone process from Cookiecutter, do we want to consider implementing a progress bar for GitPython? There's an example here. Before I read your comment fully that there's still 59MB to download....I was surprised when the process seemed to freeze for a minute in my VM.

It's 59MB right now - but once #1849 lands, that should drop dramatically (hopefully to < 4MB, like the Xcode template).

That said, I'm not opposed to adding a progress bar. The only hesitation I have about leaning more heavily on GitPython specifically is that the project is currently marked as "maintenance mode only", so there's an outside chance we may need to replace that package at some point in the not-too-distant future. However, if it's low effort to implement, then we might as well lean on it while we can; if we need to fall back to a simple waitbar in the future, then so be it.

mhsmith commented 3 weeks ago

I'll leave this to @rmartin16 as he's more familiar with this area.

freakboy3742 commented 3 weeks ago

It's 59MB right now - but once #1849 lands, that should drop dramatically (hopefully to < 4MB, like the Xcode template).

All right...although, if the plan is to rewrite the currently large repos' histories, then this seems like it will mostly be a service for large third-party repos since the benefits will be marginal for Briefcase defaults....which lends credence towards Malcolm's comment about "why do this at all?". However, we have already done it at this point and the change doesn't seem that risky to me.

To be clear - the current plan is to not rewrite the repo histories, as the blobless clone means users won't have any large binary artefacts in their repo as soon as #1849 (and the related template changes) land.

Without this change, #1849 will mean the final checked out template is tiny... but you will still be cloning 200+MB of repo history. Rewriting repo history is one way to address that without this change, but given that we've got a clean solution that results in only downloading what you need, we might as well lean into that.

freakboy3742 commented 3 weeks ago

I've cleaned up those comments and error messages; #1849 has also been merged, and with the new "stables" repo, a clean macOS app template is 1.3 MB. Ready for final(?) review