biancadanforth / taskcluster-integration-poc

A proof-of-concept to push to the Try server from GitHub via a Taskcluster job for off-train experiments at Mozilla
Mozilla Public License 2.0

Optimize Firefox hg clone #6

Open biancadanforth opened 5 years ago

biancadanforth commented 5 years ago

As of PR #5, we perform a full hg clone of the Firefox mozilla-central (Nightly) repo. This takes 3-4 minutes per task and is unnecessary: our ultimate goal is to push the extension to the Try server, which does not require the repo's full commit history.

hg does not appear to support shallow cloning natively yet. git does, but using git-cinnabar to clone Firefox is not ideal.

Likely the best option is to cache our full hg clone using Taskcluster, configured in our config file, .taskcluster.yml, so that subsequent tasks can reuse the repo and only pull and update, which should be much faster. TravisCI provides a way to do this, and Taskcluster has some limited documentation. Owlish also linked me a v0 config example with caching, though it's unclear to me exactly how the cache is initialized and configured (re: expiration, etc.).
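
A minimal sketch of the reuse logic described above, not a tested Taskcluster setup. It assumes the worker persists some directory between tasks (how that cache is declared in .taskcluster.yml is not shown here), and the function names are hypothetical. It only builds the hg command lines; `run_plan` executes them.

```python
import os
import subprocess

# Assumption: the public URL of the repo being cloned in this issue.
MOZILLA_CENTRAL = "https://hg.mozilla.org/mozilla-central"


def plan_hg_commands(cache_dir, repo_url=MOZILLA_CENTRAL, revision="default"):
    """Return the hg commands needed to bring cache_dir to `revision`.

    cache_dir is a hypothetical path that the CI system keeps between tasks.
    """
    if os.path.isdir(os.path.join(cache_dir, ".hg")):
        # Warm cache: a clone already exists, so fetch only new changesets.
        commands = [["hg", "-R", cache_dir, "pull", repo_url]]
    else:
        # Cold cache: fall back to the full clone (the 3-4 minute case).
        commands = [["hg", "clone", "--noupdate", repo_url, cache_dir]]
    # In both cases, check out the requested revision, discarding local changes.
    commands.append(["hg", "-R", cache_dir, "update", "--clean", "-r", revision])
    return commands


def run_plan(commands):
    """Execute each planned command, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_plan(plan_hg_commands("/cache/mozilla-central"))` would clone on the first task and pull on later ones, where `/cache/mozilla-central` stands in for whatever path the cache is mounted at.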

biancadanforth commented 5 years ago

I asked sheehan (a Mercurial engineer at Mozilla) in #vcs on IRC what the options are on the Mercurial side:

Note: remotefilelog can't be used in the meantime, as it only works for ssh-based repos (not https-based, which is what Mozilla's are).

biancadanforth commented 5 years ago

Note: It is possible to have a corrupted clonebundle for mozilla-central (or any hg repo), indicated by the following error, which appears just as the cloning process is finishing:

`abort: unexpected response from remote server: empty string`

Note: `hg clone... --verbose --traceback` gives an even more detailed error message.

As a temporary but slower workaround, mozilla-unified can be cloned instead. Given that this is an even larger repo (a monorepo that includes central along with beta, release, ...), it is even more important that I make use of caching in Taskcluster or otherwise find a way to optimize the clone.
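
The workaround above could be automated along these lines. This is a sketch only: the function names are hypothetical, the two URLs are assumed to be the public hg.mozilla.org locations of the repos, and detecting the problem by matching the error text is my assumption rather than anything sheehan suggested.

```python
import shutil
import subprocess

# Assumptions: public URLs for the two repos discussed in this issue.
CENTRAL = "https://hg.mozilla.org/mozilla-central"
UNIFIED = "https://hg.mozilla.org/mozilla-unified"
# Substring of the abort message quoted earlier in this thread.
CORRUPT_BUNDLE_MARKER = "unexpected response from remote server"


def run_hg(argv):
    """Run an hg command and return (exit code, stderr text)."""
    result = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr


def clone_with_fallback(dest, runner=run_hg):
    """Clone mozilla-central into dest; on the corrupt-bundle error,
    retry with mozilla-unified. Returns the URL that was cloned."""
    code, stderr = runner(["hg", "clone", CENTRAL, dest])
    if code == 0:
        return CENTRAL
    if CORRUPT_BUNDLE_MARKER not in stderr:
        raise RuntimeError("hg clone failed: " + stderr.strip())
    # Remove any partial clone before retrying against the larger repo.
    shutil.rmtree(dest, ignore_errors=True)
    code, stderr = runner(["hg", "clone", UNIFIED, dest])
    if code != 0:
        raise RuntimeError("fallback hg clone failed: " + stderr.strip())
    return UNIFIED
```

The `runner` parameter exists only so the logic can be exercised without network access; by default the real `hg` binary is invoked.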

Per sheehan on corrupt clonebundles:

Corrupt clonebundles are pretty rare, and he recommends using mozilla-unified.

"On firefox workers there are caches of repositories using "shared storage", where you can have a single clone of the repo and create multiple checkouts of that clone on disk."

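To illustrate the "shared storage" idea in sheehan's quote, here is a rough sketch using Mercurial's bundled share extension, which creates a new working directory backed by an existing clone's history. This is my reading of the quote, not a description of how the Firefox workers are actually configured; the paths and function name are hypothetical.

```python
import subprocess

# Enable the bundled share extension for each invocation, so no hgrc edit is needed.
HG = ["hg", "--config", "extensions.share="]


def plan_shared_checkout(store_dir, checkout_dir, revision="default"):
    """Return hg commands that create checkout_dir as a working directory
    sharing the history stored in the existing clone at store_dir."""
    return [
        # Create the checkout without populating files yet.
        HG + ["share", "--noupdate", store_dir, checkout_dir],
        # Populate the working directory at the requested revision.
        HG + ["-R", checkout_dir, "update", "-r", revision],
    ]


def run_plan(commands):
    """Execute each planned command, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Each task would call `plan_shared_checkout` with its own `checkout_dir`, so the history is downloaded once while checkouts stay independent.
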
biancadanforth commented 5 years ago

I just spoke with jlaster on the DevTools Debugger team. His team previously tried caching mozilla-central in CI (Travis or Circle), and it was actually slower for them than doing a full m-c clone. It turns out hg is optimized for cloning.

Because their CI is hosted on AWS, just like mozilla-central's hg repo, the full clone is already fast, and he couldn't beat a time of 3-4 minutes with caching. Taskcluster is also hosted on AWS, so I think the right answer here is to wait for shallow hg clone support, as we are not going to be able to do better with caching.