use shallow clone for analysis

rhaschke commented 5 years ago

Is it possible to use git clone --depth n to just clone the recent history of the requested branch? Some repos, e.g. OpenCV, have a huge history and take ages to download fully.
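
For illustration, a minimal sketch of what such a shallow, single-branch clone looks like. The throwaway repository, the `work` directory and all variable names are assumptions made for the demo, not part of the generator. The `file://` URL is used because git ignores `--depth` for plain local paths.

```shell
# Self-contained demo: build a small repository with some history,
# then clone only the newest commit of one branch.
set -eu
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
full_count=$(git rev-list --count HEAD)

# --depth implies --single-branch, so only the requested branch is transferred.
git clone -q --depth 1 --branch "$branch" "file://$work/upstream" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "commits upstream:         $full_count"
echo "commits in shallow clone: $shallow_count"
```

For a real project, the `file://` URL would be replaced by the remote URL and `$branch` by the requested branch or tag.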

scymtym commented 5 years ago

Thank you for the suggestion. I'm not completely sure what you mean, though. Which context are we talking about, the analyze command or analysis in other commands? Is this suggestion for cloning from the cache (that should already use a limited depth), for creating the cache entry, or for working without the cache?

rhaschke commented 5 years ago

I was aiming for the analyze command, creating the cache entry and/or working w/o the cache.

scymtym commented 5 years ago

I did a few experiments.

1. The analyze command should usually work with --depth 1 (and doesn't use the cache).
2. Cloning a specific ref from the cache for analysis can usually use --depth 1.

3. Two possibilities for cache entry creation:

3.1 Create the cache entry with

git clone --bare --depth 1 URL                                    # shallow bare clone, basically empty
git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*' # should fetch branches
git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'   # should fetch tags
git fetch --depth 1                                               # shallow fetch

3.2 Like 3.1, but only fetch the branches, tags and commits that are actually needed. This requires collecting the referenced branches, tags and commits for a given project and updating what the cache entry should fetch when new references are needed.

The improvements 1. and 2. can be implemented easily but will typically not gain us a lot.

3.1 is relatively effective (300 MB instead of more than 1 GB for opencv) but doesn't work for all cases: directly specified commits as well as unusual ref names will not be present in the cache entry.
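
The limitation can be reproduced on a small scale. The following is a sketch only: it runs the 3.1 recipe against a throwaway local repository (the repository layout, the `feature` branch, the `v1` tag and the variable names are all made up for the demo), and the `cd` into the new bare repository is an assumption about where the `git config` calls are meant to run. Afterwards, all branches and tags are in the cache entry, but a commit that is not the tip of any ref is missing.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email demo@example.com
git config user.name demo
echo 1 > file.txt; git add file.txt; git commit -q -m "commit 1"
old_commit=$(git rev-parse HEAD)          # stands in for a directly specified commit
echo 2 > file.txt; git commit -q -am "commit 2"
git tag v1
git branch feature
echo 3 > file.txt; git commit -q -am "commit 3"

# Recipe 3.1: shallow bare clone, then a shallow fetch of all branches and tags.
git clone -q --bare --depth 1 "file://$work/upstream" "$work/cache.git"
cd "$work/cache.git"
git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*'
git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'
git fetch -q --depth 1

git show-ref                              # lists the branches and tags now in the cache entry

# The older commit is not the tip of any ref, so it was never transferred.
if git cat-file -e "$old_commit^{commit}" 2>/dev/null; then
  old_present=yes
else
  old_present=no
fi
echo "old commit present in cache entry: $old_present"
```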

3.2 is very effective and also correct, but it is complicated, so implementing it may not be worth the trouble.

rhaschke commented 5 years ago

Regarding 3.2: Don't you know the required refs anyway? You could even fetch individual refs on demand only, i.e. only fetch the ones that you actually need for the present analysis. Usually, within a distribution, we don't often switch between different versions, do we?
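
A rough sketch of what fetching on demand could look like with plain git commands. Everything here is an assumption for illustration: the helper name `ensure_ref`, the throwaway repositories and the choice to start from an empty bare repository are not taken from the generator. The helper fetches a fully qualified ref at depth 1 only if the cache entry does not have it yet. `--no-tags` keeps git from automatically following additional tags.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email demo@example.com
git config user.name demo
echo 1 > file.txt; git add file.txt; git commit -q -m "commit 1"
git tag v1
echo 2 > file.txt; git commit -q -am "commit 2"
git branch feature
default_branch=$(git symbolic-ref --short HEAD)

# Start with an empty cache entry and fill it lazily.
url="file://$work/upstream"
cache="$work/cache.git"
git init -q --bare "$cache"

# Hypothetical helper: fetch one ref (e.g. refs/tags/v1) unless it is already cached.
ensure_ref() {
  ref=$1
  if ! git -C "$cache" show-ref --verify --quiet "$ref"; then
    git -C "$cache" fetch -q --no-tags --depth 1 "$url" "+$ref:$ref"
  fi
}

ensure_ref refs/tags/v1
ensure_ref refs/heads/feature
ensure_ref refs/tags/v1                   # already present, nothing is fetched

git -C "$cache" show-ref                  # only the two requested refs are listed
```

Refs that are never requested, such as the default branch in this demo, are never transferred. Directly specified commits are not covered by this sketch.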

RDTK / generator

use shallow clone for analysis #38