RDTK / generator

A tool for creating Jenkins jobs and other things from recipes describing software projects
GNU General Public License v3.0

use shallow clone for analysis #38

Open rhaschke opened 5 years ago

rhaschke commented 5 years ago

Is it possible to use git clone --depth n to just clone the recent history of the requested branch? Some repos, e.g. OpenCV, have a huge history and take ages to download fully.

scymtym commented 5 years ago

Thank you for the suggestion. I'm not completely sure what you mean, though. Which context are we talking about, the analyze command or analysis in other commands? Is this suggestion for cloning from the cache (that should already use a limited depth), for creating the cache entry, or for working without the cache?

rhaschke commented 5 years ago

I was aiming for the analyze command, creating the cache entry and/or working w/o the cache.

scymtym commented 5 years ago

I did a few experiments.

  1. The analyze command should usually work with --depth 1 (and doesn't use the cache).

  2. Cloning a specific version from the cache for analysis can usually use --depth 1.

  3. Two possibilities for cache entry creation:

    1. Create the cache entry with

      git clone --bare --depth 1 URL DIR                                # shallow bare clone, basically empty
      cd DIR                                                            # the following commands run inside the clone
      git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*' # should fetch branches
      git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'   # should fetch tags
      git fetch --depth 1                                               # shallow fetch

    2. Like 3.1, but only fetch branches, tags and commits that are actually needed. This requires collecting the referenced branches, tags and commits for a given project and updating what the cache entry should fetch when new references are needed.

The improvements 1. and 2. can be implemented easily but will typically not gain us a lot.
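
For illustration, improvements 1. and 2. roughly amount to a clone like the one sketched below. The function name shallow_clone and its argument order are made up for this example and are not taken from the generator. Note that --branch accepts branch and tag names but not commit ids, so a directly specified commit would need a different approach.

```shell
# Hypothetical helper: shallow-clone a single branch or tag for analysis.
# Usage: shallow_clone URL REF DIR
#   URL  repository (or cache entry) to clone from
#   REF  branch or tag name (commit ids are not accepted by --branch)
#   DIR  target directory
shallow_clone() {
    # --depth 1 limits history to the tip commit and implies --single-branch
    git clone --depth 1 --branch "$2" "$1" "$3"
}
```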

3.1 is relatively effective (300 MB instead of more than 1 GB for opencv) but doesn't work for all cases: directly specified commits as well as unusual ref names will not be present in the cache entry.

3.2 is very effective and also correct, but complicated, so implementing it may not be worth the trouble.
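
A minimal sketch of what 3.2 could look like on the git side, assuming the needed references have already been collected as strings of the form branch:NAME or tag:NAME. That input format and the function names refspec_for and fetch_needed are assumptions made for this example, not something the generator provides. Directly specified commits are left out because fetching by commit id depends on server settings.

```shell
# Turn a collected reference into a fetch refspec.
#   branch:master -> +refs/heads/master:refs/heads/master
#   tag:v1.0      -> +refs/tags/v1.0:refs/tags/v1.0
refspec_for() {
    case $1 in
        branch:*) printf '+refs/heads/%s:refs/heads/%s\n' "${1#branch:}" "${1#branch:}" ;;
        tag:*)    printf '+refs/tags/%s:refs/tags/%s\n' "${1#tag:}" "${1#tag:}" ;;
        *)        echo "unsupported reference: $1" >&2; return 1 ;;
    esac
}

# Shallow-fetch only the given references into the bare cache entry DIR.
# Usage: fetch_needed DIR branch:master tag:v1.0 ...
fetch_needed() {
    dir=$1; shift
    specs=
    for ref in "$@"; do
        spec=$(refspec_for "$ref") || return 1
        specs="$specs $spec"
    done
    # $specs is deliberately unquoted: ref names cannot contain spaces,
    # so word splitting yields exactly one argument per refspec.
    git -C "$dir" fetch --depth 1 origin $specs
}
```

Compared to 3.1, nothing is fetched via wildcard refspecs, so the cache entry only grows when a project references something new.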

rhaschke commented 5 years ago

Regarding 3.2: Don't you know the required refs anyway? You could even fetch individual refs on demand only, i.e. only fetch the ones that you actually need for the present analysis. Usually, within a distribution, we don't often switch between different versions, do we?
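
Fetching on demand could be sketched as follows: check whether the ref is already in the cache entry and only contact the remote if it is missing. The function name ensure_ref is hypothetical, and the sketch assumes full ref names such as refs/heads/master and a bare cache entry whose remote is called origin.

```shell
# Hypothetical helper: make sure REF is present in the bare cache entry DIR.
# Usage: ensure_ref DIR REF      (REF is a full name, e.g. refs/tags/v1.0)
ensure_ref() {
    # show-ref --verify succeeds only if exactly this ref exists locally
    if git -C "$1" show-ref --verify --quiet "$2"; then
        return 0    # already cached, no network access needed
    fi
    # otherwise fetch just this one ref, with history limited to its tip
    git -C "$1" fetch --depth 1 origin "+$2:$2"
}
```

This only avoids fetching refs that are missing entirely; a branch that is already cached but has moved upstream would still need a regular update.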