ezbz / gitlabber

Gitlabber - clones or pulls entire groups tree from gitlab
MIT License
472 stars 78 forks

--include could parse less groups/projects #68

Closed by Crocmagnon 3 years ago

Crocmagnon commented 3 years ago

We have a somewhat large GitLab instance with ~600 projects in various groups and subgroups. Currently, when running the following command, gitlabber still parses the whole tree my user has access to in order to generate its own tree, and this takes a long time. I feel like we could optimize the retrieval to parse only the specified branches in simple cases.

gitlabber -p -i "/mygroup/**" -a exclude

The progress bar clearly shows that all other groups/subgroups are parsed, even if they have no chance of matching this glob pattern.

I'd expect the previous command to only scan projects and subgroups inside mygroup.

Similarly, if multiple include flags are passed and each of them can directly match a group, gitlabber should go through these groups only.

A glob that ends with a / and one or two stars and doesn't contain any other star or question mark should allow for this optimization.
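
The eligibility rule above could be checked with a small helper along these lines. This is only a sketch in Python (the language gitlabber is written in): `static_group_prefix` is a hypothetical name, not an existing gitlabber function, and rejecting `[` is an extra assumption beyond the star and question mark mentioned above.

```python
from typing import Optional

# Characters that make a path segment non-literal.
# "[" is an assumption on top of the "*" and "?" named in the issue.
GLOB_CHARS = set("*?[")


def static_group_prefix(pattern: str) -> Optional[str]:
    """Return the literal group path for '<path>/*' or '<path>/**', else None."""
    for suffix in ("/**", "/*"):
        if pattern.endswith(suffix):
            prefix = pattern[: -len(suffix)]
            break
    else:
        # Pattern does not end with a slash followed by one or two stars.
        return None

    prefix = prefix.strip("/")
    if not prefix or any(ch in GLOB_CHARS for ch in prefix):
        # Empty prefix (e.g. "/**") or wildcards elsewhere: scan everything.
        return None
    return prefix
```

With this, `static_group_prefix("/mygroup/**")` yields `"mygroup"`, so only that group would need to be walked, while a pattern such as `"/my*group/**"` yields `None` and falls back to the full scan.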

IIRC GitLab offers a "search" API that can return groups matching some criteria.
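
For reference, the group endpoints of the GitLab REST API (v4) could be addressed roughly like this. It is a sketch only: the helper names and the `gitlab.example.com` host are made up for illustration, and gitlabber itself may talk to the API differently.

```python
import json
import urllib.parse
import urllib.request


def groups_search_url(base_url: str, term: str) -> str:
    """URL for GET /groups?search=<term> (groups visible to the user that match)."""
    query = urllib.parse.urlencode({"search": term, "per_page": 100})
    return f"{base_url.rstrip('/')}/api/v4/groups?{query}"


def group_projects_url(base_url: str, group_path: str, page: int = 1) -> str:
    """URL for GET /groups/:id/projects, addressing the group by URL-encoded path."""
    group_id = urllib.parse.quote(group_path, safe="")
    query = urllib.parse.urlencode(
        {"include_subgroups": "true", "per_page": 100, "page": page}
    )
    return f"{base_url.rstrip('/')}/api/v4/groups/{group_id}/projects?{query}"


def fetch_json(url: str, token: str):
    """Perform the request with a personal access token (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `group_projects_url("https://gitlab.example.com", "mygroup")` points at the projects of `mygroup` and its subgroups only, which is the kind of narrowed request the optimization would rely on.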

ezbz commented 3 years ago

@Crocmagnon, Gitlabber was written for an enterprise installation with >1200 projects and has some advanced options for dealing with large installations: see the less documented -c flag for concurrency and the -f flag for caching across runs. These are beta features, so use them at your own risk.

Given these optimizations, I decided to forgo implementing the top-level filtering optimization. If you have considered the aforementioned solutions and still think it is relevant, feel free to submit a PR and I will consider it.

Crocmagnon commented 3 years ago

Thanks for your answer 🙂

The -c flag is only used for syncing the tree, which doesn't solve the original issue about tree fetching. The cache works fine but is indeed not documented.

For people passing by, here's how you can take advantage of it:

# First, generate the tree and save it
gitlabber -p --print-format yaml > cache.yaml

# On subsequent uses:
gitlabber -f cache.yaml .
# Note that Gitlabber won't refresh it automatically.
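
Since the cache is not refreshed automatically, one possible workaround is a small wrapper that regenerates it once it gets too old. This is a rough sketch, not part of gitlabber: the function names and the file name are hypothetical, it uses only the flags shown above, and it assumes gitlabber is on the PATH and already configured (token, URL) as for a normal run.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional


def cache_is_stale(cache: Path, max_age_seconds: float,
                   now: Optional[float] = None) -> bool:
    """True if the cache file is missing or older than max_age_seconds."""
    if not cache.exists():
        return True
    current = time.time() if now is None else now
    return current - cache.stat().st_mtime > max_age_seconds


def refresh_cache(cache: Path) -> None:
    """Regenerate the tree with `gitlabber -p --print-format yaml`."""
    tmp = cache.with_suffix(cache.suffix + ".tmp")
    with tmp.open("w") as handle:
        # check=True raises if gitlabber fails, leaving the old cache untouched.
        subprocess.run(
            ["gitlabber", "-p", "--print-format", "yaml"],
            stdout=handle,
            check=True,
        )
    tmp.replace(cache)


def sync_with_cache(cache: Path, dest: str, max_age_seconds: float) -> None:
    """Refresh the cache when stale, then run `gitlabber -f <cache> <dest>`."""
    if cache_is_stale(cache, max_age_seconds):
        refresh_cache(cache)
    subprocess.run(["gitlabber", "-f", str(cache), dest], check=True)
```

Calling `sync_with_cache(Path("cache.yaml"), ".", 24 * 3600)` would then rebuild the tree at most once a day and reuse the saved file otherwise.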

I'll see what I can do about proposing a merge request but I don't guarantee anything 😊