Open seh opened 8 years ago
Yes, this is something I need to fix. It's not just inefficient, it's actually wrong to cherry pick parts of a repo.
On Fri, Sep 23, 2016 at 5:32 AM, Steven E. Harris notifications@github.com wrote:
When one runs gb vendor fetch, gb calls main.fetch (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L84) to acquire (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L103), copy a portion of (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L134), and then discard (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L142) its copy of the remote repository. After that, so long as its -no-recurse flag (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L40) is false, it proceeds to fetch the missing transitive dependencies of the source it has acquired thus far (https://github.com/constabulary/gb/blob/master/cmd/gb-vendor/fetch.go#L195).
The problem arises when one requests fetching of one import path from a repository that yields files that in turn import alternate paths within that same repository. Consider a hypothetical repository:
- example.com/org/repo/.git
- example.com/org/repo/p1
  - file1.go
        package p1
        import "example.com/org/repo/p2"
        var P p2.Something
- example.com/org/repo/p2
  - file2.go
        package p2
        type Something string
If one runs
gb vendor fetch example.com/org/repo/p1
then gb will fetch the repository example.com/org/repo, copy the p1 path within it, then proceed to fetch the same repository again, then copy the p2 path within it.
This doesn't matter much for small repositories, but for large ones it can take many hours, wasting bandwidth and churning the disk unnecessarily. Consider augmenting main.fetch to remember the set of repositories it's downloaded from its initial top-level invocation, and to destroy them all only when unwinding back up to the top-level. Intermediate recursive invocations could share that repository cache to avoid downloading the same repository more than once.
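To make the suggestion concrete, here is a rough sketch of a repository cache shared across the recursive calls. It is not gb's actual code: repo, repoCache, fakeDownload, repoRoot, and this fetch signature are all hypothetical stand-ins, the download is faked in memory, and the repository root is assumed to be the first three elements of the import path.

```go
package main

import (
	"fmt"
	"strings"
)

// repo is a hypothetical stand-in for a downloaded working copy.
// packages maps each import path in the repository to the paths it imports.
type repo struct {
	root     string
	packages map[string][]string
}

// repoCache remembers every repository downloaded during one top-level fetch.
type repoCache struct {
	download  func(root string) (*repo, error)
	repos     map[string]*repo
	downloads int // counts real downloads, for illustration only
}

func newRepoCache(download func(string) (*repo, error)) *repoCache {
	return &repoCache{download: download, repos: map[string]*repo{}}
}

// get returns the cached copy of root, downloading it only on first use.
func (c *repoCache) get(root string) (*repo, error) {
	if r, ok := c.repos[root]; ok {
		return r, nil
	}
	r, err := c.download(root)
	if err != nil {
		return nil, err
	}
	c.downloads++
	c.repos[root] = r
	return r, nil
}

// cleanup discards every cached repository. A real implementation would
// remove the temporary working copies from disk here.
func (c *repoCache) cleanup() {
	c.repos = map[string]*repo{}
}

// repoRoot assumes the repository root is the first three path elements.
func repoRoot(path string) string {
	parts := strings.SplitN(path, "/", 4)
	if len(parts) > 3 {
		parts = parts[:3]
	}
	return strings.Join(parts, "/")
}

// fetch "copies" path (by recording it in vendored) and then recurses into
// its imports, sharing the cache so that no repository is downloaded twice.
func fetch(c *repoCache, path string, vendored map[string]bool) error {
	if vendored[path] {
		return nil
	}
	r, err := c.get(repoRoot(path))
	if err != nil {
		return err
	}
	deps, ok := r.packages[path]
	if !ok {
		return fmt.Errorf("%s: no such package in %s", path, r.root)
	}
	vendored[path] = true
	for _, dep := range deps {
		if err := fetch(c, dep, vendored); err != nil {
			return err
		}
	}
	return nil
}

// fakeDownload models the hypothetical repository from the example above.
func fakeDownload(root string) (*repo, error) {
	if root != "example.com/org/repo" {
		return nil, fmt.Errorf("unknown repository %s", root)
	}
	return &repo{root: root, packages: map[string][]string{
		"example.com/org/repo/p1": {"example.com/org/repo/p2"},
		"example.com/org/repo/p2": nil,
	}}, nil
}

func main() {
	cache := newRepoCache(fakeDownload)
	defer cache.cleanup() // destroy the copies only once, at the top level

	vendored := map[string]bool{}
	if err := fetch(cache, "example.com/org/repo/p1", vendored); err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("packages vendored:", len(vendored))
	fmt.Println("repository downloads:", cache.downloads)
}
```

With the example repository, this vendors both p1 and p2 from a single download. The cleanup is deferred at the top level only, which matches the idea of destroying the copies when unwinding back out of the initial invocation.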