Problem
Today, `proctor source` leverages an in-memory filesystem for its commit-analysis operations. This causes memory consumption (RSS) to grow significantly when accessing large code bases.
For example, consider the following command against the `kubernetes/kubernetes` repo:
`p source commits diff https://github.com/kubernetes/kubernetes --tag1 v1.25.0 --tag2 v1.25.1`
Using `sar`, you can see memory grow over 2 GC cycles. It's consuming 40+ GB of memory.
It seems the go-git memory object is the primary consumer in the heap.
From a CPU perspective, we can see time is being spent in `NewInMemoryRepo`.
Proposed Solution
The right solution is likely to perform an initial clone of repositories to disk. This could all happen in `$XDG_DATA_HOME/proctor/repos`.
The primary change to our flow is that we'll need a new order of operations: