Here is the range-diff from the equivalent patches in the previous version:
```
1: 3fdb57edbc5 ! 1: 6365d8148c4 pack-objects: extract should_attempt_deltas()
@@ Metadata
## Commit message ##
pack-objects: extract should_attempt_deltas()
- This will be helpful in a future change that introduces a new way to
- compute deltas.
-
- Be careful to preserve the nr_deltas counting logic in the existing
- method, but take the rest of the logic wholesale.
+ This will be helpful in a future change.
Signed-off-by: Derrick Stolee
2: a0475c7cba8 = 2: 0a4c2c2da0e pack-objects: add --path-walk option
3: 73c8b61e87b = 3: df74fe9a35b pack-objects: update usage to match docs
4: 21dc3723c36 ! 4: 4e08947ea0a p5313: add performance tests for --path-walk
@@ Commit message
Running on my copy of the Git repository results in this data:
- Test this tree
- ---------------------------------------------------------
- 5313.2: thin pack 0.01(0.00+0.00)
- 5313.3: thin pack size 1.1K
- 5313.4: thin pack with --path-walk 0.01(0.01+0.00)
- 5313.5: thin pack size with --path-walk 1.1K
- 5313.6: big pack 2.52(6.59+0.38)
- 5313.7: big pack size 14.1M
- 5313.8: big pack with --path-walk 4.90(5.76+0.26)
+ Test HEAD
+ --------------------------------------------------------------
+ 5313.2: thin pack 0.00(0.00+0.00)
+ 5313.3: thin pack size 589
+ 5313.4: thin pack with --path-walk 0.00(0.00+0.00)
+ 5313.5: thin pack size with --path-walk 589
+ 5313.6: big pack 2.76(7.19+0.27)
+ 5313.7: big pack size 14.0M
+ 5313.8: big pack with --path-walk 5.76(6.72+0.16)
5313.9: big pack size with --path-walk 13.2M
Note that the timing is slower because there is no threading in the
@@ Commit message
Running the tests on this repo results in the following output:
- Test this tree
- ----------------------------------------------------------
- 5313.2: thin pack 0.28(0.38+0.02)
+ Test HEAD
+ --------------------------------------------------------------
+ 5313.2: thin pack 0.28(0.40+0.03)
5313.3: thin pack size 1.2M
- 5313.4: thin pack with --path-walk 0.08(0.06+0.01)
+ 5313.4: thin pack with --path-walk 0.07(0.06+0.00)
5313.5: thin pack size with --path-walk 18.4K
- 5313.6: big pack 4.05(29.62+0.43)
- 5313.7: big pack size 20.0M
- 5313.8: big pack with --path-walk 5.99(9.06+0.24)
- 5313.9: big pack size with --path-walk 16.4M
+ 5313.6: big pack 4.05(29.48+0.41)
+ 5313.7: big pack size 19.7M
+ 5313.8: big pack with --path-walk 6.01(9.17+0.20)
+ 5313.9: big pack size with --path-walk 16.5M
Notice in particular that in the small thin pack, the time performance
has improved from 0.28s to 0.08s and this is likely due to the improved
@@ Commit message
Finally, running this on a copy of the Linux kernel repository results
in these data points:
- Test this tree
- -----------------------------------------------------------
- 5313.2: thin pack 0.00(0.00+0.00)
- 5313.3: thin pack size 5.8K
- 5313.4: thin pack with --path-walk 0.00(0.01+0.00)
- 5313.5: thin pack size with --path-walk 5.8K
- 5313.6: big pack 24.39(65.81+1.31)
- 5313.7: big pack size 155.7M
- 5313.8: big pack with --path-walk 41.07(60.69+0.68)
- 5313.9: big pack size with --path-walk 150.8M
+ Test HEAD
+ --------------------------------------------------------------
+ 5313.2: thin pack 0.03(0.02+0.00)
+ 5313.3: thin pack size 4.6K
+ 5313.4: thin pack with --path-walk 0.03(0.01+0.01)
+ 5313.5: thin pack size with --path-walk 4.6K
+ 5313.6: big pack 21.06(60.57+1.45)
+ 5313.7: big pack size 158.3M
+ 5313.8: big pack with --path-walk 37.65(57.83+0.67)
+ 5313.9: big pack size with --path-walk 152.3M
Signed-off-by: Derrick Stolee
5: 6f96b1c227a = 5: 9235f9bb9c8 pack-objects: introduce GIT_TEST_PACK_PATH_WALK
-: ----------- > 6: 201b2210712 t5538: add tests to confirm deltas in shallow pushes
6: 834c9ea2709 ! 7: 5caf28ec6c7 repack: add --path-walk option
@@ Commit message
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
- In my copy of the Git repository, the new tests in p5313 show these
- results:
+ Add the --path-walk option to the performance tests in p5313.
+
+ For the microsoft/fluentui repo [1] checked out at a specific commit [2],
+ the results are very interesting:
+
+ Test HEAD
+ ----------------------------------------------------------
+ 5313.2: thin pack 0.41(0.48+0.03)
+ 5313.3: thin pack size 1.2M
+ 5313.4: thin pack with --path-walk 0.08(0.05+0.02)
+ 5313.5: thin pack size with --path-walk 18.4K
+ 5313.6: big pack 4.47(30.62+0.40)
+ 5313.7: big pack size 19.6M
+ 5313.8: big pack with --path-walk 6.76(9.87+0.23)
+ 5313.9: big pack size with --path-walk 16.5M
+ 5313.10: repack 96.87(664.29+2.75)
+ 5313.11: repack size 439.5M
+ 5313.12: repack with --path-walk 95.68(109.90+0.92)
+ 5313.13: repack size with --path-walk 122.6M
- Test this tree
- -------------------------------------------------------------
- 5313.10: repack 27.88(150.23+2.70)
- 5313.11: repack size 228.2M
- 5313.12: repack with --path-walk 134.59(148.77+0.81)
- 5313.13: repack size with --path-walk 209.7M
+ [1] https://github.com/microsoft/fluentui
+ [2] e70848ebac1cd720875bccaa3026f4a9ed700e08
- Note that the 'git pack-objects --path-walk' feature is not integrated
- with threads. Look forward to a future change that will introduce
- threading to improve the time performance of this feature with
- equivalent space performance.
+ This repo suffers from having a lot of paths that collide in the name
+ hash, so examining them in groups by path leads to better deltas. Also,
+ in this case, the single-threaded implementation is competitive with the
+ full repack. This is saving time diffing files that have significant
+ differences from each other.
- For the microsoft/fluentui repo [1] had some interesting aspects for the
- previous tests in p5313, so here are the repack results:
+ A similar, but private, repo has even more extremes during repacking:
- Test this tree
- -------------------------------------------------------------
- 5313.10: repack 91.76(680.94+2.48)
- 5313.11: repack size 439.1M
- 5313.12: repack with --path-walk 110.35(130.46+0.74)
- 5313.13: repack size with --path-walk 155.3M
+ Test HEAD
+ -----------------------------------------------------------------
+ 5313.10: repack 2138.22(11961.00+17.67)
+ 5313.11: repack size 6.4G
+ 5313.12: repack with --path-walk 1351.46(1418.28+3.96)
+ 5313.13: repack size with --path-walk 804.1M
- [1] https://github.com/microsoft/fluentui
+ There are small benefits in size for my copy of the Git repository:
- Here, we see the significant improvement of a full repack using this
- strategy. The name-hash collisions in this repo cause the space
- problems. Those collisions also cause the repack command to spend a lot
- of cycles trying to find delta bases among files that are not actually
- very similar, so the lack of threading with the --path-walk feature is
- less pronounced in the process time.
+ Test HEAD
+ -----------------------------------------------------------
+ 5313.10: repack 22.11(98.37+1.64)
+ 5313.11: repack size 126.4M
+ 5313.12: repack with --path-walk 66.89(75.61+0.58)
+ 5313.13: repack size with --path-walk 109.6M
- For the Linux kernel repository, we have these stats:
+ As well as in the nodejs/node repository [3]:
- Test this tree
+ Test HEAD
+ -------------------------------------------------------------
+ 5313.2: thin pack 0.01(0.01+0.00)
+ 5313.3: thin pack size 1.6K
+ 5313.4: thin pack with --path-walk 0.02(0.01+0.00)
+ 5313.5: thin pack size with --path-walk 1.6K
+ 5313.6: big pack 5.35(12.43+0.32)
+ 5313.7: big pack size 52.2M
+ 5313.8: big pack with --path-walk 7.12(11.97+0.27)
+ 5313.9: big pack size with --path-walk 52.1M
+ 5313.10: repack 87.74(342.90+4.24)
+ 5313.11: repack size 739.7M
+ 5313.12: repack with --path-walk 212.79(245.05+1.78)
+ 5313.13: repack size with --path-walk 697.6M
+
+ [3] https://github.com/nodejs/node
+
+ This benefit also repeats in this instance in the Linux kernel repository:
+
+ Test HEAD
---------------------------------------------------------------
- 5313.10: repack 553.61(1929.41+30.31)
+ 5313.2: thin pack 0.04(0.00+0.03)
+ 5313.3: thin pack size 4.6K
+ 5313.4: thin pack with --path-walk 0.03(0.01+0.01)
+ 5313.5: thin pack size with --path-walk 4.6K
+ 5313.6: big pack 21.16(62.81+1.35)
+ 5313.7: big pack size 158.3M
+ 5313.8: big pack with --path-walk 36.09(55.25+0.67)
+ 5313.9: big pack size with --path-walk 152.2M
+ 5313.10: repack 734.26(2149.62+31.24)
5313.11: repack size 2.5G
- 5313.12: repack with --path-walk 1777.63(2044.16+7.47)
- 5313.13: repack size with --path-walk 2.5G
-
- This demonstrates that the --path-walk feature does not always present
- measurable improvements, especially in cases where the name-hash has
- very few collisions.
+ 5313.12: repack with --path-walk 1457.23(1618.15+7.00)
+ 5313.13: repack size with --path-walk 2.2G
+
+ It is important to see that even when the repository shape does not have
+ many name-hash collisions, there is a slight space boost to be found
+ using this method. Also, there is no known case where the space is
+ worse with --path-walk. This is of course due to the second pass where
+ all objects to be packed are sorted in the usual way and checked for
+ deltas. This second pass is usually very fast as the path-walk has
+ primed many objects with quality deltas that short-circuit other delta
+ computation attempts.
Signed-off-by: Derrick Stolee
@@ builtin/repack.c: static int run_update_server_info = 1;
- N_("git repack []"),
+ N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
+ "[--window=] [--depth=] [--threads=] [--keep-pack=]\n"
-+ "[--write-midx] [--full-path-walk]"),
++ "[--write-midx] [--path-walk]"),
NULL
};
7: 6ef8d67af4b ! 8: 8bfe5116178 repack: update usage to match docs
@@ Commit message
Signed-off-by: Derrick Stolee
- ## builtin/repack.c ##
-@@ builtin/repack.c: static char *packdir, *packtmp_name, *packtmp;
- static const char *const git_repack_usage[] = {
- N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
- "[--window=] [--depth=] [--threads=] [--keep-pack=]\n"
-- "[--write-midx] [--full-path-walk]"),
-+ "[--write-midx] [--path-walk]"),
- NULL
- };
-
-
## t/t0450/txt-help-mismatches ##
@@ t/t0450/txt-help-mismatches: rebase
remote
8: 1db90e361ba = 9: 6b9ca51bc7c pack-objects: enable --path-walk via config
9: 0f3040b4b90 = 10: 3f05c89b9cc scalar: enable path-walk during push via config
10: 030d8ec238e = 11: 37572d70deb pack-objects: refactor path-walk delta phase
11: fddc320eb0b ! 12: 7ae9a40f346 pack-objects: thread the path-based compression
@@ Commit message
objects being packed. (This was tested on a 16-core machine.)
Test HEAD~1 HEAD
- ---------------------------------------------------------------
- 5313.2: thin pack 0.01 0.01 +0.0%
- 5313.4: thin pack with --path-walk 0.01 0.01 +0.0%
- 5313.6: big pack 2.54 2.60 +2.4%
- 5313.8: big pack with --path-walk 4.70 3.09 -34.3%
- 5313.10: repack 28.75 28.55 -0.7%
- 5313.12: repack with --path-walk 108.55 46.14 -57.5%
+ -----------------------------------------------------------------
+ 5313.2: thin pack 0.00 0.00 =
+ 5313.3: thin pack size 589 589 +0.0%
+ 5313.4: thin pack with --path-walk 0.00 0.00 =
+ 5313.5: thin pack size with --path-walk 589 589 +0.0%
+ 5313.6: big pack 2.84 2.80 -1.4%
+ 5313.7: big pack size 14.0M 14.1M +0.3%
+ 5313.8: big pack with --path-walk 5.46 3.77 -31.0%
+ 5313.9: big pack size with --path-walk 13.2M 13.2M -0.0%
+ 5313.10: repack 22.11 21.50 -2.8%
+ 5313.11: repack size 126.4M 126.2M -0.2%
+ 5313.12: repack with --path-walk 66.89 26.41 -60.5%
+ 5313.13: repack size with --path-walk 109.6M 109.6M +0.0%
- On the microsoft/fluentui repo, where the --path-walk feature has been
- shown to be more effective in space savings, we get these results:
+ This 60% reduction in 'git repack --path-walk' time is typical across
+ all repos I used for testing. What is interesting is to compare when the
+ overall time improves enough to outperform the standard case. These time
+ improvements correlate with repositories with data shapes that
+ significantly improve their data size as well.
- Test HEAD~1 HEAD
+ For example, the microsoft/fluentui repo has a 439M to 122M size
+ reduction, and the repack time is now 36.6 seconds with --path-walk
+ compared to 95+ seconds without it:
+
+ Test HEAD~1 HEAD
+ -----------------------------------------------------------------
+ 5313.2: thin pack 0.41 0.42 +2.4%
+ 5313.3: thin pack size 1.2M 1.2M +0.0%
+ 5313.4: thin pack with --path-walk 0.08 0.05 -37.5%
+ 5313.5: thin pack size with --path-walk 18.4K 18.4K +0.0%
+ 5313.6: big pack 4.47 4.53 +1.3%
+ 5313.7: big pack size 19.6M 19.7M +0.3%
+ 5313.8: big pack with --path-walk 6.76 3.51 -48.1%
+ 5313.9: big pack size with --path-walk 16.5M 16.4M -0.2%
+ 5313.10: repack 96.87 99.05 +2.3%
+ 5313.11: repack size 439.5M 439.0M -0.1%
+ 5313.12: repack with --path-walk 95.68 36.55 -61.8%
+ 5313.13: repack size with --path-walk 122.6M 122.6M +0.0%
+
+ In a more extreme example, an internal repository that has a similar
+ name-hash collision issue to microsoft/fluentui reduces its size from
+ 6.4G to 804M with the --path-walk option. This also reduces the
+ repacking time from 2,138 seconds to 478 seconds.
+
+ Test HEAD~1 HEAD
+ ------------------------------------------------------------------
+ 5313.10: repack 2138.22 2138.19 -0.0%
+ 5313.11: repack size 6.4G 6.4G -0.0%
+ 5313.12: repack with --path-walk 1351.46 477.91 -64.6%
+ 5313.13: repack size with --path-walk 804.1M 804.1M -0.0%
+
+ Finally, the Linux kernel repository is a good test for this repacking
+ time change, even though the space savings is more reasonable:
+
+ Test HEAD~1 HEAD
----------------------------------------------------------------
- 5313.2: thin pack 0.39 0.40 +2.6%
- 5313.4: thin pack with --path-walk 0.08 0.07 -12.5%
- 5313.6: big pack 4.15 4.15 +0.0%
- 5313.8: big pack with --path-walk 6.41 3.21 -49.9%
- 5313.10: repack 90.69 90.83 +0.2%
- 5313.12: repack with --path-walk 108.23 49.09 -54.6%
+ 5313.10: repack 734.26 735.11 +0.1%
+ 5313.11: repack size 2.5G 2.5G -0.0%
+ 5313.12: repack with --path-walk 1457.23 598.17 -59.0%
+ 5313.13: repack size with --path-walk 2.2G 2.2G +0.0%
Signed-off-by: Derrick Stolee
```
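Several of the commit messages above attribute the space savings to paths that collide in the name hash. As a rough illustration, here is a minimal sketch of that effect. It assumes the hash behaves like the pack_name_hash() helper in Git's pack-objects.h (whitespace is skipped and only roughly the last sixteen characters of the path influence the value). The function names and the example paths are hypothetical and exist only for this demonstration.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Simplified stand-in for Git's name hash (assumed to mirror
 * pack_name_hash()). Each new character lands in the top byte and
 * older characters are shifted right by two bits, so characters far
 * from the end of the path contribute almost nothing to the result.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue; /* whitespace does not affect the hash */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Returns 1 when two paths produce the same name-hash value. */
static int names_collide(const char *a, const char *b)
{
	return name_hash_sketch(a) == name_hash_sketch(b);
}
```

With this sketch, names_collide("apps/web/src/components/index.ts", "apps/api/src/components/index.ts") returns 1, because the two paths differ only far from the end of the string. Changing the final extension to ".js" on one side makes it return 0. A name-hash sort would therefore place the first pair next to each other even though they are different files, which is the situation the path-walk approach sidesteps by grouping objects by their full path.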
WIP
Thanks, -Stolee