Git-based files-to-artifacts database deployment prototype

jaimergp commented 3 months ago

The idea is to manipulate the repo through the git object database directly instead of going through the filesystem. The prototypes:

Tooling: https://github.com/zklaus/cfgraphman
https://github.com/zklaus/cfgraph (subdir at the end of the path)
https://github.com/zklaus/cfgraph-by-subdir (subdir at the beginning)
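
For illustration, here is a minimal sketch of what writing to the object database directly could look like. It is not taken from cfgraphman. It uses only the Python standard library and git's loose-object format (a zlib-compressed `<type> <size>\0<payload>` record named by its SHA-1). The helper names `write_object`, `write_tree` and `write_single_path` are hypothetical, as is the artifact name in the demo; the example path follows the `linux-64/bin/ex/conda-forge-artifacts.txt` layout visible in the data repo.

```python
from __future__ import annotations

import hashlib
import tempfile
import zlib
from pathlib import Path


def write_object(git_dir: Path, kind: str, payload: bytes) -> str:
    """Store a loose object under git_dir/objects and return its SHA-1 id."""
    store = f"{kind} {len(payload)}".encode() + b"\0" + payload
    oid = hashlib.sha1(store).hexdigest()
    path = git_dir / "objects" / oid[:2] / oid[2:]
    if not path.exists():  # objects are content-addressed, so writes are idempotent
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid


def write_tree(git_dir: Path, entries: dict[str, tuple[str, str]]) -> str:
    """Write a tree object; entries maps name -> (mode, oid).

    Mode is "100644" for regular files and "40000" for subtrees.
    """

    def sort_key(item):
        name, (mode, _oid) = item
        # git orders tree entries as if directory names ended in "/"
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for name, (mode, oid) in sorted(entries.items(), key=sort_key)
    )
    return write_object(git_dir, "tree", body)


def write_single_path(git_dir: Path, path: str, content: bytes) -> str:
    """Write one file plus all enclosing trees; return the root tree id."""
    *dirs, name = path.split("/")
    oid, mode = write_object(git_dir, "blob", content), "100644"
    for parent in reversed(dirs):
        # wrap the current object in a tree, then move one level up
        oid = write_tree(git_dir, {name: (mode, oid)})
        mode, name = "40000", parent
    return write_tree(git_dir, {name: (mode, oid)})


def demo() -> str:
    """Build a one-file tree in a throwaway object store (no working tree)."""
    with tempfile.TemporaryDirectory() as tmp:
        return write_single_path(
            Path(tmp),
            "linux-64/bin/ex/conda-forge-artifacts.txt",
            b"example-artifact\n",  # hypothetical artifact name
        )
```

No file is ever checked out here: the nested directories exist only as tree objects. A real tool would still need to create a commit and update a ref (for example with `git commit-tree` and `git update-ref`), and would want to write packs rather than millions of loose objects.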

zklaus commented 3 months ago

Processing blobs: 842724
Processing trees: 4674405
Processing commits: 1
Matching commits to trees: 1
Processing annotated tags: 0
Processing references: 2
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |     1     |                                |
|   * Total size               |   216 B   |                                |
| * Trees                      |           |                                |
|   * Count                    |  4.67 M   | ***                            |
|   * Total size               |  1.48 GiB |                                |
|   * Total tree entries       |  37.2 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |   843 k   |                                |
|   * Total size               |   414 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |     2     |                                |
|     * Branches               |     1     |                                |
|     * Remote-tracking refs   |     1     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   216 B   |                                |
|   * Maximum parents      [1] |     0     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [2] |   120 k   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Blobs                      |           |                                |
|   * Maximum size         [3] |   204 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |     1     |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [4] |  42.4 M   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Maximum path depth     [4] |    39     | ***                            |
| * Maximum path length    [4] |   449 B   | ****                           |
| * Number of files        [4] |  37.0 M   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Total size of files    [4] |  17.8 GiB | *******************            |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  5ccd42e5c465418405284631d1ab3ee42062abdd (refs/heads/main)
[2]  6c3f9e68b3ed1b7e0a9dba13c10a606d3af3c8bb (refs/heads/main:noarch/site-packages)
[3]  06bbd9078a55c19b846d9131e01880909aa376c3 (refs/heads/main:linux-64/bin/ex/conda-forge-artifacts.txt)
[4]  88c00c398469e8f45d1d4785ad0098009bda721a (refs/heads/main^{tree})
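
To put the report in perspective, a few ratios can be derived from it. The sketch below uses only the figures reported above (rounded where the report rounds them); the function name `summarize` and the dictionary keys are made up for illustration.

```python
from __future__ import annotations

# Figures copied from the git-sizer report above.
TREES = 4_674_405            # "Processing trees"
BLOBS = 842_724              # "Processing blobs"
TREE_ENTRIES = 37.2e6        # total tree entries (rounded in the report)
FILES_IN_CHECKOUT = 37.0e6   # number of files in the biggest checkout
BLOB_BYTES = 414 * 2**20     # 414 MiB of blob data
CHECKOUT_BYTES = 17.8 * 2**30  # 17.8 GiB when checked out


def summarize() -> dict[str, float]:
    """Derive average ratios from the reported totals."""
    return {
        # average number of entries in a tree object
        "entries_per_tree": TREE_ENTRIES / TREES,
        # average number of checkout paths that share one unique blob
        "paths_per_blob": FILES_IN_CHECKOUT / BLOBS,
        # average size of a unique blob in bytes
        "mean_blob_bytes": BLOB_BYTES / BLOBS,
        # how much larger a checkout is than the deduplicated blob data
        "checkout_to_blob_ratio": CHECKOUT_BYTES / BLOB_BYTES,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.1f}")
```

On these numbers, the average tree is small (about 8 entries) and each unique blob is shared by roughly 44 paths, so the object store deduplicates well. The concern flags come from the sheer number of trees and from the single largest tree with 120 k entries.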

zklaus commented 2 weeks ago

As suspected, the GitHub support team contacted me recently and asked me to remove the data repository. While the structure is quite feasible from a pure git perspective, it places too high a demand on GitHub's infrastructure, for example for indexing.

Quansight-Labs / czi-conda-forge-mgmt

Git-based files-to-artifacts database deployment prototype #58