ghuser-io / ghuser.io

:octocat: Better GitHub profiles
https://ghuser.io
MIT License
809 stars 47 forks

Big repository / tree #146

Closed: DanielRuf closed this issue 5 years ago

DanielRuf commented 5 years ago

Currently this repository is around 34 MB in size.

lourot commented 5 years ago

Indeed it's becoming urgent that I do something about it :)

DanielRuf commented 5 years ago

I'll do a short analysis. In general we cannot rewrite the history, except on a new, clean branch.
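For anyone who wants to repeat such an analysis, here is a minimal sketch of one way to list the largest blobs in a repository's history using plain git. The helper name `largest_blobs` is hypothetical and not part of git or this project. It only inspects the repository and rewrites nothing.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line, as "<object id> <size in bytes> <path>".
# Run it from inside the repository to analyse.
largest_blobs() {
  n="${1:-10}"   # how many entries to show (defaults to 10)

  # rev-list emits "<id> <path>" for every reachable object; cat-file then
  # annotates each line with the object's type and size, and %(rest) carries
  # the path through unchanged.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only and drop the type column
    sort -k2,2nr |           # largest size first
    head -n "$n"
}
```

Calling `largest_blobs 20` in a clone should show which paths dominate the history, and `git count-objects -vH` reports the overall packed size for a before/after comparison.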

DanielRuf commented 5 years ago

Hm, I'm not sure whether this could also be done with Enhanced GitHub and Refined GitHub. It seems we are cloning GitHub here.

lourot commented 5 years ago

It's because the DB is a set of JSON files in db/, which was OK as long as we had only a few users. We need to move that to a real DB, or at least to a separate repo.

DanielRuf commented 5 years ago
bfg --strip-blobs-bigger-than 1M ghuser.io.git                

Using repo : /Users/druf/projects/ghuser.io.git

Scanning packfile for large blobs: 77756
Scanning packfile for large blobs completed in 593 ms.
Found 77 blob ids for large blobs - biggest=23358336 smallest=1967084
Total size (unpacked)=396294614
Found 4365 objects to protect
Found 7 commit-pointing refs : HEAD, refs/heads/dev, refs/heads/master, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 703f7383 (protected by 'HEAD') - contains 1 dirty file : 
    - reframe/up (22,3 MB)

WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.

Details of protected dirty content have been recorded here :

/Users/druf/projects/ghuser.io.git.bfg-report/2018-09-11/14-18-36/protected-dirt/

If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

Cleaning
--------

Found 567 commits
Cleaning commits:       100% (567/567)
Cleaning commits completed in 1.350 ms.

Updating 6 Refs
---------------

    Ref                  Before     After   
    ----------------------------------------
    refs/heads/dev     | a494d87a | 7d8fa1cc
    refs/heads/master  | 703f7383 | 3ca81569
    refs/pull/123/head | 22b4e8d0 | 8b89aaa4
    refs/pull/32/head  | e0730028 | 2549a32e
    refs/pull/35/head  | 3ce36e7f | c2cb32a3
    refs/pull/38/head  | 29f62df8 | a62c3e00

Updating references:    100% (6/6)
...Ref update completed in 27 ms.

Commit Tree-Dirt History
------------------------

    Earliest                                              Latest
    |                                                          |
    DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After   
    -------------------------------------------
    First modified commit | 4aff9799 | 983bb3c7
    Last dirty commit     | a494d87a | 7d8fa1cc

Deleted files
-------------

    Filename     Git id                                   
    ------------------------------------------------------
    db.json    | 5ec094cb (5,5 MB), 122ecc74 (3,4 MB), ...
    repos.json | 4e9300ad (5,1 MB), bedfbd6c (4,9 MB), ...
    up         | c9abaf59 (22,3 MB)                       

In total, 1482 object ids were changed. Full details are logged here:

    /Users/druf/projects/ghuser.io.git.bfg-report/2018-09-11/14-18-36

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
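The steps around the run above can be sketched as one shell helper. This is only a sketch under assumptions: `strip_large_blobs` is a hypothetical name, `bfg` is assumed to be on the PATH (for example as a wrapper around the BFG jar), and pushing is deliberately left out because it rewrites history for every clone.

```shell
# Hypothetical helper: clean a fresh mirror clone with the BFG, then compact it.
# Assumes `bfg` is available on PATH.
strip_large_blobs() {
  url="$1"          # repository URL or local path
  limit="${2:-1M}"  # size threshold, 1M as in the run above
  mirror="$(basename "$url" .git).git"

  # Work on a bare mirror so that all refs are rewritten, not just one branch.
  git clone --mirror "$url" "$mirror" || return 1
  bfg --strip-blobs-bigger-than "$limit" "$mirror" || return 1

  # The follow-up the BFG prints at the end of its run: expire the reflog and
  # garbage-collect so the unreferenced blobs actually leave the packfile.
  (
    cd "$mirror" || exit 1
    git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive
  ) || return 1

  # Report the resulting size; publishing would be a separate, manual
  # `git push` from inside the mirror once the result has been reviewed.
  du -sh "$mirror"
}
```

As the log notes, reframe/up (22,3 MB) is still used by the protected HEAD commit, so it would first need a commit that removes it, followed by another run on a fresh copy.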

> need to move that to a real DB or at least to a separate repo

Better to use a real (relational) database, as it can split data into different files depending on the InnoDB / MyISAM settings ;-)

Or some other GraphQL, streaming, or search solution similar to lunr / elasticlunr (though that at least needs a server).

DanielRuf commented 5 years ago

I'm not sure whether https://github.com/sindresorhus is in there, but that would mean even more data (the same applies to me and others too) ;-)

lourot commented 5 years ago

50M -> 16M

DB is now here. Still JSON files, but that will also be improved eventually. Thanks for this issue!