apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

The arrow-rs repo is very large #5908

Open alamb opened 1 week ago

alamb commented 1 week ago

Describe the bug

Whenever I run git pull apache to pull arrow-rs, it downloads over 1 GB of data.

To Reproduce

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ git clone git@github.com:apache/arrow-rs.git
Cloning into 'arrow-rs'...
remote: Enumerating objects: 1317790, done.
remote: Counting objects: 100% (140124/140124), done.
remote: Compressing objects: 100% (16954/16954), done.
remote: Total 1317790 (delta 127408), reused 135019 (delta 122748), pack-reused 1177666
Receiving objects: 100% (1317790/1317790), 1.02 GiB | 33.01 MiB/s, done.
Resolving deltas: 100% (1172574/1172574), done.

Receiving objects: 100% (1317790/1317790), 1.02 GiB | 33.01 MiB/s, done.

!!!!

Expected behavior

The repository contains only source code, so a clone should be much smaller.

Additional context

I strongly believe this is related to the CI job (for example https://github.com/apache/arrow-rs/actions/runs/9552252515) that pushes a preview version of the docs to https://arrow.apache.org/rust/
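One way to check this suspicion is to total the on-disk size of the objects reachable from each remote-tracking branch. This is only a rough sketch: the function name branch_disk_usage is made up for illustration, and it assumes the remote is called apache, as in the git pull apache command mentioned above.

```shell
# Print "<bytes><TAB><branch>" for each remote-tracking branch of a remote,
# largest first. Objects shared by several branches are counted once per
# branch, so the numbers are an upper bound per branch, not a partition.
branch_disk_usage() {
    remote="${1:-apache}"   # assumed remote name; adjust to your setup
    git for-each-ref --format='%(refname)' "refs/remotes/$remote" |
    while read -r ref; do
        # List every object reachable from the ref, ask git for its
        # on-disk size, and sum the sizes.
        bytes=$(git rev-list --objects "$ref" |
            git cat-file --batch-check='%(objectsize:disk) %(rest)' |
            awk '{ total += $1 } END { print total + 0 }')
        printf '%s\t%s\n' "$bytes" "${ref#refs/remotes/}"
    done | sort -rn
}

# Example (run inside a clone):
#   branch_disk_usage apache
```

If the docs branch is the culprit, it should appear at the top of the list with a size close to that of the whole clone.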

alamb commented 1 week ago

I think I can just remove the history of the asf-site branch and avoid carrying all this history. I will try that.

alamb commented 1 week ago

Here is what I did to fix it now:

git fetch apache
# make a new root commit
git checkout --orphan new-asf-site apache/asf-site
# commit in the current copy
git commit -m "Initial asf-site commit"
# make a new branch
git checkout -b asf-site
# force push it to apache 
git push -f apache

My reading of this is that each commit to arrow-rs that builds documentation results in about 7 MB of docs getting pushed to the asf-site branch 🤯

andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ git push -f apache
Enumerating objects: 5871, done.
Counting objects: 100% (5871/5871), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3675/3675), done.
Writing objects: 100% (5871/5871), 7.83 MiB | 2.75 MiB/s, done.
Total 5871 (delta 4562), reused 3103 (delta 2065), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (4562/4562), done.
To github.com:apache/arrow-rs.git
 + 47a8dd03b8b...b6a61fb3a76 asf-site -> asf-site (forced update)
branch 'asf-site' set up to track 'apache/asf-site'.

alamb commented 1 week ago

My temporary workaround seems to have improved things:

Before the change:

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ du -s -h arrow-rs/
1.1G    arrow-rs/

After the change:

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ du -s -h arrow-rs/
 47M    arrow-rs/

Maybe we should fix up the CI job to avoid saving any history of the old docs 🤔
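As a rough idea of what such a CI step could look like, the sketch below publishes a freshly built docs directory as a single root commit and force pushes it, so the branch never has more than one commit. This is not the project's actual workflow: the function name publish_docs_without_history, the commit identity, and the arguments are all hypothetical, and it assumes the docs directory is not already a git repository.

```shell
# Publish the contents of a docs directory as the only commit on a branch.
# Arguments (all illustrative): docs directory, remote URL, branch name.
publish_docs_without_history() {
    docs_dir="$1"
    remote_url="$2"
    branch="${3:-asf-site}"
    (
        cd "$docs_dir" || exit 1
        # A brand-new repository has no history, so the commit below is a
        # root commit containing only the current docs.
        git init -q . || exit 1
        git add -A || exit 1
        git -c user.name="docs-bot" -c user.email="docs-bot@example.invalid" \
            commit -q -m "Publish docs" || exit 1
        # Overwrite the remote branch with the single new commit.
        git push -q --force "$remote_url" "HEAD:refs/heads/$branch"
    )
}

# Example (hypothetical paths):
#   publish_docs_without_history target/doc git@github.com:apache/arrow-rs.git asf-site
```

The earlier commits become unreachable on the remote after each push. When the hosting side actually reclaims that space depends on its own garbage collection, but new clones would only need to fetch the latest docs commit.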