Closed hxin closed 5 years ago
The BFG tool is also mentioned on the GitHub help page.
I forked the repo and will use the fork for testing.
5f7b3394e88a 1.1MiB tests/data/pe/chipseq/filtered_reads/Blocks/chiseq_mouse_sample___mouse___BLOCK___2.bam
7664ffc2f03e 1.1MiB tests/data/pe/chipseq/filtered_reads/Blocks/chiseq_mouse_sample___mouse___BLOCK___1.bam
87c5387ae7ed 1.2MiB tests/data/pe/rnaseq/filtered_reads/rnaseq_mouse_rat_sample_mouse_filtered.bam
8b09117b550d 1.2MiB tests/data/se/chipseq/filtered_reads/Blocks/chiseq_mouse_se_sample___mouse___BLOCK___1.bam
7e003799c8cf 1.3MiB tests/data/se/chipseq/filtered_reads/Blocks/chiseq_mouse_se_sample___mouse___BLOCK___2.bam
df98e9adfd83 1.5MiB pipeline_test/data/bam/sample_reads.human.bam
4f91b3c1941c 1.5MiB tests/data/pe/rnaseq/sorted_reads/rnaseq_mouse_rat_sample.human.bam
c054e2ea2ef1 2.8MiB pipeline_test/data/fastq/mouse_rat_test_1.fastq.gz
68f1a5b71a92 2.9MiB pipeline_test/data/fastq/mouse_rat_test_2.fastq.gz
982b4b5e4225 3.1MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___human___BLOCK___1.bam
15f50b0528c4 3.2MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___human___BLOCK___2.bam
91c15a0935a5 3.6MiB tests/data/se/bisulfite/filtered_reads/bisulfite_human_se_sample___human___1___filtered.bam
ef8e9ebc795b 3.6MiB tests/data/pe/chipseq/filtered_reads/Blocks/chiseq_mouse_sample___mouse___BLOCK___1.bam
3b6979566ab1 3.6MiB tests/data/se/bisulfite/filtered_reads/bisulfite_human_se_sample___human___0___filtered.bam
733ddf4693c4 3.6MiB tests/data/pe/chipseq/filtered_reads/Blocks/chiseq_mouse_sample___mouse___BLOCK___2.bam
794ce6ce7bea 4.0MiB tests/data/raw_reads/bisulfite_human_pe_R1.fastq.gz
81678411ada0 4.0MiB tests/data/pe/rnaseq/filtered_reads/rnaseq_mouse_rat_sample_rat_1_filtered.bam
635d08091a8b 4.1MiB tests/data/pe/rnaseq/filtered_reads/rnaseq_mouse_rat_sample_rat_0_filtered.bam
3cd1081155f2 4.7MiB tests/data/pe/bisulfite/filtered_reads/bisulfite_human_pe_sample___human___0___filtered.bam
7e81c6c56a69 4.8MiB tests/data/pe/bisulfite/filtered_reads/bisulfite_human_pe_sample___human___1___filtered.bam
568f5c0a984f 5.1MiB tests/data/raw_reads/bisulfite_human_pe_R2.fastq.gz
b0da1428a772 5.5MiB pipeline_test/data/bam/sample_reads.mouse.bam
a367d7940712 5.5MiB tests/data/pe/rnaseq/sorted_reads/rnaseq_mouse_rat_sample.mouse.bam
1b1ed02c163a 7.2MiB tests/data/se/bisulfite/filtered_reads/bisulfite_human_se_sample___human___filtered.bam
811066e69490 7.6MiB tests/data/raw_reads/bisulfite_human_se.fastq.gz
30220ba773cd 8.1MiB tests/data/pe/rnaseq/filtered_reads/rnaseq_mouse_rat_sample_rat_filtered.bam
a5adcea049ff 8.5MiB tests/data/se/bisulfite/sorted_reads/bisulfite_human_se_sample.human.premerge.bam
d391f5a7b87c 8.5MiB tests/data/se/bisulfite/mapped_reads/bisulfite_human_se_sample.human.bam
1a44ecfed55c 9.4MiB pipeline_test/data/bam/sample_reads.rat.bam
b02776cc3020 9.4MiB tests/data/pe/rnaseq/sorted_reads/rnaseq_mouse_rat_sample.rat.bam
c7897e505f78 9.5MiB tests/data/pe/bisulfite/filtered_reads/bisulfite_human_pe_sample___human___filtered.bam
1766a7fa2e51 10MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___mouse___BLOCK___2.bam
63b3437938ba 10MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___mouse___BLOCK___1.bam
a89af79006d4 10MiB tests/data/se/bisulfite/sorted_reads/bisulfite_human_se_sample.human.bam
69b886750287 15MiB tests/data/pe/bisulfite/mapped_reads/bisulfite_human_pe_sample.human.bam
eb13efb043aa 15MiB tests/data/pe/bisulfite/sorted_reads/bisulfite_human_pe_sample.human.premerge.bam
53d80afe6a70 16MiB tests/data/se/bisulfite/filtered_reads/Blocks/bisulfite_human_se_sample___human___BLOCK___2.bam
021565acacff 16MiB tests/data/se/bisulfite/filtered_reads/Blocks/bisulfite_human_se_sample___human___BLOCK___1.bam
21563153931d 18MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___rat___BLOCK___2.bam
40c465fe3383 18MiB tests/data/pe/rnaseq/filtered_reads/Blocks/rnaseq_mouse_rat_sample___rat___BLOCK___1.bam
b46a9c343f87 18MiB tests/data/pe/bisulfite/sorted_reads/bisulfite_human_pe_sample.human.bam
101895be216f 41MiB tests/data/pe/bisulfite/filtered_reads/Blocks/bisulfite_human_pe_sample___human___BLOCK___1.bam
53abc0a7b419 41MiB tests/data/pe/bisulfite/filtered_reads/Blocks/bisulfite_human_pe_sample___human___BLOCK___2.bam
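A possible way to get the file names to hand to BFG out of a listing like the one above, rather than copying them by hand. This is only a sketch: `unique_basenames` is a hypothetical helper (not part of git or BFG), it assumes the three-column `<hash> <size> <path>` layout shown above, and it will not cope with paths that contain spaces.

```shell
# Hypothetical helper: read "<hash> <size> <path>" lines on stdin and print
# the unique file names (last path component) of paths matching a pattern.
unique_basenames() {
  pattern="$1"
  awk -v pat="$pattern" '
    $3 ~ pat {
      n = split($3, parts, "/")   # parts[n] is the file name
      print parts[n]
    }' | sort -u
}

# Example: names of the bisulfite test files in a saved listing
# (large_blobs.txt is an assumed file name).
# unique_basenames bisulfite < large_blobs.txt
```

BFG's `--delete-files` matches on file name rather than full path, so a name that shows up in several directories only needs to be passed once.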
Here is the plan:
git clone --mirror https://github.com/statbio/Sargasso.git &
#git clone https://github.com/statbio/Sargasso.git &
wget https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar &
wait  # both background downloads must finish before entering the clone
cd Sargasso.git
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
cd ..
# java -jar bfg-1.13.0.jar --strip-blobs-bigger-than 20M ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample___human___1___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample___human___0___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_R1.fastq.gz ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample___human___0___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample___human___1___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_R2.fastq.gz ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample___human___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se.fastq.gz ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample.human.premerge.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample.human.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample___human___filtered.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample.human.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample.human.premerge.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample___human___BLOCK___2.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_se_sample___human___BLOCK___1.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample___human___BLOCK___1.bam ~/tmp/Sargasso.git
java -jar bfg-1.13.0.jar --delete-files bisulfite_human_pe_sample___human___BLOCK___2.bam ~/tmp/Sargasso.git
cd Sargasso.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push
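Before pushing, it may be worth checking that a file name really is gone from the history. A rough sketch, assuming it is fed the output of `git rev-list --objects --all` run inside the mirror clone; `has_basename` is a made-up helper name, not a git or BFG command. Note that BFG by default leaves the files in the latest commit alone, so a name can still show up if HEAD contains it.

```shell
# Hypothetical helper: succeed if any object path on stdin has the given
# file name. Expects "<hash> <path>" lines as printed by
# `git rev-list --objects --all`; commit lines carry no path and are skipped.
has_basename() {
  name="$1"
  awk -v name="$name" '
    NF >= 2 {
      path = substr($0, index($0, " ") + 1)   # keep paths with spaces intact
      n = split(path, parts, "/")
      if (parts[n] == name) found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Example, run inside Sargasso.git:
# git rev-list --objects --all | has_basename bisulfite_human_se.fastq.gz \
#   && echo "still in history" || echo "gone"
```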
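This will remove the large files and change the commit hashes, just in the bisulfite branch. Other local Sargasso clones need to be updated afterwards; because the history is rewritten, a fresh clone is safer than a plain git pull, which would merge the old history back in.
This reduces the size of the repo from 230 MB to 77 MB.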
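For the record, the arithmetic behind those numbers as a small sketch. `percent_saved` is a hypothetical helper; the sizes themselves can be taken from, for example, `du -sh Sargasso.git` or `git count-objects -vH` before and after the cleanup.

```shell
# Hypothetical helper: percentage reduction between two sizes (same unit).
percent_saved() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

percent_saved 230 77   # prints 66.5
```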
plan to follow this
Use this to remove large files. Anything that messes with git history is risky... I am not sure how this works at the moment, so I will need to check a few more things before actually pushing the change back. I will also confirm with @lweasel before making any change to the repo.