LeeBergstrand / BackBLAST_Reciprocal_BLAST

This repository contains a reciprocal BLAST program for filtering down BLAST results to best bidirectional hits. It also contains a toolkit for finding and visualizing BLAST hits for gene clusters within multiple bacterial genomes.

Large repo size #47

Open jmtsuji opened 5 years ago

jmtsuji commented 5 years ago

The BackBLAST repo is currently a couple hundred MB in size, which is quite large. I suspect this is mostly due to old ExampleData files, which have now been removed in BackBLAST2.

We'll eventually need to clean up the repo somehow to get its size down. We have a couple of options:

@LeeBergstrand Thoughts? Best practices?
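For reference, here's a quick way to check how much of that size is sitting in the git history itself versus the working tree (a minimal sketch, assuming a local clone and GNU coreutils):

```bash
# Size of all objects git is tracking (i.e. the history), human-readable
git count-objects -vH

# Size of the checked-out files, excluding the .git directory, for comparison
du -sh --exclude=.git .
```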

LeeBergstrand commented 5 years ago

@jmtsuji I hadn't read that article closely when I sent it to you. The approach that author uses is lazy and incorrect. I honestly thought he was doing what's described below: you can prune files out of your git history locally and then force-push the rewritten history over the one on GitHub. That way you don't have to make any new repos or lose any files from the past.

See here for how to do it: https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

The above is a tutorial for removing sensitive files (think those containing AWS keys) from a repo. The concept is the same for large files: you prune them from the history.
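For example, here's roughly what that could look like with the BFG Repo-Cleaner (or git filter-repo) on a fresh mirror clone. This is just a sketch: the ExampleData folder name comes from the comment above, the clone URL is filled in from the repo name, and the exact paths to prune would need to be confirmed against the large-file list first.

```bash
# Work on a fresh mirror clone so a botched rewrite is easy to throw away
git clone --mirror https://github.com/LeeBergstrand/BackBLAST_Reciprocal_BLAST.git

# Option A: BFG Repo-Cleaner -- delete the old ExampleData folder from all of history
java -jar bfg.jar --delete-folders ExampleData BackBLAST_Reciprocal_BLAST.git

# Option B: git filter-repo (run inside the clone) -- drop everything under ExampleData/
# git filter-repo --path ExampleData --invert-paths

# Expire old reflogs and repack so the pruned objects are actually freed
cd BackBLAST_Reciprocal_BLAST.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Push the rewritten history back to GitHub (a mirror clone force-updates all refs);
# coordinate first so nobody has unpushed work based on the old history
git push
```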


LeeBergstrand commented 5 years ago

Here's how to find the large files:

https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history
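Something along the lines of the snippets in that thread, pasted here for convenience (sizes are in bytes, largest last):

```bash
# List every object in history, keep only blobs, sort by size, show the 20 largest
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  tail -n 20
```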

LeeBergstrand commented 5 years ago

@jmtsuji I would expect the repo to still be several tens of megabytes even after the large files are removed. Squashing commits may make the history smaller as well (see the sketch below).

https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History
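If we squash, it would presumably happen as part of the same history rewrite, e.g. an interactive rebase (a sketch only; the number of commits to revisit is arbitrary here, and this changes commit hashes just like the file pruning does):

```bash
# Interactively rewrite the last 20 commits; in the editor, mark commits as
# "squash" (or "fixup") to fold them into the commit above them
git rebase -i HEAD~20

# Or revisit the entire history starting from the first commit:
# git rebase -i --root
```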

jmtsuji commented 5 years ago

@LeeBergstrand Thanks for the tips. I'll try downsizing the repo when I have some time. I'll warn you beforehand so that neither of us is developing the code during the pruning process. It might take me a couple of months to get to this, so I'll leave this issue open for now.