Closed by JDRomano2 4 years ago
Used BFG Repo-Cleaner to remove all .gz and .html blobs from the history (the most recent commit is untouched).
For example, no .tsv.gz source file is present in the following directory: https://github.com/EpistasisLab/penn-ml-benchmarks/tree/51207e96ce3ccb047908fd0d2532344d77573fc6/datasets/1027_ESL
All users should re-clone the repository to avoid adding 'dirty' files back in when new features are merged into master. For the near future, new pull requests should be inspected to make sure old dataset files or profiling reports haven't been reintroduced (however, this should be fairly obvious).
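As a rough aid for that inspection, the check could be scripted. The following is only a sketch, assuming `git` is on the PATH and that the files of concern are exactly those ending in .gz or .html; the function names `find_dirty_paths` and `scan_repo` are hypothetical and not part of the repository.

```python
import subprocess

# Extensions that were purged from history in this issue (assumption: these two only).
DIRTY_SUFFIXES = (".gz", ".html")


def find_dirty_paths(rev_list_lines, suffixes=DIRTY_SUFFIXES):
    """Return the sorted, unique paths that end in one of `suffixes`.

    Each input line is expected in the `git rev-list --objects` format:
    '<sha> <path>'. Commit lines carry a sha only and are skipped.
    """
    dirty = set()
    for line in rev_list_lines:
        # Split on the first space only, so paths containing spaces survive.
        _sha, _, path = line.rstrip("\n").partition(" ")
        if path and path.lower().endswith(suffixes):
            dirty.add(path)
    return sorted(dirty)


def scan_repo(repo_dir=".", revs=("--all",)):
    """List matching paths among the objects reachable from `revs`."""
    result = subprocess.run(
        ["git", "rev-list", "--objects", *revs],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return find_dirty_paths(result.stdout.splitlines())
```

Calling `scan_repo(revs=("master..some-feature-branch",))` (branch name hypothetical) would restrict the scan to objects reachable from the branch but not from master, which is roughly what a pull request would add. Scanning with `--all` will still report files in the most recent commit, since that commit was left untouched.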
#30 addressed the lack of Git LFS for the large dataset files. It makes sense to remove these from the commit history as well. The main effect is a smaller repository when cloned, but it also makes the commit history easier to browse and navigate.
Aside from removing large dataset files from the history, is there anything else we can/should clean up?