ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

Remove Git LFS files #40

Closed GemmaTuron closed 3 months ago

GemmaTuron commented 5 months ago

We have a lot of legacy files tracked in Git LFS, which has grown the repository history to almost 2 GB. We can clean this up by rewriting the commit history; otherwise the LFS pointers are never actually deleted.

Steps:

git lfs ls-files  # list the files currently tracked by Git LFS
git lfs migrate export --include-ref=main --include="requirements.txt"  # take requirements.txt out of Git LFS by rewriting the history of main
git push origin main -f  # force-push the rewritten history

We can stop tracking most Git LFS files in the same fashion, which saves us a lot of space and saves users time when cloning the repo. These are the current Git LFS files:

feee072247 * zairachem/data/atom_pols.txt
4c79cd109b * zairachem/tools/fpsim2/FPSim2/docs/_sources/index.rst.txt
715b45b3e0 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.io.backends.rst.txt
7ca37e817c * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.io.rst.txt
da1e5a9c83 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.rst.txt
33f87ccb8e * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/modules.rst.txt
4e300b8af9 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/create_fp_db.rst.txt
a32031e60f * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/gpu_sim.rst.txt
2ac4ca5bcb * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/install.rst.txt
5c152d2d9c * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/limitations.rst.txt
77545a4d9b * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/sim.rst.txt
b0a14ded29 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/sim_matrix.rst.txt
d18b20bd9d * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/subs.rst.txt
929d824e27 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/tversky.rst.txt
0dd174b621 * zairachem/tools/fpsim2/FPSim2/tests/data/test.h5
527d498610 * zairachem/tools/fpsim2/data/reference_library.csv
92ca86fed2 * zairachem/tools/fpsim2/data/reference_library.h5
1571cfaac2 * zairachem/tools/ghost/ghostml/test_data/chembl3371_testing_data.pkl
5feeae0837 * zairachem/tools/mollib/virtual_libraries/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar.txt
1abc0e3b08 * zairachem/tools/mollib/virtual_libraries/data/chembl24_cleaned_unique_canon.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/data/my_molecules.txt
5feeae0837 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/1_140_x0.txt
801a3aafb8 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/data_tr.txt
835b3235f5 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/data_val.txt
97fbc7d936 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/desc.pkl
14d2f3e81d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/fp.pkl
1d5beca471 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/generic_scaffolds.txt
4d8400f42a * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/idx_tr.pkl
bfd663a77f * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/idx_val.pkl
53c4615f48 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/scaf.pkl
27fdbc2468 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/scaffolds.txt
ec0f78f70d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/1_140_x10.txt
8f55532089 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/data_tr.txt
fb1816171d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/data_val.txt
6101313bf4 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/desc.pkl
a453a0541e * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/fp.pkl
af2e34e529 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/generic_scaffolds.txt
588d6b7f14 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/idx_tr.pkl
329fc06062 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/idx_val.pkl
3bdbcd6fbb * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/scaf.pkl
157c174924 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/scaffolds.txt
866154deb5 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/1_140_x10.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/data_tr.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/data_val.txt
b3cbfab822 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/desc.pkl
ad3a6af4cb * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/fp.pkl
8f6ed5c635 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/generic_scaffolds.txt
f4909f12da * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/idx_tr.pkl
7225963e04 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/idx_val.pkl
3bb58bee46 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/scaf.pkl
3f1fc605b1 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/scaffolds.txt
1f9074e73e * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/02.h5
171d3a40c0 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/04.h5
c3392cb261 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/06.h5
051838d651 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/08.h5
8d721c6f1c * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/10.h5
68d1e33663 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/history.pkl
de7a6e875d * zairachem/tools/mollib/virtual_libraries/models/c24_augmentationx10_minlen1_maxlen140.h5
2aacc3e996 * zairachem/tools/mollib/virtual_libraries/models/molecules_start_0.7.txt
5eafefebe7 * zairachem/tools/mollib/virtual_libraries/src/python/fcd/model_FCD_all.h5
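
A listing like the one above can be summarized before deciding what to export. The following is a minimal sketch, assuming the usual `git lfs ls-files` line layout of `<short oid> <* or -> <path>`. The helper names (`parse_lfs_listing`, `count_by_prefix`, `export_command`) are hypothetical and not part of ZairaChem or Git LFS. The generated command simply joins paths with commas for `--include`, so it would not be safe for paths that themselves contain commas or double quotes.

```python
from collections import Counter


def parse_lfs_listing(text):
    """Return the file paths found in `git lfs ls-files` output."""
    paths = []
    for line in text.splitlines():
        # Each line is expected to look like: "<short oid> <*|-> <path>".
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[1] in ("*", "-"):
            paths.append(parts[2])
    return paths


def count_by_prefix(paths, depth=3):
    """Count tracked files per leading directory (up to `depth` levels)."""
    counts = Counter()
    for path in paths:
        directories = path.split("/")[:-1]  # drop the file name
        counts["/".join(directories[:depth])] += 1
    return counts


def export_command(paths, ref="main"):
    """Build a `git lfs migrate export` command string for the given paths."""
    include = ",".join(paths)  # assumes no commas or quotes inside the paths
    return f'git lfs migrate export --include-ref={ref} --include="{include}"'


# A few lines copied from the listing above, used only as sample input.
sample = """\
feee072247 * zairachem/data/atom_pols.txt
0dd174b621 * zairachem/tools/fpsim2/FPSim2/tests/data/test.h5
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/data/my_molecules.txt
2aacc3e996 * zairachem/tools/mollib/virtual_libraries/models/molecules_start_0.7.txt
"""

paths = parse_lfs_listing(sample)
counts = count_by_prefix(paths)
command = export_command(paths[:1])

for prefix, count in sorted(counts.items()):
    print(f"{count:3d}  {prefix}")
print(command)
```

The sketch only builds the command as a string and never runs it, since the export rewrites history and should be reviewed by hand before the force push.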

GemmaTuron commented 3 months ago

@miquelduranfrigola and @DhanshreeA I'd like to do this to remove the heavy parts of ZairaChem that are not needed, but I need your confirmation that you are OK with it

DhanshreeA commented 3 months ago

If the files are truly legacy and not needed by any tool (i.e. fpsim, mollib, as I can see here), then I don't see why we can't remove these.

GemmaTuron commented 3 months ago

OK, these files need to be removed one by one and the entire Git history rewritten. We are in a similar situation to the one we encountered with Chem Sampler: it does not make much sense to do this one by one, given that we want to refactor all of the code. The problem is that there is so much history and so many legacy files that the repo will always take forever to clone unless we delete all the history. @miquelduranfrigola what do you suggest?

FYI @dhanshreea, deleting the files does not eliminate them from the Git LFS registry, which is what is so annoying. The pointers remain there forever unless you rewrite the entire commit history (>200 commits, so it takes forever).
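
For context on what those pointers are: under Git LFS, each commit stores a small text pointer in place of the tracked file, with `version`, `oid` and `size` lines. Old commits keep their pointers even after the file is deleted in a later commit, which is why the history has to be rewritten. Below is a minimal sketch of reading such a pointer. The function name `parse_lfs_pointer` is hypothetical, and the oid and size in the example are illustrative values, not taken from this repository.

```python
def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into a dict, or return None if it is not a pointer."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" ")
        if not separator:
            return None  # every pointer line is "<key> <value>"
        fields[key] = value

    version = fields.get("version", "")
    if not version.startswith("https://git-lfs.github.com/spec/"):
        return None
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None

    # The oid line has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": version,
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Illustrative pointer content (not a real object from this repository).
example_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)

pointer = parse_lfs_pointer(example_pointer)
print(pointer)
```

A regular file, such as a `requirements.txt` that has been exported out of LFS, would not start with the `version` line, so the function returns `None` for it.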

GemmaTuron commented 3 months ago

OK, I have deleted Mollib entirely and removed the files that were not needed from FPSIM2 (incl. the reference libraries).

We can close this issue for the moment.