Closed tlambert03 closed 1 year ago
cc @toloudis cc @AetherUnbound
Yes, at one point we kept test images in Git LFS, and I think even in plain git by accident. I would be very happy to remove some old stuff if possible. ~I have no idea how to do so however.~ I see that your link goes over how to do it. I may be able to take some time next week to remove them.
My quarter wraps up next week so it may actually be possible!
just ran this command:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
and got this tail
...
0bd1f1b1d34e 977KiB aicsimageio/tests/resources/s_1_t_1_c_1_z_1.czi
55dd9ddbf891 1.1MiB aicsimageio/tests/resources/s_1_t_1_c_1_z_1.tiff
5db33ed9f863 1.2MiB aicsimageio/tests/resources/s_1_t_1_c_1_z_1.ome.tiff
81071a9ef6a5 1.5MiB _modules/tifffile/tifffile.html
bbf50d5a9155 1.5MiB _modules/tifffile/tifffile.html
31f341ca5dd4 2.4MiB oldaicsimageio/tests/img/segmentation/input_1_cellWholeIndex.tiff
bc99ea4dc67a 2.6MiB presentations/2021-dask-life-sciences/presentation.ipynb
adae2ca03429 2.9MiB aicsimageio/tests/resources/example.gif
63c5e554bbd8 9.2MiB aicsimageio/tests/resources/s_3_t_1_c_3_z_5.ome.tiff
c81fe5c73f86 9.7MiB aicsimageio/tests/resources/s_1_t_10_c_3_z_1.tiff
278ab933e0a0 14MiB aicsimageio/tests/resources/s_3_t_1_c_3_z_5.czi
f7a36c40df49 15MiB aicsimageio/tests/resources/s_1_t_1_c_10_z_1.ome.tiff
e4a5c77eb02c 27MiB aicsimageio/tests/resources/variable_per_scene_dims.czi
851a737f57ae 93MiB oldaicsimageio/tests/img/segmentation/input_3_nuc_orig_img.tiff
so it might be as easy as
bfg --delete-files "{*.tiff,*.czi}"
(that syntax found here)
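Before settling on a `--delete-files` pattern, it may help to see how many bytes each extension accounts for across the whole history. The following is a rough sketch, not part of BFG or git: `sum_by_ext` is a hypothetical helper name, and it assumes input lines of the form `<sha> <size> <path>`, which is what the pipeline above produces after its `sed` step.

```shell
# Hypothetical helper: total blob bytes per file extension.
# Reads "<sha> <size-in-bytes> <path>" lines on stdin and prints
# "<total-bytes> <extension>" sorted from smallest to largest.
sum_by_ext() {
  awk '{
    size = $2
    # Rebuild the path from field 3 onward (it may contain spaces).
    path = $3
    for (i = 4; i <= NF; i++) path = path " " $i
    # Keep only the file name, then take the text after its last dot.
    n = split(path, parts, "/")
    m = split(parts[n], dots, "[.]")
    ext = (m > 1) ? dots[m] : "(none)"
    total[ext] += size
  }
  END {
    for (e in total) printf "%.0f %s\n", total[e], e
  }' | sort -n
}

# Usage (inside a clone):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     sed -n 's/^blob //p' | sum_by_ext
```

Only the last dot-separated component counts as the extension, so `s_1_t_1_c_1_z_1.ome.tiff` is tallied under `tiff`, which matches what a `*.tiff` glob would catch.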
What are the odds that someone will need to build and test an old version with those assets? (famous last words) I approve.
Works for me! 😄 BFG seems like the perfect tool here 💯
Okay that is approval from @toloudis and @AetherUnbound. I am running the BFG.
Someone send help:
~/active/cell/aicsimageio on main [?] env base Python v3.7.12 gcloud evamaxfieldbrown@gmail.com
❯ bfg --delete-files "{*.tiff,*.czi}"
Using repo : /home/eva/active/cell/aicsimageio/.git
Found 101 objects to protect
Found 34 commit-pointing refs : HEAD, refs/heads/admin/include-fsspec-dep-for-czi-in-readme, refs/heads/main, ...
Found 42 tag-pointing refs : refs/tags/v3.2.2, refs/tags/v3.2.3, refs/tags/v3.3.0, ...
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 25d561ef (protected by 'HEAD')
Cleaning
--------
Found 979 commits
Cleaning commits: 100% (979/979)
Cleaning commits completed in 525 ms.
Updating 74 Refs
----------------
Ref Before After
------------------------------------------------------------------------------------
refs/heads/admin/include-fsspec-dep-for-czi-in-readme | 97fc79fa | a6625af5
refs/heads/main | 25d561ef | 797b7ea6
refs/remotes/origin/admin/include-fsspec-dep-for-czi-in-readme | 97fc79fa | a6625af5
refs/remotes/origin/admin/support-py311 | f8b01551 | f975a316
refs/remotes/origin/benchmark-results | 3f8898dc | 1f07d079
refs/remotes/origin/feature/ome-metadata-with-save | c079764e | bccd3e3a
refs/remotes/origin/feature/v5-proto | d519acd5 | 7144b252
refs/remotes/origin/feature/zarrwriter | ff7f8c78 | e6cff2bd
refs/remotes/origin/fix/imageio-2.22 | 0ea02b29 | e279edcb
refs/remotes/origin/fix/tiff_handle_dim_i | c712cdac | 977c1a63
refs/remotes/origin/gh-pages | 58e853f7 | e68e37af
refs/remotes/origin/main | 25d561ef | 797b7ea6
refs/remotes/origin/oldaicsimageio | f2e52829 | 419d2465
refs/remotes/origin/v3 | e8349900 | 693860ec
refs/tags/v3.0.0 | 042a55f6 | 1687c21d
...
Updating references: 100% (74/74)
...Ref update completed in 37 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
DDDDDDDDDDDDDDDDDDDDmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | ba9f557b | c87264d1
Last dirty commit | 2d089a9e | a217cdb2
Deleted files
-------------
Filename Git id
------------------------------------------------------------------------------
T=5_Z=3_CH=2_CZT_All_CH_per_Slice.czi | 2cdc58af (133 B )
input_1_cellWholeIndex.tiff | 31f341ca (2.4 MB)
input_3_nuc_orig_img.tiff | 851a737f (92.9 MB)
s_1_t_10_c_3_z_1.tiff | c81fe5c7 (9.7 MB)
s_1_t_1_c_10_z_1.ome.tiff | f7a36c40 (15.1 MB)
s_1_t_1_c_1_z_1.czi | 132c641d (132 B ), 0bd1f1b1 (977.3 KB)
s_1_t_1_c_1_z_1.ome.tiff | 5db33ed9 (1.2 MB)
s_1_t_1_c_1_z_1.tiff | 55dd9ddb (1.1 MB)
s_3_t_1_c_3_z_5.czi | 278ab933 (14.0 MB), 89fbdcdd (133 B )
s_3_t_1_c_3_z_5.ome.tiff | 63c5e554 (9.2 MB)
test_5_dimension.czi | 42ca65a9 (132 B )
variable_per_scene_dims.czi | e4a5c77e (26.7 MB)
In total, 1642 object ids were changed. Full details are logged here:
/home/eva/active/cell/aicsimageio.bfg-report/2022-12-09/11-50-17
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
~/active/cell/aicsimageio on main [?] env base Python v3.7.12 gcloud evamaxfieldbrown@gmail.com
❯ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 30778, done.
Counting objects: 100% (30778/30778), done.
Delta compression using up to 16 threads
Compressing objects: 100% (30054/30054), done.
Writing objects: 100% (30778/30778), done.
Total 30778 (delta 24677), reused 4478 (delta 0), pack-reused 0
~/active/cell/aicsimageio on main [?] env base Python v3.7.12 gcloud evamaxfieldbrown@gmail.com took 9s
❯ git push --force
Enumerating objects: 3881, done.
Counting objects: 100% (3881/3881), done.
Delta compression using up to 16 threads
Compressing objects: 100% (1112/1112), done.
Writing objects: 100% (3881/3881), 6.82 MiB | 6.71 MiB/s, done.
Total 3881 (delta 2936), reused 3640 (delta 2744), pack-reused 0
remote: Resolving deltas: 100% (2936/2936), done.
remote: error: GH008: Your push referenced at least 20 unknown Git LFS objects:
remote: 54de2e71a92bdb440cd1cce476a9cd15ae42f57def6836a95d966e3be65ae628
remote: 1ea387a6eb3040fed7390ef8a6b8ba256002692827647062873ea68a24e86d9f
remote: df9ab243a43fe0681bf4548bf40d6769893aa08c50988af4ce2f40352a5b42b2
remote: ...
remote: Try to push them with 'git lfs push --all'.
To github.com:AllenCellModeling/aicsimageio.git
! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'github.com:AllenCellModeling/aicsimageio.git'
Should I really be pushing LFS stuff???
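The 132 B and 133 B entries in BFG's deleted-files table are about the size of a Git LFS pointer file, so some of the rewritten history may still reference LFS objects that the server no longer knows about. That is a guess from the sizes, not something the thread confirms. A minimal sketch for checking a blob and mapping it back to the ids in the GH008 message follows; `is_lfs_pointer` and `lfs_oid` are hypothetical helper names, and the pointer layout assumed is the standard `version` / `oid sha256:` / `size` text format.

```shell
# Hypothetical helpers for inspecting Git LFS pointer files on stdin.

# Succeeds if the first line is the LFS pointer version header.
is_lfs_pointer() {
  head -n 1 | grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}

# Prints the sha256 object id recorded in a pointer, if there is one.
lfs_oid() {
  sed -n 's/^oid sha256:\([0-9a-f]\{64\}\)$/\1/p'
}

# Usage (in a clone that still has the old history; 2cdc58af is one of
# the 133 B blobs in BFG's table):
#   git cat-file -p 2cdc58af | is_lfs_pointer && echo "LFS pointer"
#   git cat-file -p 2cdc58af | lfs_oid
```

If the printed id matches one of the hashes in the `remote:` lines, that pointer is among the ones the pre-receive hook is rejecting. `git lfs ls-files --all` lists pointers across the full history, and `git lfs push --all origin` (the command the error suggests, with the remote name added) would only help if the objects still exist in a local LFS cache, which the thread does not say.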
I cannot wait to move over to an entirely new repo in bioio where we don't have LFS history.
There don't seem to be a whole lot of good answers to this online 😅 we could maybe ignore the commits that involve LFS? Or would it be best to just wait until we have a fresh repo?
Rename aicsimageio to aicsimageio-legacy. Start new aicsimageio history from current head. Problem solved!
certainly don't wanna cause any undue stress here :) so feel free to put this on the backburner if desired!
It's conceivable that the very first edition of bioio will be identical to aicsimageio, but with the code separated into logically separate repositories, with each reader repo managing its own test resources. This could be a precursor to making the other intended improvements (e.g. making it easier to write a new Reader from scratch, improving some of the API, etc.).
While that helps with the history/cloning problem, there is still the burden of managing potentially large stores of test resources, especially if we consider tiff and ome-zarr to be "core" for bioio.
Closing due to the upcoming release of bioio. aicsimageio is moving into "maintenance" mode where only high impact bugfixes (or community contributed) work will be done in aicsimageio. Instead of aicsimageio, we are creating a package soon to be released called bioio. See the reason for this change here.
If this issue is still relevant to anyone (@tlambert03) feel free to re-open this issue.
i'm on a super slow internet at the moment and wanted to do a little work on aicsimageio. I tried to clone the repo and it took a long time... though the direct zip download was only 2MB (most of which are the presentations, the source itself is only 800K unzipped)
the full repo is 337 MB, and 333M of that is in .git/objects/pack ... which i suspect indicates that at one point in the past, test images were included in the repo? I wonder how folks would feel about a git filter-branch rewrite? https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository
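For anyone wanting to reproduce the size measurement, here is a small sketch. `pack_kib` is a hypothetical helper name, and it assumes a regular (non-bare) clone where packs live under `.git/objects/pack`.

```shell
# Hypothetical helper: size of a clone's pack directory in KiB.
# Takes the path to the working copy (defaults to the current directory).
pack_kib() {
  repo=${1:-.}
  du -sk "$repo/.git/objects/pack" | awk '{ print $1 }'
}

# Usage:
#   pack_kib /path/to/aicsimageio
# git's own summary of the same thing (see the "size-pack" line):
#   git count-objects -vH
```

Running it before and after a history rewrite plus `git gc` gives a quick check of how much the rewrite actually saved.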