denisecailab / minian

miniscope analysis pipeline with interactive visualizations
GNU General Public License v3.0

Repo is humongous! #255

Open · sneakers-the-rat opened this issue 11 months ago

sneakers-the-rat commented 11 months ago

What up Phil! Raymond and I are checking out what u gone and done to see how it lines up with what we wanna do, and on cloning we noticed the repo is really big. GitHub errored out a few times when we tried to clone it.

raising this bc the fix is relatively simple:

almost all of the space is taken up by version history and the demo movies:

    1.0 GiB [###################] /.git
  688.6 MiB [############       ] /demo_movies
   10.1 MiB [                   ] /demo_data
    1.6 MiB [                   ] /docs
    1.2 MiB [                   ] /img
  324.0 KiB [                   ] /minian
... rest of package

Most of the large objects in the git history can be safely removed:

git rev-list --objects --all | \
  git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' | \
  sort -nr -k 3 | \
  perl -ne 'm#^(\w+) blob (\d+) (.+)# or next; print "$1\t$2\t$3\n";' | \
  head -n 200 | \
  column -t -s $'\t'
b7e62f383f6df7235d50ae7209fab2b833c74ea4  104857600  demo_movies/msCam1.avi
d6725ed26eaf419726e9aa7fba4f29964005fa01  72203510   demo_movies/msCam4.avi
ca147a24e3dcee3a4a576b2bb902ff57bb9541f4  72203510   demo_movies/msCam5.avi
9c58b81703c7e7c36ec9704dc5ffc530abbd39b0  72203510   demo_movies/msCam9.avi
8c1b4219b66e721e38a32d67411c8f75fd77d916  72203510   demo_movies/msCam2.avi
6ea6583317ce39cf59784cc9f91dba151453d95f  72203510   demo_movies/msCam3.avi
4c5ea0464743402c1a9e5f039aa55cc265dd92e2  72203510   demo_movies/msCam10.avi
458f41193569942c9bd8c2fb44bb57c8f24e331e  72203510   demo_movies/msCam6.avi
3e8cb893c7700c401c6eb5d0108ee31db9a0f4b6  72203510   demo_movies/msCam1.avi
33f22661b9cdea53ea1e71078087e24bee2820f4  72203510   demo_movies/msCam7.avi
10d143608e484609a3297229b4e6ec63213d68c0  72203510   demo_movies/msCam8.avi
b899bb3412d6f94c96394edce19479073026091e  39204731   pipeline.ipynb
defa8138c18aceba909db1d86ca9a59414e82658  37819044   demo_movies/msCam1.avi
3b4375903b580bae4f3c1239bcede7b5dd5e5820  35133043   minian/test/test_movie_fixture/minian_mc.mp4
... 
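Since the same path can show up several times (msCam1.avi appears three times above), it can help to total each path across all its versions in history. A minimal Python sketch, assuming the input is the raw "<sha> <type> <size> <path>" lines produced by the batch-check format above; the function name is hypothetical:

```python
from collections import defaultdict


def total_blob_bytes_by_path(lines):
    """Sum blob sizes per path from `git cat-file --batch-check` output.

    Each line is expected to look like "<sha> <type> <size> <path>".
    Non-blob objects (commits, trees) and lines without a path are skipped.
    Returns (path, total_bytes, n_versions) tuples, largest total first.
    """
    totals = defaultdict(int)
    versions = defaultdict(int)
    for line in lines:
        # maxsplit=3 keeps any spaces inside the path intact
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4 or parts[1] != "blob" or not parts[3]:
            continue
        _sha, _type, size, path = parts
        totals[path] += int(size)
        versions[path] += 1
    return sorted(
        ((p, totals[p], versions[p]) for p in totals),
        key=lambda row: row[1],
        reverse=True,
    )
```

For just the rows visible above, the three msCam1.avi versions sum to 214,880,154 bytes (about 205 MiB).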

The videos are huge because they're uncompressed AVIs. Since these are just demo videos, u can go from ~700MB down to ~7MB with default ffmpeg x264 encoding:

for i in ./*.avi; do ffmpeg -y -i "$i" "${i%.avi}.mp4"; done
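The same conversion as a Python sketch, in case it's handier to script from the repo. It swaps the suffix so msCam1.avi becomes msCam1.mp4, and names the encoder explicitly; -c:v libx264 assumes your ffmpeg build includes libx264. The helper names are hypothetical:

```python
import subprocess
from pathlib import Path


def mp4_command(avi_path):
    """Build an ffmpeg command that re-encodes one AVI to H.264 MP4.

    The output sits next to the input with the suffix swapped
    (msCam1.avi -> msCam1.mp4).
    """
    src = Path(avi_path)
    dst = src.with_suffix(".mp4")
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", str(dst)]


def convert_all(directory=".", dry_run=True):
    """Convert every *.avi in `directory`; with dry_run, only return the commands."""
    commands = [mp4_command(p) for p in sorted(Path(directory).glob("*.avi"))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if ffmpeg fails on a file
            subprocess.run(cmd, check=True)
    return commands
```

Running convert_all("demo_movies") first prints nothing and converts nothing, so the command list can be checked before passing dry_run=False.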

and then you can remove all the big files from the git history with git filter-repo:

git filter-repo --path olde/and/big/file.avi --invert-paths

or use --paths-from-file with a file that lists one path per line.
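A sketch of building that file from the batch-check listing above. One caution: --invert-paths drops every listed path from all commits, including current versions, so a plain size cutoff would also catch live files like pipeline.ipynb. This sketch therefore only keeps paths under a prefix. The 10 MB cutoff, the demo_movies/ prefix default, the function names, and big_paths.txt are all hypothetical choices:

```python
def select_paths(lines, prefix="demo_movies/", min_bytes=10_000_000):
    """Collect paths under `prefix` that ever had a blob of at least min_bytes.

    Each line is expected to look like "<sha> blob <size> <path>".
    """
    big = set()
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) != 4 or parts[1] != "blob":
            continue
        _sha, _type, size, path = parts
        if path and path.startswith(prefix) and int(size) >= min_bytes:
            big.add(path)
    return sorted(big)


def write_paths_file(paths, out_path="big_paths.txt"):
    """Write one path per line, the plain format --paths-from-file accepts."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.writelines(p + "\n" for p in paths)
```

After reviewing the file: git filter-repo --paths-from-file big_paths.txt --invert-paths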

That is a destructive change: it rewrites every commit that touched those files, so anyone with an existing clone would need to re-clone. But if u don't plan on going back and trying out old versions of the demo movies, it's the "ok" kind of destructive change.

Anyway, normally I would PR this, but you can't really PR a change to git history, so I'm just describing the process here.

sneakers-the-rat commented 11 months ago

Daniel gave some verbal context for the use of uncompressed video in the repo: apparently there have been historical difficulties with compressed video. I was trying to run the demo pipeline to test that empirically, but apparently .mp4 is explicitly unsupported, and .mkv passes the format check but then fails because MKV files don't store the frame count in their header.

I see there is already a pull request to calculate frame count using the duration and framerate https://github.com/denisecailab/minian/pull/246

This is sort of what OpenCV does anyway, except the PR can't handle variable frame rates: https://github.com/opencv/opencv/blob/dc0c59fdc655bb4e1e83e6e4f8c2c33352e2baa4/modules/videoio/src/cap_ffmpeg_impl.hpp#L1901-L1910
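For reference, the duration-times-framerate estimate looks roughly like this. It's a sketch, not the code from PR #246 or OpenCV: it assumes a dict shaped like one entry of ffprobe's JSON "streams" list (nb_frames, avg_frame_rate, duration), prefers a container-reported nb_frames, and otherwise rounds duration × fps. With variable frame rate the average rate makes the fallback only an approximation, which is the limitation mentioned above. The function name is hypothetical:

```python
from fractions import Fraction


def estimate_frame_count(stream, format_duration=None):
    """Estimate a video stream's frame count from ffprobe-style metadata.

    Uses nb_frames when the container reports it; otherwise falls back to
    round(duration * avg_frame_rate). MKV streams often lack both nb_frames
    and a per-stream duration, hence the optional format-level duration.
    """
    nb = stream.get("nb_frames")
    if nb not in (None, "", "N/A") and int(nb) > 0:
        return int(nb)
    duration = stream.get("duration", format_duration)
    # ffprobe reports rates as "num/den", e.g. "30/1" or "30000/1001"
    num, _, den = stream.get("avg_frame_rate", "0/0").partition("/")
    den = int(den or 1)
    if duration is None or den == 0:
        raise ValueError("stream has no usable frame count, duration, or frame rate")
    fps = Fraction(int(num), den)
    # Fraction(str(...)) keeps decimal durations like "10.010000" exact
    return round(Fraction(str(duration)) * fps)
```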

An easier fix would probably be to just pass count_frames to ffmpeg.probe. Since you already do an ffmpeg conversion to raw video in the iterator, that should let you get rid of the extension restrictions entirely and just raise if there's a problem reading the file.

The syntax for that would be:

ffmpeg.probe(fname, count_frames=None)
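A sketch of how that could look with ffmpeg-python, assuming the first video stream in the probe result is the one to read. ffmpeg-python turns count_frames=None into a bare -count_frames flag for ffprobe, which decodes the whole stream and reports nb_read_frames. That's slower than reading a header, but it doesn't depend on the container. The function names are hypothetical, not minian's API:

```python
def frames_from_probe(probe):
    """Return nb_read_frames for the first video stream of an ffprobe result."""
    for stream in probe.get("streams", []):
        if stream.get("codec_type") == "video":
            try:
                return int(stream["nb_read_frames"])
            except (KeyError, ValueError) as err:
                raise ValueError("ffprobe did not report a frame count") from err
    raise ValueError("no video stream found")


def count_frames(fname):
    """Probe `fname` with -count_frames and return its frame count.

    Raises ffmpeg.Error if ffprobe can't read the file, or ValueError if it
    reads the file but finds no countable video stream.
    """
    import ffmpeg  # ffmpeg-python; imported here so frames_from_probe works without it

    probe = ffmpeg.probe(fname, count_frames=None)
    return frames_from_probe(probe)
```

Raising from count_frames instead of checking extensions up front means any format ffmpeg can decode (.avi, .mp4, .mkv) goes through the same path.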

Anyway, we're looking around to see which Ca imaging package to build on top of, and it would be cool if it were this one; we are not having a fun time with CaImAn, unfortunately :(. If y'all aren't maintaining this anymore, I'll just buzz off and close this <3