CEMeNT-PSAAP / MCDC

MC/DC: Monte Carlo Dynamic Code
https://mcdc.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Large files are accidentally Git-tracked? #217

Open ilhamv opened 4 months ago

ilhamv commented 4 months ago

Can we use this to remove unwanted files accidentally tracked in the git history?

jpmorgan98 commented 4 months ago

Look specifically for __ptxcache__ files and .o and .ptx files, per @braxtoncuneo.
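A minimal sketch of that check, assuming the patterns mean "any path ending in .o or .ptx, or containing a __ptxcache__ component". The helper names are hypothetical, and git ls-files only reports what is tracked now, not what sits in older history.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed patterns, taken from the comment above.
SUFFIXES = {".o", ".ptx"}
CACHE_NAME = "__ptxcache__"


def is_build_artifact(path: str) -> bool:
    """Return True if a tracked path looks like a compiled or cached artifact."""
    p = PurePosixPath(path)
    return p.suffix in SUFFIXES or CACHE_NAME in p.parts


def tracked_artifacts(repo: str = ".") -> list:
    """List currently tracked files matching the patterns (needs git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],  # -z: NUL-separated, unquoted paths
        check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p and is_build_artifact(p)]
```

Anything returned by tracked_artifacts() would be a candidate for git rm --cached plus a .gitignore entry.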

clemekay commented 4 months ago

You can run du -sh * in a directory to get a human-readable list of how large each item in it is. The large files all seem to come from the inf_shem361 examples: the answer.h5 and data .npz files.
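For a sorted version of the same listing, here is a rough Python sketch. The function names are hypothetical, and it sums apparent file sizes rather than disk blocks, so the numbers will differ slightly from du.

```python
import os


def entry_size(path):
    """Size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path) and not os.path.islink(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks instead of following them
                total += os.path.getsize(full)
    return total


def largest_entries(directory=".", top=10):
    """Return (size_in_bytes, name) pairs for the largest entries, biggest first."""
    sizes = [(entry_size(os.path.join(directory, name)), name)
             for name in os.listdir(directory)]
    return sorted(sizes, reverse=True)[:top]


def human(n):
    """Format a byte count with a du-style unit suffix."""
    for unit in ("B", "K", "M", "G"):
        if n < 1024:
            return f"{n:.0f}{unit}"
        n /= 1024
    return f"{n:.0f}T"
```

Printing human(size) and name for each pair from largest_entries() on the repository root gives the biggest items first.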

Possible ways to handle that:

ilhamv commented 3 months ago

The plan is to replace the infinite-medium 361-group problem with an infinite-medium few-group problem (probably the 7-group C5G7 data).

ilhamv commented 3 months ago

The largest disk usage seems to come from

 68M    .git/objects/b3
142M    .git/objects/pack

Now I'm less sure if the ~4 MB 361-group data is actually the culprit. I'll try to use https://rtyley.github.io/bfg-repo-cleaner/ which may provide us with more info.
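Independently of BFG, the largest blobs in the history can be listed with plain git. This is a sketch using the common git rev-list --objects plus git cat-file --batch-check idiom. The function names are hypothetical, and the sizes are uncompressed object sizes, not what the objects occupy in the pack.

```python
import subprocess


def parse_batch_check(text):
    """Parse 'type sha size path' lines into (size, sha, path), blobs only."""
    blobs = []
    for line in text.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", top=10):
    """Return the `top` largest blobs reachable from any ref (needs git on PATH)."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sized = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sized)[:top]
```

Files that were committed and later deleted still show up here, which is what makes them easy to miss with du on the working tree.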

ilhamv commented 3 months ago

So...

Deleted files
-------------

    Filename                             Git id            
    -------------------------------------------------------
    Miniconda3-latest-Linux-ppc64le.sh | cdb26f99 (94.9 MB)
    analytic.zip                       | b3859ac8 (92.5 MB)

ilhamv commented 3 months ago

Now the .git/objects folder is 44M. More reasonable!
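To compare the object store before and after a cleanup without relying on du, git count-objects -v reports loose and packed sizes in KiB. A small sketch with hypothetical function names follows. Note that, per the BFG documentation, the size only drops once the reflog has been expired and git gc has pruned the unreferenced objects.

```python
import subprocess


def parse_count_objects(text):
    """Turn 'key: value' lines from `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def object_store_kib(repo="."):
    """Loose plus packed object size in KiB, as reported by git."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    stats = parse_count_objects(out)
    return stats.get("size", 0) + stats.get("size-pack", 0)
```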

However, the next step in the BFG instructions is:

Finally, once you're happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server):

$ git push

At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.

Any thoughts? @clemekay @jpmorgan98

ilhamv commented 3 months ago

We may be able to reduce the size further when we remove the SHEM361 test problems and examples. I'll rerun the repo cleaner. Nevertheless, we still need to think about the final step of the cleaning I mentioned in the previous comment.

clemekay commented 1 week ago

Currently looking into whether we need to use the cleanup function or whether we can just delete these files.