Closed Sbozzolo closed 9 months ago
Indeed - the data stored is all required for the tests; however, I agree that the current format and size are not sustainable. I have significantly reduced the size of test/test_data/H1L1_1247644138-1247645038.pickle by simplifying the tests and storing smaller spectrograms/data products. The repo is 50 MB and the LFS storage 370 MB. However, a fresh clone is still large, because the contents of py_die/.git/objects/pack still include the older, larger versions of the file. Do you have suggestions on how to reduce these? I've tried to look for ways to "purge" the repo of old files but haven't found an obvious solution. In any case, this is not an issue for users, as the tests (and test data) are not part of the release. Please let me know if you have suggestions!
To properly solve this issue you'd probably have to rethink how you do tests. I agree that most users won't be affected directly by this, so this is not a blocking issue.
The older files will stay with git to give you the ability to roll back to previous commits without losing information. If you wanted to reduce the size of the previously stored objects, what you could do is backport your change for the test with test/test_data/H1L1_1247644138-1247645038.pickle to the first commit where that test was introduced, and rebase every following commit to use the smaller data. This is tedious and probably not worth it, so I'd advise against it.
I think that the repo is good as is for JOSS, but maybe you can leave this issue open and try to fix the root of the issue in the future.
I have a few ideas of how to rethink the tests, but they would take some effort - I had initially tried to have the tests generate the test data on the fly, but this would create circular tests that are always going to be self-consistent; I need to design a simpler way of checking each step of the pygwb calculations (e.g., instead of checking that entire arrays are identical, just checking that their sum is, and things like that...). I'll add an issue for now as you suggest and then I can address this in a future release.
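To make the "check the sum instead of the whole array" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `summarize`, `matches_reference`, and the `REFERENCE` values are hypothetical names and numbers, not part of pygwb. The point is that the test only needs a handful of numbers written down once from a trusted run, rather than a large pickle.

```python
import math


def summarize(values):
    """Reduce a sequence of floats to a few cheap summary statistics."""
    values = list(values)
    return {
        "n": len(values),
        "sum": math.fsum(values),  # fsum limits floating-point round-off
        "min": min(values),
        "max": max(values),
    }


def matches_reference(values, reference, rel_tol=1e-9):
    """Compare the summary of `values` against stored reference numbers."""
    summary = summarize(values)
    if summary["n"] != reference["n"]:
        return False
    return all(
        math.isclose(summary[key], reference[key], rel_tol=rel_tol)
        for key in ("sum", "min", "max")
    )


# Hypothetical reference values, recorded once from a trusted run.
REFERENCE = {"n": 4, "sum": 10.0, "min": 1.0, "max": 4.0}

computed = [1.0, 2.0, 3.0, 4.0]  # stand-in for one step of the calculation
print(matches_reference(computed, REFERENCE))
```

Note that this is a weaker check than an element-wise comparison: a reordered array, for instance, would still pass. It trades some sensitivity for a much smaller stored reference.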
(the pickle-related issue was already open and is #48)
This repository is more than 800 MB. I think that the reason for this is that the pickle files are stored in the repo, and git cannot properly store the diff, so it has to store the entire file every time. A closer look shows that the file test/test_data/H1L1_1247644138-1247645038.pickle is the worst offender. This is not sustainable: every time you regenerate the pickle, the repository grows by 30-80 MB. Stack Exchange might help mitigate the problem, but probably the long-term solution is to do something with the pickle files (e.g., is all the data stored needed for the test?).
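As a rough back-of-the-envelope illustration of why this adds up, the sketch below multiplies the 30-80 MB per regeneration by a number of regenerations. It assumes the worst case described above, where git stores each new version of the pickle in full; the function name `estimated_growth_mb` and the count of 10 regenerations are made up for the example, not measured from this repository.

```python
def estimated_growth_mb(n_regenerations, low_mb=30, high_mb=80):
    """Return (low, high) bounds in MB for history added by regenerating the pickle.

    Assumes every regenerated file is stored in full, with no delta savings.
    """
    if n_regenerations < 0:
        raise ValueError("n_regenerations must be non-negative")
    return n_regenerations * low_mb, n_regenerations * high_mb


# Hypothetical count of regenerations, for illustration only.
low, high = estimated_growth_mb(10)
print(f"10 regenerations: roughly {low}-{high} MB of extra history")
```

Under these assumptions, about ten regenerations would already be enough to account for several hundred MB of history.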