NeurodataWithoutBorders / nwb-project-analytics

Repository for collecting analytics and scripts related to the NWB project.

repo is big #32

Closed: bendichter closed this issue 6 months ago

bendichter commented 10 months ago

This repo is quite big, and it appears this is due to caching the results in the data folder. This is causing a problem for me now because I am having trouble cloning the repo. Cloning it is required to install this package, which is necessary to run the html generation for nwb-overview. Would it be possible to cache these results somewhere else? Perhaps we could store them in a separate GitHub repo. We would then need to rewrite the git commit history of this repo to reduce its size.

oruebel commented 10 months ago

> This repo is quite big, and it appears this is due to caching the results in the data folder.

According to git, the repo is only 39.69 MiB, which usually shouldn't be an issue. On my home WiFi it takes ~10 s to clone the repo. Does this issue persist, or was it maybe a temporary network issue? For comparison, PyNWB is 50.40 MiB. Here are the repo size stats according to git:

```
$ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 1616
packs: 1
size-pack: 39.69 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
```
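To check these numbers programmatically (for example, to watch whether the cached results in the data folder keep growing the repo), a minimal Python sketch could run `git count-objects -vH` and parse its "field: value" lines. The helper names `parse_count_objects` and `repo_stats` are hypothetical and not part of nwb-project-analytics; `size-pack` is the field behind the 39.69 MiB figure quoted above.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse `git count-objects -vH` output into a {field: value} dict."""
    stats = {}
    for line in output.splitlines():
        # Each line has the form "<field>: <value>", e.g. "size-pack: 39.69 MiB"
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def repo_stats(repo_dir: str = ".") -> dict:
    """Run `git count-objects -vH` in repo_dir and return the parsed stats."""
    result = subprocess.run(
        ["git", "count-objects", "-vH"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_count_objects(result.stdout)


# Parsing the output quoted in the comment above:
SAMPLE = """\
count: 0
size: 0 bytes
in-pack: 1616
packs: 1
size-pack: 39.69 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
"""
print(parse_count_objects(SAMPLE)["size-pack"])  # 39.69 MiB
```

Calling `repo_stats()` from inside a clone returns the same fields for the current state of that repo.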

> Would it be possible to cache these results somewhere else?

In principle, we could cache the results in any network-accessible store, e.g., a separate git repo or a Google Drive folder. I'm not sure that would solve the issue, though, since the cached data is needed to build the pages, i.e., we would still need to download it. The code will automatically recompute the data if it is missing, but rebuilding it takes ~45 min. I'm not opposed to moving the cached data, but it is not a trivial task: it would take time to make all the necessary updates to the workflows and to the git repo, so I want to make sure the change is necessary.
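The trade-off described here (reuse the cached results when they are present, otherwise recompute them, which takes ~45 min) is the usual cache-or-compute pattern. The sketch below assumes, purely for illustration, that the results fit in a single JSON file; the real project may store its cached data differently, and `load_or_compute` and `fake_compute` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_compute(cache_file: Path, compute: Callable[[], dict]) -> dict:
    """Return cached results if the cache file exists; otherwise compute and cache them."""
    if cache_file.exists():
        # Cache hit: skip the expensive recomputation.
        return json.loads(cache_file.read_text())
    # Cache miss: run the slow computation, then persist the result for next time.
    results = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(results))
    return results


# Usage with a cheap stand-in for the slow analytics run:
calls = []


def fake_compute() -> dict:
    calls.append(1)
    return {"downloads": 123}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data" / "results.json"
    first = load_or_compute(cache, fake_compute)   # computes and writes the cache
    second = load_or_compute(cache, fake_compute)  # reads the cache instead
print(first == second, len(calls))  # True 1
```

Moving the cache to another store only changes where `cache_file` lives (and adds a download step); the pages still need the data, which is the point made above.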

> which is necessary to run the html generation for nwb-overview.

I think if you comment out the call to build_project_analytics on line 76 of conf.py, you should be able to build nwb-overview without the analytics. We could also make building the analytics configurable.
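One way to make the analytics step configurable is to guard the call in conf.py with an environment variable. This is only a sketch: the variable name `NWB_BUILD_ANALYTICS` and the helper `should_build_analytics` are hypothetical, and the guarded call should keep whatever arguments the existing build_project_analytics call on line 76 already uses.

```python
import os
from typing import Mapping, Optional

# Values that switch the analytics build off (compared case-insensitively).
_FALSE_VALUES = {"0", "false", "no", "off"}


def should_build_analytics(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only if the (hypothetical) NWB_BUILD_ANALYTICS variable is false-like."""
    env = os.environ if environ is None else environ
    # Default to building the analytics, so current behavior is unchanged.
    value = env.get("NWB_BUILD_ANALYTICS", "1")
    return value.strip().lower() not in _FALSE_VALUES


# In conf.py, the existing call would then be guarded roughly like this:
# if should_build_analytics():
#     build_project_analytics(...)  # same arguments as the current call on line 76

print(should_build_analytics({"NWB_BUILD_ANALYTICS": "0"}))  # False
```

Defaulting to "build" keeps the published pages unchanged, while a contributor without the cached data could set `NWB_BUILD_ANALYTICS=0` before running the Sphinx build.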

oruebel commented 10 months ago

Just for reference, in case we decide to move the cached data, here is a description of the steps to remove the files from the git history:

https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository#avoiding-accidental-commits-in-the-future
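The GitHub page linked above describes rewriting history with tools such as git filter-repo. As a hedged sketch, assuming git-filter-repo is installed and that the cached results live under `data/` at the repo root (an assumption; check the actual path first), the invocation could be built and, only when explicitly requested, run from Python. The helpers `filter_repo_argv` and `rewrite_history` are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def filter_repo_argv(paths: Sequence[str]) -> List[str]:
    """Build a git filter-repo command that removes `paths` from the entire history."""
    # --invert-paths keeps everything EXCEPT the listed paths.
    argv = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        argv += ["--path", path]
    return argv


def rewrite_history(paths: Sequence[str], repo_dir: str = ".", execute: bool = False) -> None:
    """Print the command by default; only rewrite history when execute=True."""
    argv = filter_repo_argv(paths)
    if not execute:
        print(shlex.join(argv))
        return
    subprocess.run(argv, cwd=repo_dir, check=True)


rewrite_history(["data/"])  # prints: git filter-repo --invert-paths --path data/
```

Rewriting history changes commit hashes, so the result has to be force-pushed and existing clones need to be re-cloned or rebased; the linked GitHub documentation covers those follow-up steps.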

oruebel commented 6 months ago

Closing this issue for now.