Open JQFonseca opened 4 years ago
I believe the data that we use in the example notebook is now contained within the tests
folder and is small compared to the old example data. The old example data is still in the repo in the example_data
folder though, and I don't think we use it (@mikesmic can confirm). If we remove that, the repo will be ~20 MB, which I think is reasonable.
Were you downloading through GitHub Desktop, by the way? I have found that to be very slow for some reason, which is not directly related to repo size. It sometimes takes a long time even on a fast connection.
The example notebook now only uses the data from the tests directory, which contains 8.7 MB of data. 4.9 MB of that is a ctf file, which we should maybe make smaller as it's not used in the example notebook. I will delete the example data directory in develop (I thought I had already done this tbh), which will cut out 36 MB (60-70% of the total size).
It would be great to work towards having a library of example datasets, defined with consistent filenames and formats to automatically pull into a notebook.
This is still an issue. Cloning downloads 321.29 MiB of data. What's being downloaded? Does cloning include the whole history? Any ideas @merrygoat ?
Yes, the hidden .git folder has all of the historical diffs - you should be able to check this by doing a shallow clone:
git clone --depth [depth] [remote-url]
Where depth is the number of most recent commits to fetch.
You can use git filter-branch to edit history and remove the files, but do be careful: editing the history is slightly dangerous. Best to do it locally first and ensure you are happy before a force push.
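Before rewriting anything, it may help to see which files in the history actually account for the size. A rough sketch, assuming git is on the PATH and the script is run against a full (not shallow) clone. The function names `parse_batch_check` and `largest_blobs` are made up for illustration, and the sizes reported are uncompressed blob sizes, not what the packed objects occupy on disk:

```python
import subprocess


def parse_batch_check(output):
    """Parse `git cat-file --batch-check` output into (size, path) pairs.

    Each line is expected to look like "<type> <sha> <size> <path>".
    Only blobs (file contents) are kept; commits and trees are skipped.
    """
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps paths with spaces intact
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    # Largest first
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", top=10):
    """Return the `top` largest blobs reachable from any ref in `repo`."""
    # List every object in the history together with the path it was stored under.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Ask git for the type and (uncompressed) size of each of those objects.
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(sizes)[:top]


if __name__ == "__main__":
    for size, path in largest_blobs():
        print(f"{size / 1e6:8.2f} MB  {path}")
```

This only reads the history and changes nothing, so it is safe to run on the real repo to decide whether a rewrite is worth the risk.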
Publishing on PyPI is a better approach than fiddling with the git history, isn't it?
I didn't think of it like that, but yes certainly.
I had a look at finding big files and deleting them from the history. I found a decent guide (https://web.archive.org/web/20190207210108/http://stevelorek.com/how-to-shrink-a-git-repository.html), but it scares me. I will publish to PyPI for now; it's daft that I haven't done that yet.
I installed DefDap on a different computer today and it took so long to download everything, primarily because the example data (which is needed) is relatively large. Could I suggest we move it to a repository like Zenodo and then have a command to download it in the example notebook?
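Something along these lines could sit in the first cell of the notebook. It is only a sketch using the standard library: `fetch_example_data` is a hypothetical helper (nothing like it exists in DefDap as far as this thread shows), and the URL in the usage comment is a placeholder, not a real Zenodo record. The file is downloaded once, cached at `dest`, and optionally checked against a SHA-256 checksum:

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_example_data(url, dest, sha256=None):
    """Download `url` to `dest` unless it is already there.

    If `sha256` is given, the file on disk is verified against it and a
    ValueError is raised on mismatch.
    """
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a complete file.
        tmp = dest.with_name(dest.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(dest)
    if sha256 is not None:
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"checksum mismatch for {dest}")
    return dest


# Usage in a notebook (placeholder URL, replace with the real archive location):
# data_file = fetch_example_data(
#     "https://example.org/defdap_example_data.zip",
#     "example_data/defdap_example_data.zip",
# )
```

Caching on the destination path means re-running the notebook does not download the data again, and the repo itself no longer needs to carry the files.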