CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

Resources: Restructuring this Repository #48

Closed: mmyrte closed this issue 4 years ago

mmyrte commented 4 years ago

Both issues #46 (util functions) and #37 (config & constants) have touched on the question of "where do we keep what". If we (or probably just @emanuel-schmid 😀) are going to package climada for distribution, we're going to need to reorganise the repository. According to the official docs, this is the structure a package should follow:

packaging_tutorial
├── LICENSE
├── README.md
├── example_pkg
│   └── __init__.py
├── setup.py
└── tests

_I'll use climada_python as the root / from here on. Also, this is just a proposal, forgive my absolute language._

/data, /script, and /doc/tutorial need to move into other github repos.

I think this topic is quite involved and needs a lot of reading up on the packaging formats for pypi/wheel/setuptools and conda/conda-forge/conda-build. I have only dipped my toes into it and would be glad to let @emanuel-schmid decide for all of us, because I imagine that none of us has the knowledge or the time to find best practices and implement them.

chahank commented 4 years ago

Good summary. I think we could have two repositories then. One for the package, and one for the rest. For me tutorials, papers etc. are in the same category.

tovogt commented 4 years ago

Thanks for the suggestions!

Edit: I moved the discussion about large files to a new issue (#56).

emanuel-schmid commented 4 years ago

Thanks for the suggestions and thanks for the list of large files!

mmyrte commented 4 years ago

My argument for moving big files away from /data etc. is to make cloning the repo more lightweight; this would also mean deleting large files from the history, i.e. rewriting it. I understand if you want to leave that particular stone unturned. We have the climada.ethz.ch domain/VM already, so we could use that to host (slightly) larger files.

The argument for keeping /tests is that we can move the testing code there - AFAIK, the CI would still run off of this repo instead of the package. I personally would appreciate the clearer separation of actual programme logic and tests, but if you want users to be able to validate the correct functioning of the software on their machines, then it's a no-go. In that case, I would consider it courteous to the user to only download an archive of test files if the tests are indeed executed.
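To make the last point concrete, here is a minimal sketch of a "download only when the tests actually run" helper. The URL, the function name `ensure_test_data`, and the cache directory are all hypothetical placeholders, not anything that exists in climada; the only assumption is that the test archive is hosted somewhere reachable over HTTP.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholder; the real hosting location is not decided in this thread.
TEST_DATA_URL = "https://example.org/climada_test_data.zip"


def ensure_test_data(cache_dir, url=TEST_DATA_URL, fetch=urllib.request.urlretrieve):
    """Return the local path of the test archive, downloading it only if missing."""
    cache_dir = Path(cache_dir)
    archive = cache_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(url, str(archive))  # network access happens only on first use
    return archive
```

A test module could call `ensure_test_data(...)` from its set-up code, so that a plain installation never touches the network and only users who run the tests pay for the download.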

tovogt commented 4 years ago

About the large files:

tovogt commented 4 years ago

This is a follow-up about the large files:

Here is a list of large files that are obsolete (i.e., can be safely removed because they have already been replaced). Removing those alone already reduces the repository size by 110 MiB:

List of obsolete files in the git history:

```
script/applications/eca_san_salvador/San_Salvador_Risk-Copy1.ipynb
climada/hazard/test/data/cropping_test_LS.tif
dist/climada-0.0.1.tar.gz
data/F101992.v4b_web.stable_lights.avg_vis.tif.gz
climada/test/data/system/admin0.mat
climada/test/data/GLB_NatID_grid_0360as_adv_1.mat
climada/test/data/GLB_NatID_grid_0360as_adv_2.mat
data/F152007.v4b_web.stable_lights.avg_vis.tif.gz
data/F162007.v4b_web.stable_lights.avg_vis.tif.gz
data/F182012.v4c_web.stable_lights.avg_vis.tif.gz
data/demo/gdp2asset_demo_exposure.nc
```

Here is a list of the files larger than 3 MiB (except those in feature/supplychain) and the people who added them to the repository. Can those people please replace the files with smaller ones or explain why the files have to be that large?

Here is a list of files where we can save some space without removing:

Addressing all of the above files will bring the repository size down to less than 130 MiB, which seems acceptable to me.
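For anyone who wants to reproduce such a list, here is a rough sketch of one way to find large blobs in the history. It is not necessarily how the lists above were compiled. It pipes `git rev-list --objects --all` into `git cat-file --batch-check`; the function names and the 3 MiB default are illustrative choices.

```python
import subprocess

THRESHOLD = 3 * 1024 ** 2  # 3 MiB, the limit used in this comment


def parse_large_blobs(batch_check_output, threshold=THRESHOLD):
    """Return (size, path) pairs for blobs larger than `threshold`, largest first."""
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # type, hash, size, path (path may contain spaces)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        size = int(parts[2])
        if size > threshold:
            found.append((size, parts[3]))
    return sorted(found, reverse=True)


def large_blobs_in_history(repo_dir=".", threshold=THRESHOLD):
    """List every blob reachable from any ref that exceeds `threshold` bytes."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_large_blobs(sizes, threshold)
```

Note that a path shows up once per stored version, so a file that was replaced several times appears several times. The sizes are uncompressed object sizes, not the space taken up in the packfile.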

However, permanently removing files from a git history changes a lot of commit hashes. That's why I think we should wait for all files listed above to be discussed and replaced in all currently active branches (most important: main and develop). Then we can make one single change that removes large files once and for all (e.g. using the tool https://rtyley.github.io/bfg-repo-cleaner/). After that, we can implement a much stricter policy for large files and we should be fine in the future.

sameberenz commented 4 years ago

Hi Thomas, thanks for the check on the files. Should we change the files and tests using them directly on develop branch?

tovogt commented 4 years ago

Hi Samuel, since we are preparing for a release, the develop branch should be handled with care. It will be frozen and merged into the main branch very soon. Double-check that your changes don't break any tests (including unit and integration tests!). Preferably, create a feature branch and a PR as a first step.

emanuel-schmid commented 4 years ago

Thanks a lot for the compilation and the suggestions. Sounds like a sound plan to me.

tovogt commented 4 years ago

Please continue the discussion about the large files in the new issue #56. This issue here is more about changing some of the paths, maybe moving some parts of the repository to a new repository or to some other external location.

emanuel-schmid commented 4 years ago

Resolution: At some point the tests should be moved out of the package directory in order to keep the installation as lightweight as possible. Apart from that, the repository structure will stay as is for the time being.
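As a starting point for that move, here is a small sketch that lists the test directories currently living inside a package directory. The helper name `find_bundled_test_dirs` is hypothetical, and the assumption that such directories are named `test` or `tests` is taken from paths mentioned in this thread (e.g. `climada/hazard/test/data`).

```python
from pathlib import Path


def find_bundled_test_dirs(package_root, names=("test", "tests")):
    """Return test directories below `package_root`, relative to it, sorted."""
    root = Path(package_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir() and p.name in names
    )
```

Running `find_bundled_test_dirs("climada")` from the repository root would give the list of directories to relocate.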