Closed mmyrte closed 2 years ago
Thanks, very nice overview @mmyrte!
In accordance with your classifications, I'd think
from climada.util.constants import SYSTEM_DIR
TOT_RADIATIVE_FORCE = os.path.join(SYSTEM_DIR, 'rcp_db.xls')
"""© RCP Database (Version 2.0.5) http://www.iiasa.ac.at/web-apps/tnt/RcpDb.
generated: 2018-07-04 10:47:59."""
ISIMIP_NATID_TO_ISO
and hard-coded dictionaries, which are long, crowd both the module & would crowd the constants-file, and are used only by one module: this is really the case where I'm wondering to move them to a yaml (or any other file format which is convenient for storing such data, yes, one file for each one), give them a clear name and move them to a /resouces sub-folder in the hazard, exposure, measures, engine folders. This is debatable, since we don't have such a folder structure right now. Alternatively, could also be utils/resources/all those files (wouldn't recommend, though).
An example which I'm not referring to in this suggestion (because it's "short enough"):
BM_YEARS = [2016, 2012] # latest first
# Years with GPW population data available:
GPW_YEARS = [2020, 2015, 2010, 2005, 2000]
General things to keep in mind:
More general things to keep in mind:
Less general things:
Thanks for the feedback! What do you think of the following proposal?
Constants and Configuration
When developing for CLIMADA, please distinguish between constants, i.e. parameters that are not supposed to ever change, and configuration parameters, where we provide the user with a sensible set of defaults, but which should be able to be overridden. A few examples of reasoning:
Numerical constants to decode files (e.g.
-999 = NaN
) should be kept in the module that implements the reader because…
- if the file format changes, it shouldn't be up to the user to fix it through a configuration, but up to us, since we would probably want to fix it for us a well.
- there will not be a second function relying on the constant, since this indicates a need for abstraction anyways.
The location of a test file should be declared in its corresponding unit test because...
- the file's primary purpose is to ensure the numerical reproducibility accross commits/environments and should not ever be changed.
- it is logically tied to the unit test. Integration tests should import the definition from the unit test, since they refer to several such test files.
The location of an online database should be declared in the settings since…
- a user may want to point it to a different version of the data.
The location of a demo file should be declared in the settings since…
- the software doesn't rely on these data, hence making them subject to change.
The location of a template required by a writer function should be declared in the settings since...
- a user may wish to adapt the template for their own use. (Caveat: No numerical index logic allowed.)
Regarding the location of the config, the package itself should contain a defaults.conf
, but to allow for it to be overridden first by $HOME/.climada.conf
and then by $PWD/climada.conf
or $PWD/.climada.conf
. I think environmental variables are not transparent enough for a majority of the userbase.
I would also aim to abolish the constants.py
, since constants make sense through their context. The danger of abusing it as a config file seems too great to me.
Are there any other uses of "constants" in climada, which are not yet mentioned in your overview?
Not that I know of, but please check out the overview below if you see any other usecase. I've excluded TESTS
and LOGGER
.
Index-based mappings […] are long […] and are used only by one module: […] move them to a /resouces sub-folder in the hazard, exposure, measures, engine folders.
This is a problem indeed; in my experience, this is basically used to clean up messy data. I agree that they have no place in the constants file and are better suited to some sort of config (and therefore one single folder like climada/resources
). The index logic is very vulnerable to changes in the files anyways, so you'd want a proper key-based mapping.
In the case of the natural earth dataset, I would even propose to fork their current version (they don't iterate much) and insert/correct the metadata we need directly into the dataset, then pull the data from our own repo instead of their official mirror.
Discussion that happened via e-mail; added for the sake of completeness:
@mmyrte : As I suggested on the github issue on constants, local config files should follow (at least on *NIX) the convention of having a system-wide config in /usr/local/etc/climada.conf, superseded by first ~/.climada.conf and then $pwd/climada.conf - these could then be used to indicate a local "scratch directory" for climada. I personally am more in favour of having ~/climada then .climada, seeing as people will probably also want to access these files from e.g. GIS software instead of being invisible to the average user.
@emanuel-schmid : These are my objections: The system wide configuration in /usr/local/etc/climada.conf seems a bit odd to me. Firstly, that’s not really a canonical path. As far as I can remember, I have never seen a single file there. And secondly, what’s the point in a system wide configuration when it’s not even possible to install climada outside of a virtual environment? I think we can skip it and just have these: first place to look at is pwd, then home, at last default immutable config in the package folder. Having different names for config files as you suggest, climada.conf in pwd .climada.conf elsewhere is actually a pattern that is quite commonly used. But, frankly, I always found it annoying. Why not have one name and stick to it regardless of the location? The leading dot of the suggested data directory .climada makes it arguably somewhat harder for the less savvy user to access their files, agreed. But – it has also its advantages: it is a sign that the path is not arbitrarily chosen but embedded in some context and that one better don’t move the thing around carelessly. So, please convince me, I’m happy to hear counter arguments.
To add my reply from that mail thread:
Point taken that it's probably only going to be installed in a virtual environment. […] No global config outside the package then. (I guess
/usr/local/etc
is primarily used for BSD systems in that case...) The rationale for naming it.climada.conf
in the homedir would be to allow technically savvy users to have a baseline config. Since a per-project configuration in a project folder should also be accessible to people who don't know about dotfiles, I think allowing both$pwd/.climada.conf
and$pwd/climada.conf
might be sensible. The rationale for having a visible global data directory to me is that there are going to be significant amounts of geodata in there; again, assuming that most of the students from the course don't know about dotfiles and dotfolders, they might be confused about what's eating up potentially gigabytes of storage. I've only ever seen dotfolders used for small configs and templates and such. If the folder were indeed moved, the programme in general is robust enough that necessary files are simply downloaded again. […]
Gabriela implemented the config reader so that a climada.conf
in the working directory gets read; fallback is climada/conf/defaults.conf
. Maybe we can just leave it at that. I imagine that if a configuration is used, it'll be on a project basis. Maybe I'm wrong in this, but I imagine that not many people have more than a handful of concurrent climada projects on their machine.
There is a new feature branch 'config_conventions' introducing a change in the configuration policy of CLIMADA, particularly addressing external resources and file locations.
Principles
@emanuel-schmid I think we may close this issue, no? If there needs to be more discussion about specificly migrating constants to settings, that may be better placed in a new issue.
TL;DR:
climada.util.constants
could (mostly) be migrated toclimada/conf/defaults.conf
; this thread should resolve whether we should do this.We got to talking about the proper place for storing constants and configuration aspects at today's coding convention meeting; I offered to summarise and start a general discussion here:
Distinct use cases
At least those that I know of, and that were discussed.
Reader support, e.g. through constant dictionary mappings: https://github.com/CLIMADA-project/climada_python/blob/b9d0c49bef93e224c2191eb448d97d85ba55fbe3/climada/hazard/base.py#L69-L72
Unit conversion, e.g. through floats: https://github.com/CLIMADA-project/climada_python/blob/91cfc25370ff84631fe69e85f9a4a6b74c7d0fa9/climada/util/constants.py#L174-L175
Index-based mappings, e.g.
ISIMIP_NATID_TO_ISO
https://github.com/CLIMADA-project/climada_python/blob/91cfc25370ff84631fe69e85f9a4a6b74c7d0fa9/climada/util/constants.py#L148-L150Resource locaters, local or remote, e.g. relating to IBTRACS https://github.com/CLIMADA-project/climada_python/blob/91cfc25370ff84631fe69e85f9a4a6b74c7d0fa9/climada/hazard/tc_tracks.py#L70-L74
Options
I may have missed something in the meeting, and there are of course other options. Please mention them.
What should we do?