Open fmigneault opened 1 month ago
Sounds good !
Wow! That sounds nice!
The CLI entrypoint has not been maintained with the same attention as the other parts of the package. If you have any suggestions to improve it, we would be interested, I think!
Personally, I did not like the fact that xclim would manage I/O; it seems quite out of scope. But a good CLI would need to do that, maybe better than how it is currently done. Would it make sense to split it off into a new xclim-cli repo?
Which kind of I/O management do you foresee? I expect the CLI to receive some URI (local or remote file) and pass it down to the relevant index to compute, similar to the snippet in the README. Would there be other manipulations for other operations?
Anything involving xr.open_dataset and dask/chunking is I/O management to me in this context. Setting up a client or configuring workers would also count towards these "out-of-scope" manipulations. We already do those, and the cli module is well isolated from the rest, so maybe my worries about the module growing and spilling over into the rest of the package are unjustified.
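To make the distinction concrete, here is a minimal sketch of how a thin CLI layer could keep the I/O concerns (opening a URI, chunking) apart from the index computation. It is not how the xclim cli module is implemented: the program name `xclim-cli-sketch`, the `--chunks` option and the helpers `parse_chunks` and `open_input` are all hypothetical names chosen for illustration. Only `xr.open_dataset` and its `chunks` argument come from xarray.

```python
import argparse


def parse_chunks(spec):
    """Turn a string such as 'time=365,lat=50' into {'time': 365, 'lat': 50}."""
    if not spec:
        return None
    return {name: int(size) for name, size in (item.split("=") for item in spec.split(","))}


def parse_args(argv=None):
    """Parse the (hypothetical) command line: one input URI and optional chunk sizes."""
    parser = argparse.ArgumentParser(prog="xclim-cli-sketch")
    parser.add_argument("uri", help="local path or remote URL of the input dataset")
    parser.add_argument("--chunks", default=None, help="dask chunk sizes, e.g. time=365,lat=50")
    return parser.parse_args(argv)


def open_input(uri, chunks=None):
    """All I/O lives here; the index computation only ever receives the opened dataset."""
    import xarray as xr  # imported lazily so argument parsing works without xarray installed

    # With chunks=None xarray loads lazily without dask; a dict enables dask-backed arrays.
    return xr.open_dataset(uri, chunks=chunks)


if __name__ == "__main__":
    args = parse_args()
    ds = open_input(args.uri, parse_chunks(args.chunks))
    print(ds)
```

With this split, the I/O layer could live in the package or in a separate repository without the indices themselves knowing anything about files, URIs or dask.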
@fmigneault I recently updated the Dockerfile "recipe" used in birdhouse images here: https://github.com/bird-house/cookiecutter-birdhouse/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/Dockerfile. If you want to use this as a basis for a Pull Request here, feel free.
Addressing a Problem?
There is a growing number of applications that pre-process Climate data to perform case-study analyses. These analyses are often part of a larger workflow processing chain that needs containerization and encapsulation of the dependencies required for each step. Sometimes, analyses try to combine Earth Observation with Climate data, leading to package conflicts. Other times, parts of workflow processing chains need to be dispatched to different locations to address different resource requirements, platform availability or data-access requirements.
One way to address an Earth Observation + Climate data workflow is to use Common Workflow Language (CWL) and OGC API - Processes (e.g.: https://github.com/crim-ca/weaver). This is being considered in ongoing work in OGC Testbed-20 for GeoDataCubes. CWL + OGC API - Processes was also discussed during the recent OGC 2024 Climate Services Code Sprint.
However, whenever a user wants to employ climate indices such as those provided by xclim, they need to redefine their own Python environment and manage dependencies. They also need to figure out how to build docker images and publish them to container registries, which is not an easy feat for everyone. The scientific community would benefit from a pre-built docker image that could be directly pulled and employed in a processing workflow.
Potential Solution
Provide an official Dockerfile with all relevant dependencies for climate indices analysis, and publish images built from it in a public container registry (DockerHub or directly on the xclim GitHub container registry). The docker image would simply have the xclim CLI as its entrypoint to be ready to use directly.
Additional context
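As a rough illustration of the proposal above, the sketch below shows how a workflow step might call an image whose entrypoint is the xclim CLI: everything after the image name is forwarded to the CLI unchanged. The image reference `ghcr.io/example/xclim:latest` is a placeholder (no official image is assumed to exist), and the helper name `docker_run_command` and the `/data` mount point are hypothetical. Only standard `docker run` options (`--rm`, `-v`) are used.

```python
import shlex

# Placeholder image reference, for illustration only; not a published image.
IMAGE = "ghcr.io/example/xclim:latest"


def docker_run_command(cli_args, data_dir=None, image=IMAGE):
    """Build the argv for `docker run`, forwarding cli_args to the image entrypoint."""
    cmd = ["docker", "run", "--rm"]
    if data_dir is not None:
        # Bind-mount a local directory so the container can read inputs and write outputs.
        cmd += ["-v", f"{data_dir}:/data"]
    cmd.append(image)
    # Because the entrypoint is the CLI itself, these become the CLI's own arguments.
    cmd += list(cli_args)
    return cmd


if __name__ == "__main__":
    # Print the command rather than run it, so the sketch works without docker installed.
    print(shlex.join(docker_run_command(["--help"], data_dir="/tmp/data")))
```

A CWL or OGC API - Processes definition would express the same thing declaratively: an image reference plus the arguments passed to its entrypoint.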
This is something that will most probably be needed for OGC Testbed-20 for GeoDataCubes efforts. Therefore, I want to discuss the idea of adding it globally to xclim rather than doing it only on my end.
If such an image is provided, all platforms using Weaver (Ouranos PAVICS, CRIM Hirondelle, University of Toronto RedOak, ClimateData.ca) could potentially share a common xclim docker reference for larger and interoperable processing workflows.
Contribution
Code of Conduct