PCMDI / cmip6-cmor-tables

JSON Tables for CMOR3 to create CMIP6 dataset
BSD 3-Clause "New" or "Revised" License

Adding a license file to this repo for conda-forge releases? #353

Closed chengzhuzhang closed 1 year ago

chengzhuzhang commented 2 years ago

@durack1 @mauzey1 Hey Paul and Chris, the E3SM data publication team is working on a solution to package the external data needed for cmorizing E3SM data and release it as an independent data package. This approach can significantly streamline our data publication workflow and make it more portable. To do so, we will need a license file to be included in the [cmip6-cmor-tables](https://github.com/PCMDI/cmip6-cmor-tables) repo. A detailed discussion can be found here: https://github.com/E3SM-Project/e3sm_to_cmip/issues/124. Would you please consider adding a license? Thanks a lot!

durack1 commented 2 years ago

@chengzhuzhang @xylar interesting question. In principle I have no problem adding a license, one question, what license flavour would make it easiest?

One reluctance I have in proceeding: while wrapping this repo into a conda package and automating its download will aid automated use, it further hides from the user the requirement to register/update model/institution info. It's not quite a set-and-forget process.

chengzhuzhang commented 2 years ago

Hey Paul, thanks for chiming in. Since both the CMIP6_CVs and cmip6-cmor-tables repos are open source on GitHub, perhaps we need not be too concerned that more users would skip the WCRP registration requirement because of a conda package? The conda package does add one more entry point to these tables, though.

As a workaround, it seems that https://github.com/WCRP-CMIP/CMIP6_CVs and cmip6-cmor-tables share the same set of JSON files? The former has a license already. Maybe we should use CMIP6_CVs instead? (I'm also curious what the difference between the two is.) Thank you.

durack1 commented 2 years ago

Hi Jill, the two repos are separate. WCRP-CMIP/CMIP6_CVs is the controlled vocabulary (and registration) repo for CMIP6, where institutions, models, experiments, MIPs, etc are defined. A subset of the registered information is pulled into the PCMDI/cmip6-cmor-tables/Tables/CMIP6_CV.json file, which allows a modeling group to use CMOR with minimum configuration, as the registered information is available for use by the software, packaged in the cmip6-cmor-tables.

The WCRP-CMIP/CMIP6_CVs license is a default file license template for use in the netCDF global attribute license field, so it isn't the kind of license that is required to create a conda-forge package (@xylar can correct me here if I'm wrong).

xylar commented 2 years ago

> The WCRP-CMIP/CMIP6_CVs license, is a default file license template for use in the netcdf global attribute license field, so isn't the kind of license that is required to create a conda-forge package (@xylar can correct me here if I'm wrong)

I think that's correct. We need a license file that describes limitations (if any) on packaging and redistributing the contents of the repository. The license linked to on https://github.com/WCRP-CMIP/CMIP6_CVs is not that kind of license. It is a license the software adds to contents it produces.

chengzhuzhang commented 2 years ago

> Hi Jill, the two repos are separate. WCRP-CMIP/CMIP6_CVs is the controlled vocabulary (and registration) repo for CMIP6, where institutions, models, experiments, MIPs, etc are defined. A subset of the registered information is pulled into the PCMDI/cmip6-cmor-tables/Tables/CMIP6_CV.json file, which allows a modeling group to use CMOR with minimum configuration, as the registered information is available for use by the software, packaged in the cmip6-cmor-tables.
>
> The WCRP-CMIP/CMIP6_CVs license, is a default file license template for use in the netcdf global attribute license field, so isn't the kind of license that is required to create a conda-forge package (@xylar can correct me here if I'm wrong)

Hey Paul, thank you for the clarification. Since one goal of this repo is to be used together with CMOR so that modeling groups can create CMIP-compliant files, I think it makes sense to have a conda release of this repository. The goal on the E3SM side is to be able to port the cmorization process easily to supported machines, and maintaining a conda package of this repo is a better approach than manually cloning and updating it on each platform. @xylar and I will help maintain the conda package if a license can be added. Thanks for your consideration!

durack1 commented 2 years ago

@chengzhuzhang in principle I have no problem with this, however, we'd need to know what license makes using a conda-forge archive easiest?

@mauzey1 do you see any problems with this?

chengzhuzhang commented 2 years ago

Thank you @durack1! According to general guidelines for open-source software, I think an Apache 2.0, BSD or MIT license would be appropriate (I'm reading this page: https://software.llnl.gov/about/licenses/). @xylar, would you agree that any of these license types should be sufficient?

mauzey1 commented 2 years ago

@durack1 @chengzhuzhang You mean having the CMIP6 CMOR tables' JSON files stored in a conda-forge package? Like a package that would install the directory of tables in /envs/my_env/share/cmip6-cmor-tables?
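As a rough illustration of what consuming such a package could look like, here is a minimal sketch that looks for the tables directory under the active conda environment. The `share/cmip6-cmor-tables` layout is an assumption taken from the path mentioned above, and `find_tables_dir` is a hypothetical helper, not part of CMOR or any existing package.

```python
import os
import sys
from pathlib import Path


def find_tables_dir(prefix=None):
    """Return <prefix>/share/cmip6-cmor-tables if it exists, else None.

    Hypothetical helper: the directory layout is assumed, not confirmed.
    """
    # CONDA_PREFIX is set by conda while an environment is active; fall back
    # to the prefix of the running interpreter otherwise.
    root = Path(prefix or os.environ.get("CONDA_PREFIX") or sys.prefix)
    candidate = root / "share" / "cmip6-cmor-tables"
    return candidate if candidate.is_dir() else None
```

A caller could then hand the returned directory to whatever tool needs the table path, and fall back to a manual clone when the function returns `None`.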

xylar commented 2 years ago

@mauzey1, yes, exactly. I have a recipe for doing that already here: https://github.com/xylar/staged-recipes/tree/add_cmip-cmor-tables/recipes/cmip6-cmor-tables

> I'm thinking either Apache 2.0, BSD or MIT license would be proper (I'm reading on this page https://software.llnl.gov/about/licenses/). @xylar would you agree that any of this license type should be sufficient?

@chengzhuzhang, those are definitely common licenses on conda-forge, so they would work for me.

mauzey1 commented 2 years ago

@xylar Okay, then I agree with creating a conda-forge package for cmip6-cmor-tables.

matthew-mizielinski commented 2 years ago

I can certainly see the utility of having a conda package for the tables, but when it comes to versioning these tables please have a good think about the labelling of the package and how that updates.

For Met Office work I have a set of versions that we use for CMIP6 production (01.00.29, 01.00.31, 01.00.32), but we update the CMIP6_CV.json file within these directories with the latest one when something important changes ( I recall doing this for fixes to experiment parent details and the introduction of a new model). You could choose a composite version number to represent the inputs (data request version + CVs version (+ CMOR version?)). In some ways it would be good to separate the MIP tables from the CVs, but this would likely be a bit disruptive.
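One way to picture the composite version idea is a small record that carries each input version and derives a single label from them. This is only a sketch: the `TablesRelease` name, the label format, and the example version values are all hypothetical, and whether conda-forge would accept such a string as a package version is not checked here.

```python
from typing import NamedTuple, Optional


class TablesRelease(NamedTuple):
    """Versions of the inputs that make up one tables snapshot (hypothetical)."""

    data_request: str
    cvs: str
    cmor: Optional[str] = None

    def label(self) -> str:
        # Join the component versions into a single human-readable label.
        parts = [f"dreq{self.data_request}", f"cv{self.cvs}"]
        if self.cmor:
            parts.append(f"cmor{self.cmor}")
        return "_".join(parts)


# Example values are placeholders, not real release numbers.
print(TablesRelease("01.00.33", "1.2.3").label())
```

Keeping the components separate makes it possible to tell at a glance whether two snapshots differ only in the CVs.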

One other thought; how often would a new conda package be published? Updates to the CVs still occur reasonably regularly with new models being added, although a bit of automation should cover this fairly easily.

xylar commented 2 years ago

@matthew-mizielinski, conda-forge can easily accommodate versioning like you suggest.

A bot can automatically create a new package each time there is a new version update as long as the recipe doesn't need to change (other than the version and sha256 hash of the files, which the bot updates). Maintenance on that side should be a piece of cake.
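For reference, the sha256 hash mentioned here is just a digest of the release archive. A minimal sketch of computing one locally with the Python standard library follows; `sha256_of_file` is an illustrative helper name, and this is not the bot's actual code.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value against the one recorded in a recipe is a quick way to confirm that a downloaded archive is the one the recipe expects.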

durack1 commented 2 years ago

@xylar @matthew-mizielinski thanks for getting into the weeds with this. We do capture the CV, DREQ and CMOR versions in the release comment (see here), but importantly, a new version is not released when the CVs change (which happens relatively frequently), so unless we changed that process, the conda package would never update with the latest CVs - requiring a manual step by a user, not ideal

xylar commented 2 years ago

How important would the CVs be for the tools that would typically use the proposed conda-forge package? It isn't necessarily a problem if that file gets updated infrequently, as long as that is clear to users of the conda-forge package.

durack1 commented 2 years ago

The path that a new modeling group is meant to follow to use CMOR:

  1. register their institution and model in the CVs (institution_id and source_id)
  2. this information is then propagated across to the cmip6-cmor-tables/Tables/CMIP6_CV.json file, and then a user can just select their information from the preconfigured/registered info to write files (in addition to configuring input files for CMOR use)
  3. as the information is registered in the CMIP6_CVs this then triggers downstream support, ESGF publisher, citation, ES-DOCs etc in a consistent way

So to aid a user, having the most up-to-date CMIP6_CV.json file would certainly be a necessity, otherwise hand-spun edits will be required which may break consistency with the registered information that other software expects
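A quick way to see whether a local copy of the file is recent enough for a given group is to check that both identifiers are present. The sketch below assumes that CMIP6_CV.json has a top-level `"CV"` object containing `"institution_id"` and `"source_id"` mappings; that structure is an assumption here and should be checked against the actual file. `is_registered` is a hypothetical helper.

```python
import json


def is_registered(cv_path, institution_id, source_id):
    """Check whether both IDs appear in a local CMIP6_CV.json (layout assumed)."""
    with open(cv_path, encoding="utf-8") as handle:
        cv = json.load(handle)["CV"]
    # Both sections are assumed to be keyed by the registered identifier.
    return (
        institution_id in cv.get("institution_id", {})
        and source_id in cv.get("source_id", {})
    )
```

If this returns `False` for identifiers that are registered upstream, the local copy is stale and should be refreshed rather than edited by hand.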

xylar commented 2 years ago

Okay, that's good to know. Why not do a release every time the CMIP6_CV.json file gets updated?

xylar commented 2 years ago

There is no way to do a conda-forge package from anything other than a release and it isn't a good idea to overwrite or edit files in a conda environment, since they might unexpectedly get overwritten by a later update, etc.

durack1 commented 2 years ago

@xylar there has been no need to (up until now), and our versioning tag doesn't account for the CV version, however, the tag/release comment does

xylar commented 2 years ago

Okay, I'll leave this to the rest of you to discuss. I know for e3sm_to_cmip, the current situation of each user cloning this repo is not working well. I'm happy to help with conda-forge packaging if that ends up being practical. But I don't understand enough of the subtleties or who the end users might be to weigh in on those details.

durack1 commented 2 years ago

@xylar a practical question from me, do you expect folks to run conda update on their env every time they go to use it?

xylar commented 2 years ago

That would be a question for @chengzhuzhang. It sounds like it might be worth including a conda update as part of the e3sm_to_cmip workflow to make sure the latest version of cmip6-cmor-tables is being used.

chengzhuzhang commented 2 years ago

Hmm... conda update would only help if the package is released frequently enough (i.e., each time user-facing content is updated?). Now that I have learned more about how this repo is used, I understand that for an already registered modeling center, most changes to CMIP6_CV.json have no real impact unless its own registered information is updated.

durack1 commented 2 years ago

@chengzhuzhang yeah exactly, so when we updated E3SM1-0 to include the UCI institution_id, if you had cloned the repo after this was merged (and pulled across into the cmip6-cmor-tables repo) then you'd have no further tweaks to apply, assuming that you're also using the latest ESGF publisher which may validate your entries

chengzhuzhang commented 2 years ago

Thanks for all the clarifications @durack1. With this in mind, I think that without changing the current release plan, a conda package may not be the best or most practical approach...

@xylar, I think there is another possibility: managing the data in this repo in a similar fashion to how we manage our analysis datasets. This is a less automatic approach, but we would perhaps have better control over the schedule for updating these data files. We can talk more offline about this possibility.

durack1 commented 2 years ago

@chengzhuzhang, let's pause for a second and let me chat with @taylor13, @matthew-mizielinski and @mauzey1 to figure out if a conda package makes sense. To be honest, I like the ease of use this would enable, however, there are questions remaining in how we'd make sure things are kept up to date without causing more problems (and work) than it solves

xylar commented 2 years ago

@chengzhuzhang, another feasible approach is to have e3sm_to_cmip clone this GitHub repo into some predefined scratch space (specific to a user, rather than shared) as the first step of running. That way, you would always have the latest version without the need for a release.
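A minimal sketch of that clone-or-update step, assuming `git` is on the PATH. The function names and the choice of a shallow clone are illustrative and not part of e3sm_to_cmip.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/PCMDI/cmip6-cmor-tables"


def git_command(dest, repo_url=REPO_URL):
    """Build the git command that brings `dest` up to date."""
    dest = Path(dest)
    if (dest / ".git").is_dir():
        # An existing checkout only needs a fast-forward pull.
        return ["git", "-C", str(dest), "pull", "--ff-only"]
    # First run: a shallow clone keeps the download small.
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def refresh_tables(dest, repo_url=REPO_URL):
    """Clone or update the repo and return the path to its Tables directory."""
    subprocess.run(git_command(dest, repo_url), check=True)
    return Path(dest) / "Tables"
```

Calling `refresh_tables` with a per-user scratch path at the start of a run would give each user their own up-to-date checkout.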

It sounds like either my approach or yours is more feasible than a conda-forge package.

Thank you everyone for the discussion.

durack1 commented 2 years ago

@xylar either way, adding a license is a trivial step, would CC BY-SA 4.0 cause you any issues in conda-forge land?

xylar commented 2 years ago

@durack1, CC BY-SA 4.0 should work just fine. I was able to find several existing packages with that license.

matthew-mizielinski commented 2 years ago

@xylar, just a thought, but I use a python class for something similar; cartopy.io.Downloader

@durack1, it wouldn't be too hard to write an api to retrieve tables and CVs with something like this downloader class. That would be something that could go to conda, and a simple function call like cmor_tables_location(table_version='01.00.33', cv_version='latest', destination='<somewhere>') would be able to provide access to any version of the tables.
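To make the idea concrete, here is a hedged sketch of how such a call could behave, using a simple on-disk cache. The signature follows the call suggested above, but the extra `fetch` argument, the directory naming, and the caching rules are all assumptions for illustration. The actual retrieval is left to a caller-supplied function because no download API exists yet.

```python
from pathlib import Path


def cmor_tables_location(table_version, cv_version="latest", destination=".", fetch=None):
    """Return a directory holding the requested tables, fetching if needed.

    `fetch(table_version, cv_version, target_dir)` is a hypothetical callable
    that populates `target_dir`; it is supplied by the caller.
    """
    target = Path(destination) / f"tables-{table_version}_cv-{cv_version}"
    cached = target.is_dir() and any(target.iterdir())
    # A pinned CV version never changes, so a non-empty cache can be reused.
    # "latest" is refreshed on every call whenever a fetcher is available.
    if cached and (cv_version != "latest" or fetch is None):
        return target
    if fetch is None:
        raise FileNotFoundError(f"{target} is not cached and no fetch function was given")
    target.mkdir(parents=True, exist_ok=True)
    fetch(table_version, cv_version, target)
    return target
```

The main design choice is that pinned versions are treated as immutable while `'latest'` is always re-fetched, which addresses the stale-CV concern raised earlier in the thread.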

xylar commented 2 years ago

@matthew-mizielinski, I've given the downloader some thought. I think what would work well for our software (e3sm_to_cmip) is to have the conda package provide most of the files in cmip6-cmor-tables. This is preferable to a downloader in that all we have to do is make a specific version (or a constrained version) of the package a dependency of our software, and the downloading happens automatically without any extra steps. But a simple download command for cmip6-cmor-tables/Tables/CMIP6_CV.json at the beginning of our process is a really good idea, so we don't rely on the (potentially outdated) version from the package. Since that's a single file, it wouldn't require anything fancy like its own downloader; we could just use requests.
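A minimal sketch of that single-file download is below. It uses the standard library's `urllib.request` instead of `requests` so that it has no extra dependency. The URL is passed in by the caller, since the raw-file address depends on the branch or tag being tracked, and `download_cv` is a hypothetical helper name.

```python
import json
import urllib.request
from pathlib import Path


def download_cv(url, dest, timeout=30):
    """Download a CV JSON file from `url` to `dest` and return the path."""
    dest = Path(dest)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()
    # Fail before touching `dest` if the payload is not valid JSON.
    json.loads(payload)
    # Write to a temporary name first so a failed write never leaves a
    # truncated file where the real one is expected.
    partial = dest.with_name(dest.name + ".part")
    partial.write_bytes(payload)
    partial.replace(dest)
    return dest
```

Writing to a scratch location rather than into the conda environment avoids the problem, noted earlier in the thread, of edited files being overwritten by a later package update.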

matthew-mizielinski commented 2 years ago

@durack1, @taylor13 -- this issue has popped up in another context. I've forked this repo to use for another project*, i.e. created a "derived work", and am looking to move that repository into the MetOffice organisation, but I'm being asked about the license status of this repository. As there is no declared license I think the guidance here applies;

> You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license. The Open Source Guide provides additional guidance on choosing the correct license for your project.
>
> Note: If you publish your source code in a public repository on GitHub, according to the Terms of Service, other users of GitHub.com have the right to view and fork your repository.

This is a little confusing, so I was wondering whether it would be acceptable to explicitly apply an open license? The MIT or BSD-3 licenses look like an appropriate fit to me.

*The changes are only to the CVs file to allow an independent (activity, experiment) pair, i.e. not part of CMIP6.

mauzey1 commented 2 years ago

@durack1 @taylor13 Will we be okay with having the following license file in this repository?

MIT License

Copyright (c) 2022, Lawrence Livermore National Security, LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

I'm not sure if we should be following the software license guide in https://software.llnl.gov/about/licenses/. I don't see an IM number for this repo.

durack1 commented 2 years ago

@mauzey1 after consulting the E3SM leads, a BSD-3 license has been assigned to the E3SM code base, and so this has also been assigned to this repo in #374 - if this is too restrictive, or MIT is preferred, we could update - comments @xylar @chengzhuzhang?

mauzey1 commented 2 years ago

@durack1 Yes, I think the BSD-3 license is adequate for CMOR's code base.

@xylar Should we update the BSD-2 license being used for the conda-forge build of CMOR to BSD-3?

durack1 commented 2 years ago

@mauzey1 great, thanks.

The difference between the 2-Clause BSD and 3-Clause BSD is:

> 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

Which is an additional clause that we would want to keep

xylar commented 2 years ago

> @mauzey1 after consulting the E3SM leads, a BSD-3 license has been assigned to the E3SM code base, and so this has also been assigned to this repo in https://github.com/PCMDI/cmip6-cmor-tables/issues/374 - if this is too restrictive, or MIT is preferred, we could update - comments @xylar @chengzhuzhang?

A BSD-3 license is perfectly fine. Thank you for looking into this!

> @xylar Should we update the BSD-2 license being used for the conda-forge build of CMOR to BSD-3?

It's certainly important that the license name on conda-forge matches the actual license being used for CMOR. If you change the license for CMOR to BSD-3, we should make sure to update it accordingly in the next release build on conda-forge, but not before (the current release is still BSD-2).

durack1 commented 1 year ago

It appears this was solved with https://github.com/PCMDI/cmip6-cmor-tables/commit/bd8f3dc082bf4396a18023c44fb85e9cbae66868 so will close - please reopen if there's something I missed