CovertLab / wcEcoli

Whole Cell Model of E. coli

Proposal: Releasing Versions of the Whole-Cell E. coli Model #1061

Open U8NWXD opened 3 years ago

U8NWXD commented 3 years ago

Since I'm figuring out how to do releases of the whole-cell model, I took a pass at writing up a plan for future releases. Here's my proposal:

Releasing New Versions of the Model

We release new versions of the model to the WholeCellEcoliRelease repository whenever someone in the lab publishes a paper that requires unreleased model code.

Tracking Versions

Version Numbers

We use semantic versioning for our version numbers, except we drop the patch number. In broad strokes, this means that our version numbers take the form major.minor, for example 1.0. We can also specify pre-releases like 1.0-beta.1. For any version that includes breaking changes (i.e. if someone else wrote code that uses our public methods, that code might no longer work), we increment the major version. For all other (i.e. backwards-compatible) changes, we increment the minor version. New major releases will generally go along with papers, while new minor releases will usually contain minor bug fixes.
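To make the numbering rule concrete, here is a minimal sketch of parsing and bumping major.minor versions. The names `Version`, `parse_version`, and `bump` are hypothetical and not part of wcEcoli; resetting the minor number to 0 on a major bump follows the semantic versioning convention.

```python
import re
from typing import NamedTuple, Optional

# Accepts "1.0", "v1.0", and pre-releases such as "v3.0-beta.1".
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)(?:-(alpha|beta)\.(\d+))?$")


class Version(NamedTuple):
    """Hypothetical container for a major.minor version (no patch number)."""
    major: int
    minor: int
    pre_label: Optional[str] = None   # "alpha" or "beta" for pre-releases
    pre_number: Optional[int] = None

    def __str__(self) -> str:
        base = f"{self.major}.{self.minor}"
        if self.pre_label is None:
            return base
        return f"{base}-{self.pre_label}.{self.pre_number}"


def parse_version(text: str) -> Version:
    """Parse a version string, with or without the leading "v" used in tags."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a major.minor version: {text!r}")
    major, minor, label, number = match.groups()
    return Version(int(major), int(minor), label, int(number) if number else None)


def bump(version: Version, breaking: bool) -> Version:
    """Return the next full release: major bump if breaking, else minor bump."""
    if breaking:
        return Version(version.major + 1, 0)
    return Version(version.major, version.minor + 1)
```

For example, `bump(parse_version("1.3"), breaking=True)` gives version 2.0, while `breaking=False` gives 1.4.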

Commits and Tags

When we release the model, we usually squash all our commits into a single release commit. This keeps the commit messages in wcEcoli private. However, this does not mean we have exactly one commit per release. For example, we might add commits to fix bugs or update documentation without doing a new release. You also might want to split the release for your paper across multiple commits. For example, if some of your data were generated using an earlier version of the model, you might want to include 2 commits: one that includes changes up to that earlier version and one for the rest of the changes. That way, you can refer to your versions by commit hashes in your paper.

Instead of tracking versions with commits, we track them with tags. These tags are named with the version number, e.g. v1.0, and are associated with releases on GitHub. It's good to include a description of what the release is for in the tag message. Then, you can specify the tag in your paper. Tracking versions with tags has a number of benefits:

  1. Tags are more user-friendly than commit-hashes since they are human-readable.
  2. Releases, which are built from tags, are easily accessible with GitHub's web interface.
  3. When users clone the repository, they'll get the most up-to-date code by default, including any fixes we made since the last release.
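As a rough sketch of the tagging step, the helper below builds the git commands for creating an annotated tag with a descriptive message and pushing it. The function name `tag_release_commands` and the remote name `origin` are assumptions for illustration; the commands are returned as argument lists rather than executed, so you can inspect them first.

```python
from typing import List


def tag_release_commands(
    version: str, message: str, remote: str = "origin"
) -> List[List[str]]:
    """Build (but do not run) git commands that tag and publish a release.

    `remote` is assumed to point at the release repository.
    """
    tag = f"v{version}"
    return [
        # Annotated tag (-a) so the tag carries its own message (-m).
        ["git", "tag", "-a", tag, "-m", message],
        # Push only the new tag to the remote.
        ["git", "push", remote, tag],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` from inside a clone of the release repository.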

Pre-Releases

When submitting a paper for review, you might want to make code available to reviewers without making a new release. For example, you might want to address reviewer comments before you make a new release. To handle this, we create pre-releases. These are versions just like those described above, except they have alpha or beta added to the end to signal that they are not yet complete. For example, let's say you're making a big new release that will be v3.0. You could create v3.0-beta.1 and make that available to reviewers. Then, you could address their comments in v3.0-beta.2. Once the paper's accepted and you've made any last changes, you can release v3.0. When you create the releases for v3.0-beta.1 and v3.0-beta.2 on GitHub, you can mark each one as a pre-release so that GitHub labels it as such. This will tell users you aren't ready for them to use it yet.
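The ordering implied by that example (beta.1, then beta.2, then the full release) can be sketched as a sort key. This assumes tags follow the vMAJOR.MINOR[-alpha.N|-beta.N] pattern described above; `tag_sort_key` and `is_prerelease` are hypothetical helper names.

```python
import re

_TAG_RE = re.compile(r"^v(\d+)\.(\d+)(?:-(alpha|beta)\.(\d+))?$")

# Pre-release stages sort before the full release of the same version.
_STAGE_RANK = {"alpha": 0, "beta": 1, None: 2}


def _match(tag: str) -> "re.Match[str]":
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return match


def tag_sort_key(tag: str) -> tuple:
    """Key that orders tags by version, with pre-releases first."""
    major, minor, stage, number = _match(tag).groups()
    return (int(major), int(minor), _STAGE_RANK[stage], int(number or 0))


def is_prerelease(tag: str) -> bool:
    """True for tags that carry an alpha or beta suffix."""
    return _match(tag).group(3) is not None


tags = ["v3.0", "v3.0-beta.2", "v2.1", "v3.0-beta.1"]
ordered = sorted(tags, key=tag_sort_key)
# ordered == ["v2.1", "v3.0-beta.1", "v3.0-beta.2", "v3.0"]
```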

If you want to avoid putting your code into the WholeCellEcoliRelease repository until after review, you can create a new temporary repository just for reviewers. One easy way to do this is to clone the WholeCellEcoliRelease repository and add the temporary repository as another remote. Then you can set up your tags and push to the temporary repository. Once the paper is accepted, you can push to the WholeCellEcoliRelease repository to make your releases public.
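Here is one way that workflow might look, again as commands that are built but not run. The remote name `reviewers`, the branch name `master`, and the URL in the usage note are all placeholders rather than real values; `origin` is the default remote that `git clone` creates for the repository you cloned.

```python
from typing import List


def reviewer_workflow_commands(
    tag: str,
    reviewer_url: str,
    branch: str = "master",            # assumed branch name
    reviewer_remote: str = "reviewers",  # assumed remote name
) -> List[List[str]]:
    """Build git commands to share a tag privately, then publish it later."""
    return [
        # Inside a clone of the release repository, register the
        # temporary reviewer repository as a second remote.
        ["git", "remote", "add", reviewer_remote, reviewer_url],
        # Share the branch and the pre-release tag with reviewers only.
        ["git", "push", reviewer_remote, branch, tag],
        # After acceptance: publish the same branch and tag to the
        # release repository, which the clone knows as "origin".
        ["git", "push", "origin", branch, tag],
    ]
```

For instance, `reviewer_workflow_commands("v3.0-beta.1", "git@example.com:lab/review-copy.git")` yields the three commands in order; the last one should only be run once the paper is accepted.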

Other Considerations

1fish2 commented 3 years ago

Well said, @U8NWXD!

Goals

The primary goal is to allow people to run the code to reproduce the published results.

The secondary goal is to allow them to read, understand, and tinker with the code.

Much has been written about program reproducibility, and it's far from a solved problem, esp. with floating point math. Frankly, Python and its libraries aren't built with this in mind. Do what you can to increase code reproducibility.

What we've done to date

Release Procedure

This seems like a good pattern for a new release for a new published paper:

[Any changes or additions to this procedure?]

Assumptions

Pre-Releases

Changelogs

Changelogs are very useful, but when making a new snapshot associated with a new published article, maybe we can settle for a high-level summary.