NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0

PineAPPL file format and backwards compatibility issues #83

Closed: cschwan closed this issue 2 years ago

cschwan commented 2 years ago

Backwards compatibility

First, let's define backwards compatibility:

Grid::read must be able to read every PineAPPL grid that was generated with a released version of PineAPPL. Released versions are the ones listed on the Releases page.

PineAPPL file format

PineAPPL doesn't have a dedicated file format. Instead, it relies on serde for (de)serialization and on bincode for actually writing bytes to and reading them from files. For the sake of backwards compatibility, this has the disadvantage that no struct carrying the #[derive(Deserialize, Serialize)] attributes may ever be changed, and the only flexibility is adding further variants to enums. That is why some types exist in multiple versions, as V1 and V2 variants.
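To see why appending enum variants is safe while changing a struct is not, here is a hand-rolled, standard-library-only encoding that loosely models bincode's fixed-width format: an enum is written as a little-endian `u32` variant index followed by the variant's fields. This is a simplified sketch, not bincode itself, and the `Subgrid` type with its `V1`/`V2` variants is hypothetical rather than one of PineAPPL's actual types.

```rust
use std::convert::TryInto;

// Hypothetical type: variants are only ever appended, never edited.
#[derive(Debug, PartialEq)]
enum Subgrid {
    // Index 0: the original layout, frozen once released.
    V1 { values: Vec<f64> },
    // Index 1: added later; files written before it never contain index 1.
    V2 { values: Vec<f64>, scale: f64 },
}

/// Writes a u64 length prefix followed by the elements (little-endian).
fn write_f64s(out: &mut Vec<u8>, xs: &[f64]) {
    out.extend_from_slice(&(xs.len() as u64).to_le_bytes());
    for x in xs {
        out.extend_from_slice(&x.to_le_bytes());
    }
}

/// Encodes the variant index, then the variant's fields in declaration order.
fn encode(s: &Subgrid) -> Vec<u8> {
    let mut out = Vec::new();
    match s {
        Subgrid::V1 { values } => {
            out.extend_from_slice(&0u32.to_le_bytes());
            write_f64s(&mut out, values);
        }
        Subgrid::V2 { values, scale } => {
            out.extend_from_slice(&1u32.to_le_bytes());
            write_f64s(&mut out, values);
            out.extend_from_slice(&scale.to_le_bytes());
        }
    }
    out
}

/// Reads `N` bytes at `*pos` and advances the cursor; `None` if truncated.
fn take<const N: usize>(bytes: &[u8], pos: &mut usize) -> Option<[u8; N]> {
    let chunk = bytes.get(*pos..*pos + N)?;
    *pos += N;
    chunk.try_into().ok()
}

fn read_f64s(bytes: &[u8], pos: &mut usize) -> Option<Vec<f64>> {
    let len = u64::from_le_bytes(take::<8>(bytes, pos)?) as usize;
    let mut xs = Vec::new();
    for _ in 0..len {
        xs.push(f64::from_le_bytes(take::<8>(bytes, pos)?));
    }
    Some(xs)
}

/// Decodes by dispatching on the variant index.
fn decode(bytes: &[u8]) -> Option<Subgrid> {
    let mut pos = 0;
    match u32::from_le_bytes(take::<4>(bytes, &mut pos)?) {
        0 => Some(Subgrid::V1 { values: read_f64s(bytes, &mut pos)? }),
        1 => {
            let values = read_f64s(bytes, &mut pos)?;
            let scale = f64::from_le_bytes(take::<8>(bytes, &mut pos)?);
            Some(Subgrid::V2 { values, scale })
        }
        _ => None, // unknown index: written by a newer version
    }
}
```

Bytes written before `V2` existed start with index 0 and still decode to `V1`. Adding a field to `V1` itself, by contrast, would make the decoder expect bytes that old files don't contain, which is why serialized structs have to stay frozen.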

Obviously, requirements change, and mistakes were (and will be) made in the design. To mention a few examples:

Planned changes

To make file handling more flexible and to support different designs without sacrificing backwards compatibility, we need to implement a few changes:

1) We need a file header and a file version. The file header precedes the remaining data and can be as simple as the byte string ['P', 'i', 'n', 'e', 'A', 'P', 'P', 'L']. It lets Grid::read detect whether a grid can be deserialized immediately or must first be decompressed. The file version, on the other hand, determines exactly how the read is performed.
2) Depending on the file version, the read method of the matching struct is called, followed by upgrade, which converts the grid from that specific version to the latest one. The upgrade method must also be offered by the CLI, so that grids can be batch-converted to the newest version.
3) At some point we might have different versions of the Grid struct in the crate, possibly as pineappl::grid::v0::Grid, pineappl::grid::v1::Grid and so forth, with a type definition pineappl::grid::Grid for the most recent version.
4) As soon as a new file version is released, all previous file versions should be considered deprecated, and at some point support for older versions can be removed. Backwards compatibility is still ensured because crates.io keeps all versions of the CLI, which we can use to upgrade grids in a bootstrapping fashion: install the latest version that still supports the file format to be upgraded, upgrade to the most recent version it supports, and repeat. This could, and probably should, be automated.
5) To make this work, the supported file versions need to be documented, ideally in the upgrade subcommand of the CLI itself as error messages (something along the lines of error: tried to upgrade grid with file version 0. You need pineappl 0.5.0 to upgrade this version).
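The read-then-upgrade scheme of points 2) and 3) can be sketched as follows. Everything here is hypothetical: the `v0`/`v1` modules mirror the naming suggested in point 3, but their fields, the placeholder deserializers `read_v0`/`read_v1`, and `read_grid` are made up for illustration and are not PineAPPL's actual API.

```rust
// Hypothetical per-version modules mirroring point 3; fields are made up.
mod v0 {
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub orders: Vec<u32>,
    }
}

mod v1 {
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub orders: Vec<u32>,
        pub metadata: Vec<(String, String)>,
    }
}

/// The in-memory type always corresponds to the newest file version.
pub type Grid = v1::Grid;

/// A grid as read from disk, tagged with its file version.
enum VersionedGrid {
    V0(v0::Grid),
    V1(v1::Grid),
}

impl VersionedGrid {
    /// Converts to the latest version one step at a time (v0 -> v1 -> ...).
    fn upgrade(self) -> Grid {
        match self {
            VersionedGrid::V0(g) => VersionedGrid::V1(v1::Grid {
                orders: g.orders,
                metadata: Vec::new(), // v0 had no metadata
            })
            .upgrade(),
            VersionedGrid::V1(g) => g,
        }
    }
}

/// Placeholder deserializers: each payload byte becomes one order index.
fn read_v0(payload: &[u8]) -> v0::Grid {
    v0::Grid { orders: payload.iter().map(|&b| u32::from(b)).collect() }
}

fn read_v1(payload: &[u8]) -> v1::Grid {
    v1::Grid {
        orders: payload.iter().map(|&b| u32::from(b)).collect(),
        metadata: vec![("written_by".to_string(), "v1".to_string())],
    }
}

/// Picks the reader from the file version, then upgrades to the latest layout.
fn read_grid(version: u64, payload: &[u8]) -> Result<Grid, String> {
    let grid = match version {
        0 => VersionedGrid::V0(read_v0(payload)),
        1 => VersionedGrid::V1(read_v1(payload)),
        v => return Err(format!("unsupported file version {}", v)),
    };
    Ok(grid.upgrade())
}
```

Each `upgrade` arm only has to know how to reach the next version, so adding a file version means adding one module, one reader and one conversion step.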

cschwan commented 2 years ago

As a first step, we might simply consider every version released so far as file version 0. Starting with v0.6.0, we should write the file version explicitly.
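A minimal sketch of header parsing under this convention, assuming (hypothetically) that the magic bytes `PineAPPL` are followed by the file version as a little-endian `u64`. Files without the magic are reported as file version 0, i.e. written before the header existed. `split_header` and `write_header` are illustrative names, and telling compressed files apart (the header's other purpose) is left out.

```rust
use std::convert::TryInto;

/// Magic bytes proposed in the thread: the ASCII string "PineAPPL".
const MAGIC: &[u8; 8] = b"PineAPPL";

/// Splits a file into (file version, payload). The version layout after the
/// magic is an assumption; files without the magic count as version 0.
fn split_header(bytes: &[u8]) -> Result<(u64, &[u8]), String> {
    match bytes.strip_prefix(&MAGIC[..]) {
        Some(rest) => {
            let raw: [u8; 8] = rest
                .get(..8)
                .ok_or("truncated header: missing file version")?
                .try_into()
                .map_err(|_| "invalid header")?;
            Ok((u64::from_le_bytes(raw), &rest[8..]))
        }
        // No header: a grid from a release that predates file versions.
        None => Ok((0, bytes)),
    }
}

/// Prepends the magic and an explicit file version to a payload.
fn write_header(version: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = MAGIC.to_vec();
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(payload);
    out
}
```

Treating "no header" as version 0 is what makes the scheme backwards compatible: nothing written by older releases needs to change.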

cschwan commented 2 years ago

Commit 3b88b4c115cefee125d8b9bbf9520e65eeda79de adds the upgrade subcommand.

alecandido commented 2 years ago
  4) As soon as a new file version is released all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to most recent version supported, etc.). This could and should probably be automated.

You could consider declaring a fixed number of older versions as always supported. For example, you could support just one older version, for which an upgrade to the newer one is available. Then, if some grids survive multiple releases without being updated, the user can always download the intermediate releases and do the upgrades one by one.

If it's occasionally not too complicated to maintain several older versions, you can do so, but it's not strictly required.
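The release-by-release upgrade can be planned automatically. The sketch below assumes a hypothetical table recording, for each CLI release, the oldest file version its upgrade subcommand reads and the file version it writes. It greedily picks the newest release that can still read the current version, and it formats an error message modelled on the one suggested in point 5 of the proposal. `Release`, `upgrade_plan`, `unsupported_message` and all release names are made up for illustration.

```rust
/// One entry per CLI release (hypothetical data model): its `upgrade` reads
/// file versions `oldest_readable..=writes` and writes version `writes`.
struct Release {
    name: &'static str,
    oldest_readable: u32,
    writes: u32,
}

/// Plans the releases to install, in order, to bring a grid from file
/// version `from` to the newest version ("bootstrapping").
fn upgrade_plan(releases: &[Release], from: u32) -> Result<Vec<&'static str>, String> {
    let latest = releases.iter().map(|r| r.writes).max().ok_or("no releases")?;
    let mut current = from;
    let mut plan = Vec::new();
    while current < latest {
        // Newest release that can read `current` and moves it forward.
        let step = releases
            .iter()
            .filter(|r| r.oldest_readable <= current && r.writes > current)
            .max_by_key(|r| r.writes)
            .ok_or_else(|| format!("no release can upgrade file version {}", current))?;
        plan.push(step.name);
        current = step.writes;
    }
    Ok(plan)
}

/// Error message for a file version the running release no longer reads.
fn unsupported_message(releases: &[Release], version: u32) -> String {
    let newest_reader = releases
        .iter()
        .filter(|r| r.oldest_readable <= version && version <= r.writes)
        .max_by_key(|r| r.writes);
    match newest_reader {
        Some(r) => format!(
            "error: tried to upgrade grid with file version {}. You need pineappl {} to upgrade this version",
            version, r.name
        ),
        None => format!("error: no known release reads file version {}", version),
    }
}
```

If every release reads exactly one older version, the plan degenerates to installing each intermediate release in turn, as suggested above; releases that keep more versions readable let the plan skip steps.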

cschwan commented 2 years ago

Yes, that's what I meant by 'bootstrapping'. It's a well-known problem for GCC, which needs a C++ compiler to build 😄.

alecandido commented 2 years ago

You're definitely right, then just keep going :)

cschwan commented 2 years ago

Preliminary code to support file format changes is in commit d9897bcd632209022b0a35d38851f423fbf45b06. If you can't read new grids, make sure your CAPI/CLI/Python API is up to date.

cschwan commented 2 years ago

Here's what I'd like to change in a newer version:

alecandido commented 2 years ago

Just to get an idea: can you tell me which subgrid types we're still using, and where? I know a couple of them (maybe a few more), but I'm definitely puzzled about the others...

cschwan commented 2 years ago

Here's an overview:

cschwan commented 2 years ago

I'm closing this, since what this issue originally asked for is supported in v0.5.0. Further development should be discussed in #118.