As a first step, we might want to simply consider every version released as file version 0. Starting with v0.6.0 we should explicitly write the file version.
Commit 3b88b4c115cefee125d8b9bbf9520e65eeda79de adds the upgrade subcommand.
- As soon as a new file version is released all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in a bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to most recent version supported, etc.). This could and should probably be automated.
You could consider declaring a fixed number of older versions as always supported.
E.g. you could support just one older version, for which an upgrade to the newer one is available. Then, if there are grids that survive multiple releases without being updated, the user can always download the intermediate releases and do the upgrades one by one.
If occasionally it is not overly complicated to maintain several older versions, you can do it, but it's not strictly required.
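The "one by one" upgrade described above could be planned automatically. Here is a minimal sketch, assuming each CLI release upgrades exactly one file version to the next one. The `Release` struct, the `upgrade_path` function and the release names in `example_releases` are hypothetical and only serve as illustration; they are not part of PineAPPL.

```rust
/// Hypothetical table entry: the CLI release `cli` reads grids with file
/// version `upgrades_from` and writes them as the next file version.
struct Release {
    cli: &'static str,
    upgrades_from: u64,
}

/// Returns the CLI releases to install, in order, to bring a grid from file
/// version `start` to file version `target`. Returns `None` if some
/// intermediate step is not covered by any release in the table.
fn upgrade_path(releases: &[Release], start: u64, target: u64) -> Option<Vec<&'static str>> {
    (start..target)
        .map(|version| {
            releases
                .iter()
                .find(|r| r.upgrades_from == version)
                .map(|r| r.cli)
        })
        .collect()
}

/// Made-up release table, for illustration only.
fn example_releases() -> Vec<Release> {
    vec![
        Release { cli: "cli-a", upgrades_from: 0 },
        Release { cli: "cli-b", upgrades_from: 1 },
    ]
}
```

With this table, a grid at file version 0 that should end up at version 2 needs "cli-a" first and then "cli-b". A grid that is already at the target version needs nothing, and a target beyond the table yields `None`.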
Yes, that's what I meant by 'bootstrapping'. It's a well-known problem for GCC, which needs a C++ compiler to build :smile:.
You're definitely right, then just keep going :)
Preliminary code to support file format changes is in commit d9897bcd632209022b0a35d38851f423fbf45b06. If you find yourself not being able to read new grids, make sure the CAPI/CLI/Python API is up to date.
Here's what I'd like to change in a newer version:
- Merge BinInfo and BinRemapper into BinLimits. The reason for having them separate is historical only (see above) and merging them will make parts of the code much easier
- Merge Mmv3 into Grid, which means that metadata will always be present, making metadata-related code much shorter
- Change Order's member from u32 to u8 and add another member to support #98
- Remove unused Subgrid types and keep only the ones we use
Just to have an idea: can you tell which are the subgrid types we're still using, and where? I know a couple of them (maybe a bit more), but I'm definitely puzzled about the others...
Here's an overview:
- EmptySubgridV1: keep, this is needed to optimize empty grids
- ImportOnlySubgridV1: remove
- ImportOnlySubgridV2: keep, is more general than ImportOnlySubgridV1, supports grids where the factorization scale is different from the renormalization scale
- LagrangeSubgridV1: remove
- LagrangeSubgridV2: keep, is more general than LagrangeSubgridV1, supports DIS
- LagrangeSparseSubgridV1: remove, was never used; was supposed to give a better memory footprint than LagrangeSubgridV{1,2} while filling the grid with a MC, but that was never a problem with Madgraph5
- NtupleSubgridV1: remove, this saves N-tuples so that there's no interpolation error, but its space requirements make it impractical
I'm closing this as support for the original is supported in v0.5.0. Further development should be discussed in #118.
Backwards compatibility
First, let's define backwards compatibility:
- Grid::read must be able to read all generated PineAPPL grids if they were generated using a released version of PineAPPL. Released versions are the ones on the Releases page.
PineAPPL file format
PineAPPL doesn't have a dedicated file format, but instead relies on serde for (de)serialization and on bincode for actually writing bytes to and from files. This has the disadvantage that, for the sake of backwards compatibility, every struct that has the #[derive(Deserialize,Serialize)] attributes must never be changed, and the only flexibility is adding further variants to enums; that's the reason why there are multiple versions of a struct as V1 and V2 variants.
Obviously requirements change, and mistakes were, and will be, made even in the design. To mention a few examples:
- The MoreMembers enum was added to support a BinRemapper. This struct basically supersedes BinLimits, which only supports one-dimensional distributions that are contiguous (the right bin limit is the left limit of the next bin). BinRemapper supports, at least in principle, an arbitrary number of dimensions and also normalizations that are not necessarily tied to bin sizes. Yet another struct BinInfo is needed to abstract the differences between the two, as shown in Grid::bin_info.
- The MoreMembers enum is needed for metadata, which was previously missing. As a result, the methods Grid::key_values, Grid::key_values_mut return an Option depending on whether the Grid does have metadata or not.
Planned changes
To make file handling more flexible and to support different designs without sacrificing backwards compatibility, we need to implement a few changes:
1) We need a file header and a file version. The file header precedes the remaining data and can be as simple as the byte string ['P', 'i', 'n', 'e', 'A', 'P', 'P', 'L']. This is needed to let Grid::read detect if a grid can immediately be deserialized or if it first has to be decompressed. The file version, on the other hand, lets us determine exactly how the read is performed.
2) Depending on the file version, read of the correct struct is called, followed by upgrade, which converts the grid from a specific version to the latest one. The upgrade method must also be offered by the CLI so that one can batch-convert grids into the newest version.
3) At some point we might have different versions of the Grid struct in the crate, possibly as pineappl::grid::v0::Grid, pineappl::grid::v1::Grid and so forth, and a type definition for pineappl::grid::Grid for the most recent version.
4) As soon as a new file version is released all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in a bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to most recent version supported, etc.). This could and should probably be automated.
5) To make this work, the supported file versions need to be documented, ideally in the upgrade subcommand of the CLI itself as error messages (something along the lines of error: tried to upgrade grid with file version 0. You need pineappl 0.5.0 to upgrade this version).
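To make points 1) to 3) more concrete, here is a rough sketch of how header detection, version dispatch and upgrade could fit together. It is not the PineAPPL implementation and makes several assumptions: the file version is stored as a little-endian u64 right after the header, files without a header are treated as file version 0, and the `v0::Grid` and `v1::Grid` structs are toy stand-ins whose "deserialization" just copies the payload instead of going through serde and bincode. The names `Layout`, `split_header` and `read` are hypothetical.

```rust
/// Header proposed in the text: the byte string "PineAPPL".
const MAGIC: &[u8] = b"PineAPPL";

#[derive(Debug, PartialEq)]
enum Layout {
    /// No header found: a pre-header file (treated as file version 0 here),
    /// which in practice might first have to be decompressed.
    Legacy,
    /// Header found, followed by this file version.
    Versioned(u64),
}

/// Splits `bytes` into the detected layout and the remaining payload.
/// Assumption: the version is a little-endian u64 right after the header.
fn split_header(bytes: &[u8]) -> (Layout, &[u8]) {
    if let Some(rest) = bytes.strip_prefix(MAGIC) {
        if rest.len() >= 8 {
            let (version, payload) = rest.split_at(8);
            let mut buf = [0u8; 8];
            buf.copy_from_slice(version);
            return (Layout::Versioned(u64::from_le_bytes(buf)), payload);
        }
    }
    (Layout::Legacy, bytes)
}

mod v0 {
    /// Toy stand-in for the old grid: metadata may be missing.
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub data: Vec<u8>,
        pub metadata: Option<Vec<(String, String)>>,
    }
}

mod v1 {
    /// Toy stand-in for the new grid: metadata is always present.
    #[derive(Debug, PartialEq)]
    pub struct Grid {
        pub data: Vec<u8>,
        pub metadata: Vec<(String, String)>,
    }

    impl Grid {
        /// Converts a version-0 grid into a version-1 grid.
        pub fn upgrade(old: super::v0::Grid) -> Self {
            Self {
                data: old.data,
                metadata: old.metadata.unwrap_or_default(),
            }
        }
    }
}

/// Type definition pointing at the most recent version.
type Grid = v1::Grid;

/// Reads any supported file version and returns the most recent `Grid`.
fn read(bytes: &[u8]) -> Result<Grid, String> {
    match split_header(bytes) {
        (Layout::Legacy, payload) => {
            // stand-in for deserializing the old struct, then upgrading it
            let old = v0::Grid { data: payload.to_vec(), metadata: None };
            Ok(Grid::upgrade(old))
        }
        (Layout::Versioned(1), payload) => Ok(Grid {
            data: payload.to_vec(),
            metadata: Vec::new(),
        }),
        (Layout::Versioned(v), _) => {
            Err(format!("error: file version {} is not supported", v))
        }
    }
}
```

Callers of `read` only ever see the alias `Grid`, so code outside the reading and upgrading logic does not need to know about older layouts. Unknown file versions end up in the last match arm, which is where an error message like the one in point 5) would go.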