NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0
12 stars 3 forks source link

Release version v1 of the file format #118

Open cschwan opened 2 years ago

cschwan commented 2 years ago

Here's what I'd like to change in a newer version of the file format, v1, replacing the current (v0) one:

Originally posted by @cschwan in https://github.com/N3PDF/pineappl/issues/83#issuecomment-1007949008

cschwan commented 2 years ago

Technically we can have different file formats at the same time by versioning also the modules:

Grid::read would be the only method that has access to the old v0 module, which it uses if the file header indicates a non-current version. It then elects the appropriate Grid, either from v0 or v1. Grids loaded from v0 would automatically be converted into objects from v1, so everything should go smoothly without the user even noticing.

In this way we can also easily get rid of old file format - if we decide to do that - by simply throwing away the v0 module and changing Grid::read to return an error. In that case the user would have to manually upgrade the file format, for instance by using an older version of the CLI.

felixhekhorn commented 2 years ago

Since you want to guarantee backwards compatibility, you need to have both the old and the new code inside and also the converter (which could go in v0 since it can be thrown away together) ... (just a style comment: you maybe want to scope them a bit more then just pineappl, because this sounds like the whole library changes - e.g. pineappl::io or similar?)

cschwan commented 2 years ago

Yet another way (no idea how that would work) is to let the pineappl crate depend on an older version of itself, or separate that part into a different crate, for instance pineappl-upgrade.

alecandido commented 2 years ago

Personally, I don't care so much about splitting crates. But if the other is not obviously the good choice, I'd stick with one for simplicity.

The dependency on the older version is potentially a mess.

I've read in the Go ecosystem an advise for new major releases, that was to create a completely new module (in this case a new crate), and depend on the old one. Unfortunately, in Go is simpler, because of the way modules are addressed. In Rust is not so common, so it would appear messy. But given that we are talking about the file format, I would imitate inside the crate itself (essentially with ::io::v0 and ::io::v1).

cschwan commented 1 year ago

I think I'll first go with v0 and v1 modules, and after some time we might decide split off the v0 part into a separate crate.

cschwan commented 1 year ago

Yet another way (no idea how that would work) is to let the pineappl crate depend on an older version of itself, or separate that part into a different crate, for instance pineappl-upgrade.

This stackoverflow answer describes how to do it: https://stackoverflow.com/a/58739076/812178. That seems pretty straightforward.

alecandido commented 1 year ago

Consider that, going this way, you might have to maintain also the older version, in case of bugs preventing some kind of applications. It is possible, but not the easiest, for a project with a restricted number of maintainers.

In my opinion, I would try to keep the version management straight, and just increase the version number in new releases. Thus, I'm more in favor of pineappl::v0, pineappl::v1... (or pineappl::io::v0 or whatever...). When you have a new one, you deprecate the old at some point, and eventually drop.

cschwan commented 1 year ago

@AleCandido those two ideas aren't exclusive. You'd write

use pineappl_v0 as v0;

in pineappl/src/lib.rs and in pineappl/Cargo.toml:

[dependencies]
pineappl_v0 = { package = "pineappl", version = "0.5.8" }

In this way we'd preserve an older version in pineappl::v0, while the newer version would be in pineappl. The main difference to explicitly keeping the old copy in pineappl/src/v0/... is that we can't modifiy the old source, but I think this is an advantage. With the method described above we'd rely on crates.io to get 0.5.8, which no one needs to maintain (you can't change published versions).

The only functionality that we need from pineappl::v0 is to load a Grid and update the structures to those in v1.

alecandido commented 1 year ago

For as long as it is working, that's fine.

Problem is if Grid becomes incompatible, then you have to write a method to convert v0::Grid into Grid. However, if the only operation allowed with the old crate in the new crate is loading, that's just the worst case: you need a converter. Then, all the operations and everything will be done by the new crate with the new logic, even if incompatible.

The only room remained is if you are also going to dump an old version, or just the new one (I'd recommend the later). E.g., if you optimize an old grid, will it still be old? Or if you merge old grids? This is a potential mess, since then you also have to deal with merging new and old grids and so on, so I'd say all the operations should be performed as if the grids were all new, and the old ones are converted. Notice that this is potentially breaking: if your code depends on PineAPPL v0, and you optimize your grid with PineAPPL CLI v1, then you won't be able to use it any longer. However, I would say that this should be solved by the user either using CLI v0, or upgrading the dependency.

alecandido commented 1 year ago

Another topic: from what you said it seems you want to keep file version and library version synced

In this way we'd preserve an older version in pineappl::v0 [...] we'd rely on crates.io to get 0.5.8 [...]

Are you sure?

Definitely, it has the advantage of being more clear. But in principle, you can make an incompatible file format also maintaining compatibility of the library (it seems to me), and the opposite as well. Since compatibility should be maintained as far as possible, sharing the version you will occasionally declare as incompatible objects that actually are compatible.

cschwan commented 1 year ago

Problem is if Grid becomes incompatible, then you have to write a method to convert v0::Grid into Grid.

The changes in https://github.com/NNPDF/pineappl/issues/118#issue-1142868716 will require rewriting the Grid class, so in that sense a converter from a v0 to a v1 Grid will need to be written in Grid::read.

However, if the only operation allowed with the old crate in the new crate is loading, that's just the worst case: you need a converter. Then, all the operations and everything will be done by the new crate with the new logic, even if incompatible.

I don't think there will be incompatible logic; the logic shouldn't change, only the internal structure and the file format.

The only room remained is if you are also going to dump an old version, or just the new one (I'd recommend the later). E.g., if you optimize an old grid, will it still be old? Or if you merge old grids? This is a potential mess, since then you also have to deal with merging new and old grids and so on, so I'd say all the operations should be performed as if the grids were all new, and the old ones are converted. Notice that this is potentially breaking: if your code depends on PineAPPL v0, and you optimize your grid with PineAPPL CLI v1, then you won't be able to use it any longer. However, I would say that this should be solved by the user either using CLI v0, or upgrading the dependency.

We'll always guarantee backwards compatibility, but not forwards compatibility, so eventually you'll have to update your dependencies. But we'll only change the Rust API and add a new file format (v1), everything else (CAPI, CLI, Python interface, etc.) will stay the same. Basically, I will change a few things under the hood, so the car will look the same, although maybe it may be a bit faster and simpler :smile:.

cschwan commented 1 year ago

Another topic: from what you said it seems you want to keep file version and library version synced

In this way we'd preserve an older version in pineappl::v0 [...] we'd rely on crates.io to get 0.5.8 [...]

Are you sure?

Definitely, it has the advantage of being more clear. But in principle, you can make an incompatible file format also maintaining compatibility of the library (it seems to me), and the opposite as well. Since compatibility should be maintained as far as possible, sharing the version you will occasionally declare as incompatible objects that actually are compatible.

In an arbitrary version of PineAPPL (the Rust library) supporting the file format version x the function Grid::read will parse the file format number of the specific file that we want to read, let's call that p, and then we'll have three cases:

For each file version we just need to know the latest version of PineAPPL that supports it, and that will be stored in Cargo.toml as shown above.