fjahr / kartograf

MIT License

rpki-client dependency management #16

Open fjahr opened 2 months ago

fjahr commented 2 months ago

We rely on rpki-client for the accuracy of the RPKI data, and it is working well for that. rpki-client is also under fairly active development, with a new release roughly every two months, which is great. However, that makes things a bit tricky for us, because any change in rpki-client could hurt our reproducibility requirement. rpki-client doesn't care much about consistency of results across versions; anything they change is an improvement from their point of view. This means we could think we have differing results in a collaborative run when the difference was actually caused by different rpki-client versions. So we should try to have everyone participating in a run on the same version of rpki-client, and anyone who wants to reproduce the result later will need to be on that version too. This can be tricky as well, because package managers usually only offer the latest version of rpki-client. That's not something we can fix, but the old builds are still available.

The most difficult part is the consistent effort to keep up to date with rpki-client, i.e. always testing the latest rpki-client version to ensure that it doesn't break kartograf. As a first step I have a small script that runs daily and creates a new issue here if there is a new rpki-client release. The issues look like this: https://github.com/fjahr/kartograf/issues/15
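For illustration, the core of such a daily check could look roughly like the sketch below. This is not the actual script, only a minimal version under some assumptions: that releases show up as tags on the rpki-client-portable repository, that tag names look like `9.4` or `v9.4`, and that the last version we already know about is stored somewhere by the caller. The function names are made up for this example.

```python
import json
import re
import urllib.request

# Assumption: new releases are visible as tags on the portable repository.
TAGS_URL = "https://api.github.com/repos/rpki-client/rpki-client-portable/tags"


def parse_version(name):
    """Turn a tag such as '9.4' or 'v9.4' into (9, 4); None if it is not a version."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", name.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def latest_version(tag_names):
    """Return the highest parsed version among the tag names, or None."""
    versions = [v for v in map(parse_version, tag_names) if v is not None]
    return max(versions, default=None)


def is_new_release(tag_names, last_seen):
    """True if the tags contain a version newer than the one we last handled."""
    latest = latest_version(tag_names)
    known = parse_version(last_seen)
    return latest is not None and (known is None or latest > known)


def fetch_tag_names(url=TAGS_URL):
    """Fetch tag names from the GitHub REST API (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return [tag["name"] for tag in json.load(response)]
```

A daily job would call `is_new_release(fetch_tag_names(), last_seen)` and open an issue when it returns `True`. Comparing versions as integer tuples rather than strings avoids ordering `8.10` before `8.9`.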

jurraca commented 2 months ago

Thanks for creating that script. We provide the nix shell to strongly encourage that every run happens on the same rpki-client version. Not sure we're going to achieve much more than reminding people to update their env before running kartograf. Do you plan on storing the data for each run somewhere? If we want a tight grip on reproducibility, then we can always process the data with different versions of the client.

fjahr commented 1 month ago

Thanks, yeah, that's a good point. But I do think that we cannot require every participant to run nix in order to participate. I am still struggling to adopt it in my own workflows. And for anyone not using nix we have the issues described above: the version people can comfortably get via their package manager will be the latest one.

Storing the data was my plan initially but it's a bit annoying. I will check if I can get some sponsored hosted data store. But I still would like to rely on re-runs as much as possible. It introduces some friction and it has already been hard to get the group of participants together in the past.

I need to think about this a bit more, but in an ideal world every setup would "just work" as long as the right version of rpki-client is available. It might help users to have an additional command that checks, before running, whether everything that is needed is available. We have seen several times that people only noticed afterwards that they didn't have the right version of kartograf or rpki-client, even though we have checks and logging at the beginning of the run command.
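To make the idea of a separate check command a bit more concrete, here is a rough sketch of what it could do for rpki-client. It is not existing kartograf code. It assumes that `rpki-client -V` prints a version number somewhere in its output (stdout and stderr are both searched, to be safe), and the function names and the version numbers in the usage note are only placeholders.

```python
import re
import shutil
import subprocess


def extract_version(text):
    """Pull the first dotted version number (e.g. '9.4') out of some text."""
    match = re.search(r"(\d+\.\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def installed_rpki_client_version(binary="rpki-client"):
    """Return the installed version string, or None if the binary is missing."""
    path = shutil.which(binary)
    if path is None:
        return None
    # Assumption: -V prints the version; look at both output streams.
    result = subprocess.run([path, "-V"], capture_output=True, text=True, check=False)
    return extract_version(result.stdout + result.stderr)


def preflight(expected, found):
    """Return a list of problems; an empty list means the setup looks ready."""
    if found is None:
        return ["rpki-client not found on PATH"]
    if found != expected:
        return [f"rpki-client {found} found, but this run expects {expected}"]
    return []
```

The command would then run something like `preflight("9.4", installed_rpki_client_version())` and print the problems, if any, before anyone starts a long run. Keeping the comparison separate from the subprocess call makes it easy to test without rpki-client installed.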

fjahr commented 1 week ago

I didn't see this before, but there is now a "packaging status" table in the README of rpki-client-portable: https://github.com/rpki-client/rpki-client-portable

jurraca commented 1 week ago

nice. I could add it to nixpkgs.