dnbert / prm

PRM Allows you to quickly build package repositories, inspired by Jordan Sissels' FPM
MIT License
250 stars 33 forks source link

[Debian only] Re-releases of packages cause repo to be in a broken state #30

Closed mfoo closed 10 years ago

mfoo commented 10 years ago

My build server produces deb files from various repositories. Sometimes a build will get triggered with no version changes, so a deb with the same version and a different md5sum gets pumped out. This is my problem and not something prm should care about, but this build gets pushed to an apt server and the package index is updated by prm. prm then sees the file in its inbox, notices the filename is the same, and puts the md5sum from the first time the deb arrived (stored in md5-sums/debname.deb in the Packages file. This means my Packages file has a different md5sum to my deb file, causing apt clients to fail with a hash sum mismatch error.

The relevant code is here: https://github.com/dnbert/prm/blob/master/lib/prm/repo.rb#L98

There are three options:

I vote option 3. Any thoughts?

mfoo commented 10 years ago

Another thought: Currently this will break if people have a different version of the deb file in https://github.com/mfoo/prm/commit/dc49d98a9b9d9260a8df46c2c45e3f681ddc52ad, but it's going to need more of a refactor than I thought, as the move step happens before the md5sum validation step.

Currently in my branch the deb file rejection message gets printed out, the error code changes, but the deb file has already been copied in place by this time.

dnbert commented 10 years ago

Well, this is something I never thought would occur! I definitely agree that option 1 is not appropriate, it's a bug and should be treated as one.

The md5 caching functionality is actually one of the reasons why I started writing PRM. When working with continuous integration shops, I found that packages were an awesome way (for me) to deploy code quickly and efficiently but it requires a lot of package building and repository regeneration. Reprepro has severe problems with not acknowledging multiple iterations of packages in the same component, and apt-ftparchive is pretty dead slow. We were looking at 10-20 minute repo regeneration times, so md5 caching dropped that down to about 5 seconds. I guess that's a bit of a back story, but it's one of the reasons why I think it's important to have.

That being said! One of things we can do is force an override of the md5sum checking (--ignore-md5-cache). That wouldn't necessarily solve the problem, it would just force PRM to check the md5sum each time. This is a pretty easy add in.

Or, we can go with your option 3 which is also a possibility - this can probably just be done by looking for the file itself inside of your repository, which I don't think will be too much work.

Also, we can have PRM ignore and remove the old package as well.

Since this project is community driven, I feel like asking you what you want to see is always better than me saying some option is better. What would you like to see? :)

mfoo commented 10 years ago

That being said! One of things we can do is force an override of the md5sum checking (--ignore-md5-cache).

Possibly this would be a good idea, but really I would like to see an error when the sum itself changes. Unfortunately (as in my commit) this means md5sum checking would always have to be done.

Another option of achieving the same thing essentially would be to ignore duplicate file names in the initial copy and print a warning/return an error, not copy them in. This would fix my build server problem, but would probably have to be accompanied with a --force-update flag or some such which would need to both always copy files and always refresh the md5sum caches. The only issue I can see with this is if people's debs don't use the standard Debian package naming convention (which currently in prm would cause them to be broken the same way mine is, I believe, so perhaps this isn't a big issue!)

dnbert commented 10 years ago

So it sounds like something like this would be in order:

1.) Check if package name matches an already existing md5 cache filename 2.) If md5sum matches the package that we want to add, skip over the package and print a warning at the end of the Packages.gz generation 3.) add a --force-update option to allow an override of the pre-existing md5 cache filename

I think this will handle everything you want it to and add a bit more verbosity into awkward package handling.

mfoo commented 10 years ago

Agree. Although this is getting further down my priority list as I'm currently fixing the build server scripts to not duplicate version numbers and filenames. Edit: What I mean is, there's no rush to go out and implement this if nobody else has this issue!

dnbert commented 10 years ago

I looked through the logic a bit more over the weekend and it's going to be more difficult to do the former option. What I really need to do is refactor the caching logic so that there are more than one unique identifiers per package like a timestamp. What I want to do is cache each package in a json object and have it look like:

{ 'my_package_0.1.deb': { 'md5sum': '122233311331', 'timestamp': '2014-04-05:06:52:51'} } or something like that (is that even real json? :)

In any case, for temporary measures, I've added a --nocache flag into PRM here: https://github.com/dnbert/prm/commit/719b66148f92454ad0a5bad74842e6950ddc0b5c

It's not exactly ideal, but it will at least help I think

mfoo commented 10 years ago

Hi, sorry, completely forgot about this issue.

Yes, that's real JSON. If you're going to do this, please use an ISO 8601 format timestamp. Edit: clarification: timezones are important!