aptly-dev / aptly

aptly - Debian repository management tool
https://www.aptly.info/
MIT License

"file already exists and is different" after un-publishing package from S3 #1223

Closed: sf-nwaller closed this issue 8 months ago

sf-nwaller commented 1 year ago

In specific circumstances, replacing a package with different content can result in this error message:

unable to update: unable to process packages: error putting file to pool/main/a/REDACTED.deb: file already exists and is different: S3: ca-central-1:REDACTED/

The specific circumstances are as follows; a curl sketch of the same steps appears after the list.

Steps To Reproduce

  1. run aptly api serve
  2. add a package to local repo
  3. publish repo to S3
  4. remove package from repo
  5. publish repo to S3
  6. alter the package so that it has a different checksum, making sure not to change the filename or any part of the (Name, Version, Architecture) triple in the control file. The simplest modification is to change the Maintainer field in the control file.
  7. add the altered package to local repo
  8. attempt to publish repo to S3
  9. 💥 observe the "file already exists and is different" error message as shown above
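
For reference, a rough curl sketch of these steps against the REST API, in the style of the examples later in this thread. The repo name test-repo, the upload directory upload, the endpoint s3:bucket, the distribution stable, and the package file name are all placeholders, and signing is skipped for brevity:

    # steps 1-3: create a local repo, add a package, publish to S3
    curl -X POST -H 'Content-Type: application/json' --data '{"Name": "test-repo"}' localhost:8080/api/repos
    curl -X POST -F file=@hello_1.0_amd64.deb localhost:8080/api/files/upload
    curl -X POST localhost:8080/api/repos/test-repo/file/upload
    curl -X POST -H 'Content-Type: application/json' --data '{"SourceKind": "local", "Sources": [{"Name": "test-repo"}], "Distribution": "stable", "Signing": {"Skip": true}}' localhost:8080/api/publish/s3:bucket
    # steps 4-5: remove the package (take the ref from GET /api/repos/test-repo/packages) and re-publish
    curl -X DELETE -H 'Content-Type: application/json' --data '{"PackageRefs": ["<ref>"]}' localhost:8080/api/repos/test-repo/packages
    curl -X PUT -H 'Content-Type: application/json' --data '{"Signing": {"Skip": true}}' localhost:8080/api/publish/s3:bucket/stable
    # steps 6-8: rebuild the .deb with a changed Maintainer field, upload and add it, publish again
    curl -X POST -F file=@hello_1.0_amd64.deb localhost:8080/api/files/upload
    curl -X POST localhost:8080/api/repos/test-repo/file/upload
    curl -X PUT -H 'Content-Type: application/json' --data '{"Signing": {"Skip": true}}' localhost:8080/api/publish/s3:bucket/stable
    # step 9: the last call fails with "file already exists and is different"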

Expected Behaviour

It should be possible to withdraw a package and re-publish it with different content without using ForceOverwrite.

Actual Behaviour

If Aptly is running in api serve mode and is configured to publish to S3, the Steps To Reproduce cause this error:

unable to update: unable to process packages: error putting file to pool/main/a/REDACTED.deb: file already exists and is different: S3: ca-central-1:REDACTED/

This fails because Aptly's pathCache still holds the MD5 sum of the original file, even though the file is no longer present in S3.
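
For context, here is a simplified sketch of the kind of check that produces this error. This is illustrative Go, not aptly's exact code; the point is that the decision is made from pathCache alone, without consulting S3:

    package sketch

    import "fmt"

    // PublishedStorage is a minimal stand-in for aptly's s3.PublishedStorage,
    // reduced to the one field relevant to this bug.
    type PublishedStorage struct {
        pathCache map[string]string // published path -> MD5 of the object in S3
    }

    // putFile sketches the conflict check. If Remove() deletes the object from
    // S3 without touching pathCache, the stale entry still triggers the error.
    func (storage *PublishedStorage) putFile(path, sourceMD5 string) error {
        if destinationMD5, ok := storage.pathCache[path]; ok {
            if destinationMD5 != sourceMD5 {
                return fmt.Errorf("error putting file to %s: file already exists and is different", path)
            }
            return nil // identical content is already published; nothing to do
        }
        // ...the actual upload to S3 would happen here...
        storage.pathCache[path] = sourceMD5
        return nil
    }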

Works After Restart

If Aptly is restarted after removing the original package from S3, then publishing the changed package succeeds without error. This works because Aptly starts with an empty pathCache.

Works With ForceOverwrite

If "ForceOverwrite": true is included with the final publish operation, Aptly succeeds at publishing to S3, as expected.

Discussion

The Debian documentation on repository format says:

A repository must not include different packages (different content) with the same package name, version, and architecture.

The Aptly documentation on duplicate packages has this to say:

When aptly is publishing repository, it would give an error if conflicting package files (same name, but different content) are put together in one package pool. Package pool is shared by all published repositories with the same component and prefix. The same applies to switching snapshots or updating published repositories: if previous state and new state contain two conflicting packages, aptly would give an error. If you’re completely sure that this update operation is correct, you can use flag -force-overwrite to disable check for conflicting package files.

However, this issue describes a different scenario, in which the package has already been removed. In that case, the repository does not simultaneously contain conflicting packages, and there are no conflicting package files in the pool.

Context

It's not possible to add or publish any other packages while Aptly is in this state. As long as Aptly's in-memory cache holds an MD5 sum for an object that differs from the local pool version, it will refuse to proceed with any publish operation, even though the object no longer exists in S3.

Possible Implementation

I've tested this patch and it resolves the problem for me.

diff --git a/s3/public.go b/s3/public.go
index b44c7fe6..a13204f4 100644
--- a/s3/public.go
+++ b/s3/public.go
@@ -220,6 +220,7 @@ func (storage *PublishedStorage) Remove(path string) error {
                return errors.Wrap(err, fmt.Sprintf("error deleting %s from %s", path, storage))
        }

+       delete(storage.pathCache, path)
        if storage.plusWorkaround && strings.Contains(path, "+") {
                // try to remove workaround version, but don't care about result
                _ = storage.Remove(strings.Replace(path, "+", " ", -1))


neolynx commented 9 months ago

Hi!

Thanks a lot for the analysis and the patch! I'm not sure it completely solves the problem, though.

For example, when creating, deleting, and then recreating the same S3 publish point, no deb files are published.

Example

  1. Publish a snapshot to s3:

    curl -X POST -H 'Content-Type: application/json' --data '{"SourceKind": "snapshot", "Sources": [{"Name": "my-snapshot"}], "Distribution": "stable", "Signing": { "Batch": true, "GpgKey": "key@aptly", "PassphraseFile": "secret" }, "AcquireByHash": true, "ForceOverwrite": true }' aptly:8080/api/publish/s3:bucket
  2. Delete the s3 publish

    curl -X DELETE aptly:8080/api/publish/s3:bucket
  3. Recreate the publish with the same command as above

Result: no files published

Expectation: same files published

The reason is that the pathCache is not invalidated when the publish point is deleted: the cache still lists the removed files as present, so the re-publish skips uploading them. Removing individual paths from the cache only covers per-file removal.

Question

In your scenario, if you publish the same file again (without restarting aptly), does it show up in S3?

Possible solution

I added the following at the end of the RemoveDirs function:

+        // Invalidate path cache
+        storage.pathCache = nil
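
To illustrate why this works, here is a sketch of the invalidate-then-rebuild pattern, continuing the illustrative PublishedStorage type from the earlier comment (this is not aptly's code; listBucket is a hypothetical stand-in for an S3 LIST request):

    // removeDirs sketches deleting a publish point: objects are removed from
    // S3 and the whole cache is invalidated rather than pruned path by path.
    func (storage *PublishedStorage) removeDirs(prefix string) error {
        // ...delete every object under prefix from S3...
        storage.pathCache = nil // invalidated; rebuilt on next use
        return nil
    }

    // ensurePathCache lazily rebuilds the cache from an actual bucket listing,
    // so the next publish sees the bucket's true state and re-uploads files.
    func (storage *PublishedStorage) ensurePathCache() error {
        if storage.pathCache != nil {
            return nil
        }
        listing, err := storage.listBucket() // hypothetical helper: path -> MD5
        if err != nil {
            return err
        }
        storage.pathCache = listing
        return nil
    }

    // listBucket is a hypothetical stub; a real implementation would page
    // through an S3 LIST and read checksums (e.g. from object ETags).
    func (storage *PublishedStorage) listBucket() (map[string]string, error) {
        return map[string]string{}, nil
    }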

Maybe this kind of invalidation is needed in every scenario where aptly api serve keeps running, as opposed to one-shot aptly commands that start with an empty cache each time?

sf-nwaller commented 8 months ago

Awesome, thanks! 🚀

I re-ran my local test scenario against the latest build, aptly=1.5.0+82+gd060ad72, and I'm no longer able to reproduce the bug. I can confirm this fix. ✅