cybozu-go / aptutil

Go utilities for Debian APT repositories
MIT License

token too long #49

Closed ohmer1 closed 4 years ago

ohmer1 commented 4 years ago

I was using apt-cacher-ng but like you, I found it was unstable and I had intermittent issues. I'm trying to use go-apt-cacher as a satellite cache server. Our main repository is hosted with aptly.

I commented out the existing mapping in your sample config file and added mine:

aptly = "https://aptly.example.com/ubuntu"

The cacher is working fine. The first time a client asks for a package, the cacher downloads it from the main repository and caches it locally. When another client asks for the same file again, the cacher serves it from the local cache instead of fetching it from the central repository. All fine!

The problem occurs when I stop the cacher and try to start it again. I get this error:

2020-04-07T20:28:26.549038Z deploiement-ext go-apt-cacher error: "ExtractFileInfo(aptly/dists/focal/main/binary-amd64/Packages.gz): parser.Read: bufio.Scanner: token too long"

The only "fix" I have found is to remove the contents of the "meta" directory. But doing that seems to invalidate my local cache.

Hsn723 commented 4 years ago

Currently looking into this. The root issue is that we are using bufio.Scanner with the default MaxScanTokenSize, which is 64 KiB. I can reproduce this in our test suite by adding a package with an unrealistically large extended description, which is technically not illegal.

This can be addressed in several ways:

  1. Introduce our own hard-coded MaxScanTokenSize and call Scanner.Buffer with it upon the creation of Parser in NewParser
  2. Expose a configuration knob for user-defined maximum scan token size and call Scanner.Buffer to set it with the user-provided value
  3. Catch bufio.ErrTooLong and dynamically increase buffer size

Option 3 is not ideal as we run the risk of the buffer growing without bound.

Option 2 gives the most control, but requires a lot of elbow grease since we'll need to change a bunch of function signatures to expose MaxScanTokenSize all the way to Cacher (and we might also need to touch Mirror since it also calls apt.ExtractFileInfo). Plus, it introduces a configuration knob which might seem obscure to users, since it exposes an implementation detail that may not be relevant to them.

Option 1 doesn't give any control, but has the advantage that it does not break any of the current function signatures. It is also the most lightweight to implement.

For the time being, I'm leaning towards option 1 with an internal MaxScanTokenSize of 1 MiB, which seems to be a reasonable enough maximum size. If the buffer size issue is found to be prevalent enough, we can always consider building upon option 1 to implement the other options.

ohmer1 commented 4 years ago

The fix works great, thanks!