gomods / athens

A Go module datastore and proxy
https://docs.gomods.io
MIT License
4.41k stars 496 forks

Limiting the maximum disk usage #1899

Open delthas opened 11 months ago

delthas commented 11 months ago

I'd like to install a small package cache on a VM with limited storage, to speed up go get in my Golang CI jobs. The VM storage space is very limited: ~10GB. I'd like to tell Athens to use disk storage, up to that limit. When the limit is reached, Athens would delete existing packages to make space for new ones (e.g., least recently used, least frequently used, ...).

I'd typically see this as a config option next to the disk storage path.

I suppose I could also clean up existing packages manually with a cron job, but that's a bit more cumbersome. Athens knows how much it stores and can clean up packages just in time to make space for a new one.

matt0x6F commented 7 months ago

Athens would have to do a lot of work here that's kind of out of the scope of a Go Proxy. Would it be reasonable to suggest writing a daemon or CronJob that monitors that disk and removes files in the order you desire?

delthas commented 7 months ago

It could work, but ideally the cache should be based on LRU (least recently used), like most HTTP caches, etc., so that when space needs to be cleared for a package to be saved, the package that was requested (not saved) the longest time ago is deleted. With an external script, I would probably only be able to delete based on file mtime, that is, the least recently fetched packages. This means that a package that is requested often would still get cleared regularly, after each full cache rotation.

While Athens is a "Go Proxy", it's also really a cache for Go packages, and having a fixed cache size, with a bit of logic to delete the LRU entries, is quite standard for caches.

As a first step, an external script could work, but I think that it would really make sense for Athens to have this kind of logic.
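
For illustration, the LRU behavior described above can be sketched in a few lines of Go. This is not Athens code: the lruIndex type, its Touch and Add methods, and the "module@version" keys are all hypothetical names chosen for this sketch, and it only tracks bookkeeping in memory (the caller would delete the evicted entries from disk).

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached item and its size in bytes.
// The key format ("module@version") is an assumption for this sketch.
type entry struct {
	key  string
	size int64
}

// lruIndex keeps entries in recency order under a byte budget.
type lruIndex struct {
	maxBytes int64
	used     int64
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRUIndex(maxBytes int64) *lruIndex {
	return &lruIndex{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Touch records a request for key, so that often-requested entries
// are not evicted just because they were saved long ago.
func (c *lruIndex) Touch(key string) bool {
	el, ok := c.items[key]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// Add records a saved entry and returns the keys evicted to stay
// within the budget. The entry just added is never evicted.
func (c *lruIndex) Add(key string, size int64) []string {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		c.used += size - e.size
		e.size = size
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, size: size})
		c.used += size
	}
	var evicted []string
	for c.used > c.maxBytes && c.order.Len() > 1 {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.key)
		c.used -= e.size
		evicted = append(evicted, e.key)
	}
	return evicted
}

func main() {
	c := newLRUIndex(10)
	c.Add("a@v1.0.0", 4)
	c.Add("b@v1.0.0", 4)
	c.Touch("a@v1.0.0")               // "a" was saved first but requested last
	fmt.Println(c.Add("c@v1.0.0", 4)) // prints [b@v1.0.0]
}
```

An mtime-based script would have removed "a" here, since it was saved first; tracking requests is what keeps a frequently used package in the cache.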

matt0x6F commented 6 months ago

Ah, these are good points. Theoretically we could probably attach some "access" metadata to the indexer. Then I could see having a subprocess that runs continually and removes entries from the index as well as the filesystem based on some threshold criteria.

ionrover2 commented 6 months ago

What you’re suggesting, I think, could be accomplished with an nginx proxy. I’m not the authority, but I think the intention behind Athens is to ensure the longevity of modules rather than to act as an LRU cache. Purging unused modules seems counter to the intention of the product. To me at least; I could be very wrong.

Why do you need Athens for this over something like nginx?

delthas commented 6 months ago

For public packages, setting up an nginx HTTP reverse proxy in front of https://proxy.golang.org would probably work after some tweaks. Additional configuration would probably be needed to cache only source files (so basically, .zip) but not metadata, and to store responses on disk with a maximum size. At that point nginx would be a working alternative, although configuring it would require some thought.

However, for my use case, I would also like to cache packages from a large internal GitLab (unreachable over the Internet). So I can't just proxy HTTP to proxy.golang.org and have proxy.golang.org fetch the packages for me over Git. I really need a tool that fetches the packages over Git itself. Hence, Athens.

ionrover2 commented 6 months ago

I also work in an airgapped lab environment and have this same use case!

I have an instance of Athens running on the public internet that I use as a GOPROXY in order to get the packages I need.

Using a deployment built from the same Dockerfile in my airgapped environment, I regularly tar up the Athens storage directory onto a flash drive and unpack it to the same location on the airgapped side.

This has been great for my team, as previously they were vendoring dependencies into dummy projects and then having to hand jam them into the project they actually wanted. If you use the GOPROXY variable with the direct clause at the end, you can access your internal Go projects through the proxy in your airgapped environment or directly from their Git repos.