andrewdavidwong opened 8 years ago
It's indeed a common problem when deploying Fedora VMs/containers, or with server farms. Debian has apt-cacher(-ng) but Fedora doesn't have anything similar.
Solutions that came up:
Anyway, instead of having specific tools for each distro it would be wiser to have a generic solution. So, all in all, the squid solution may be the best one, with the cache miss rate being something to investigate.
Actually apt-cacher-ng works for Fedora too :) Maybe we can simply use it instead of tinyproxy as the updates proxy?
apt-cacher-ng works on Fedora for mirroring Debian stuff, but does it really work for mirroring (d)rpms/metadata downloaded with yum/dnf?
From the doc [1]: "6.3 Fedora Core - Attempts to add apt-cacher-ng support ended up in pain and the author lost any motivation in further research on this subject. "
[1] https://www.unix-ag.uni-kl.de/~bloch/acng/html/distinstructions.html#hints-fccore
Yes, I've seen this. But in practice it works. The only problem is dynamic mirror selection - it may make caching difficult (when a different mirror is selected each time).
Marek Marczykowski-Górecki:
Actually apt-cacher-ng works for Fedora too :) Maybe we can simply use it instead of tinyproxy as the updates proxy?
Can it also let through non-apt traffic? Specifically I am wondering about tb-updater.
Can it also let through non-apt traffic? Specifically I am wondering about tb-updater.
That's an interesting question - if you have an apt-cacher-ng instance handy, it's worth a try. Anyway, it has quite a flexible configuration, so it's probably doable.
I don't think there is a generic solution that works well enough for both Debian- and Fedora-based templates at the same time. Why do we need a generic all-at-once solution anyhow? Here is what I suggest:
What do you think?
@marmarek
Can it also let through non-apt traffic? Specifically I am wondering about tb-updater.
That's an interesting question - if you have an apt-cacher-ng instance handy, it's worth a try. Anyway, it has quite a flexible configuration, so it's probably doable.
I've read all the config options and tried; it does not seem possible, but never mind, as per my suggestion above.
It will require more resources (memory), somewhat wasted when one uses, for example, only Debian templates. But maybe it is possible to activate those services on demand (socket activation comes to mind). It would be even easier for a qrexec-based updates proxy.
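For illustration, a rough sketch of what socket activation could look like - assuming a hypothetical qubes-updates-cache.service whose daemon supports being handed a listening socket (sd_listen_fds); if the proxy daemon doesn't support that, systemd-socket-proxyd would be needed in between. The port matches the existing updates proxy:

# qubes-updates-cache.socket (hypothetical unit)
[Unit]
Description=On-demand activation socket for the updates cache

[Socket]
ListenStream=8082

[Install]
WantedBy=sockets.target

systemd would then only start the matching .service on the first client connection, so the memory cost is paid while updates are actually running.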
@adrelanos
Why do we need a generic all-at-once solution anyhow
I'm all for a 100% caching success rate with a specific mechanism for each distro, but do Qubes developers/contributors have time to develop and support that feature? If yes, that's cool; otherwise, a solution like squid would be easy to implement, and since it's distro-agnostic it would help not only the supported distros (Fedora, Debian, Arch?), but also other distributions that users install in HVMs (even Windows, then). The problems/unknowns with squid are the cache miss rate, the cache disk usage needed to minimize those misses, and the use of different mirrors with yum (although I've found that I usually connect to the same one).
I'm using a polipo proxy => Tor to cache updates. I also modified the repo configuration to use one specific update server instead of dynamically selecting one. I'm planning to document my setup and will post a link here.
Just wanted to throw in https://github.com/yevmel/squid-rpm-cache. I planned to set up a dedicated squid VM and use the above-mentioned config/plugin to cache rpms, but never found the time for it.
The problems/unknowns with squid are the cache miss rate, the cache disk usage needed to minimize those misses, and the use of different mirrors with yum (although I've found that I usually connect to the same one).
Currently I just use my NAS, which has a "normal" squid running as a caching proxy. I have an ansible script which generates my templates. In the templates I replaced the metalink parameter with a baseurl pointing to the nearest Fedora mirror, in /etc/yum.repos.d/fedora.repo. In /etc/yum.conf I replaced the proxy option with my NAS proxy and allowed TemplateVMs to connect to it.
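For anyone wanting to replicate this, a sketch of the two changes just described - the mirror URL and the proxy address/port are placeholders for whatever is nearest to you and whatever your NAS uses:

# /etc/yum.repos.d/fedora.repo (excerpt)
[fedora]
name=Fedora $releasever - $basearch
#metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
baseurl=http://mirror.example.org/fedora/linux/releases/$releasever/Everything/$basearch/os/
enabled=1
gpgcheck=1

# /etc/yum.conf (or /etc/dnf/dnf.conf)
[main]
proxy=http://192.0.2.10:3128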
My experience with squid is horrible in terms of resources (RAM, I/O usage) for small setups. It looks like overkill for just downloading updates for a few templates from time to time.
I don't like saying this, but we should also consider making this an additional, non-default option, or a wontfix. I like apt-cacher-ng very much and use it myself. However, introducing it by default into Qubes would lead to new issues: more users having trouble upgrading due to the added technical complexity. There are corner cases where apt-cacher-ng introduces new issues, such as showing Hash Sum mismatch errors during apt-get update.
@marmarek
FWIW I have squid installed on an embedded router (RB450g) for a 25+ person office and it's been running for literally ages without any problem. There's strict bandwidth control (delay pools), which is usually the biggest offender in terms of resources, but squid's memory usage has constantly been < 20 MB and the highest CPU usage < 6%. Granted, the office's uplink speed is low - in the megabits/s range - but the resources available for the UpdateVM are in another league compared to the embedded stuff, and the setup - caching only - is not fancy.
tl;dr, squid is not as bad as it used to be years ago.
@adrelanos
The issues you mention reinforce my concern that it will be too time-consuming for Qubes devs to support distro-specific solutions. A simple generic one, even if not optimal, is still better than nothing at all - better than a "wontfix". Plus, users kalkin and qjoo seem to have working solutions, so why not try those?
Just my 2c - not pushing for anything, you guys are doing great work!
At the very least, we should provide some documentation (or suggestions or pointers in the documentation) regarding something like @taradiddles's router solution. Qubes users are more likely than the average Linux user to have multiple machines (in this case, virtual) downloading exactly the same updates.
Looks like what you want is Squid with an adaptive disk cache size (for storing packages in the volatile /var/cache/squid directory), configured with no memory cache. Since the config file can be in a different place and the unit file can be overridden to specify a Qubes-specific config file, it may work very well for this purpose. Squid is goddamn good these days, and it supports regex-based filters (plus you can block methods other than GET, and it can proxy-cache FTP sites).
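A minimal sketch of that kind of configuration - disk cache only, no memory cache - with the cache size and storage type as placeholder values rather than recommendations:

# sketch: disk-only cache in the volatile directory, no memory cache
cache_mem 0
maximum_object_size 512 MB
cache_dir aufs /var/cache/squid 4096 16 256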
OTOH, it's always a security footprint issue to run a larger codebase for a cache. Also, Squid caching can be ineffective if multiple VMs download files from different mirrors (remember that the decision of which mirror to use is left practically at random to the VM calling on the Squid proxy to do its job).
For those reasons, it may be wise to investigate solutions that do a better job of proxy caching using a content-addressable store, or matching file names.
Perhaps a custom Go-based cache (to prevent security vulns) that listens for requests using the net/http package and proxies them on behalf of the VMs? This has the potential to be a very efficient solution too, as a Go program would have a minuscule memory footprint.
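A rough sketch of what such a cache might look like - not an actual implementation; the listen address, cache directory, whole-file buffering, and ignoring of cacheability headers are all simplifying assumptions, and HTTPS/CONNECT handling is omitted entirely:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"sync"
)

var (
	cacheDir = "/var/cache/update-cache" // placeholder location
	mu       sync.Mutex                  // serializes access to the on-disk cache
)

// cachePath maps a request URL to a file name inside cacheDir.
func cachePath(url string) string {
	sum := sha256.Sum256([]byte(url))
	return filepath.Join(cacheDir, hex.EncodeToString(sum[:]))
}

// handle serves plain-HTTP proxy GETs: check the disk cache first, otherwise
// fetch upstream, store the body, and relay it to the client.
func handle(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet || !r.URL.IsAbs() {
		http.Error(w, "only plain-HTTP proxy GETs are handled", http.StatusMethodNotAllowed)
		return
	}
	p := cachePath(r.URL.String())

	mu.Lock()
	cached, err := os.ReadFile(p)
	mu.Unlock()
	if err == nil {
		w.Write(cached) // cache hit
		return
	}

	resp, err := http.Get(r.URL.String()) // cache miss: fetch upstream
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body) // buffers the whole file in memory: fine for a sketch only
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	if resp.StatusCode == http.StatusOK {
		mu.Lock()
		os.WriteFile(p, body, 0o644)
		mu.Unlock()
	}
	w.WriteHeader(resp.StatusCode)
	w.Write(body)
}

func main() {
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.ListenAndServe("127.0.0.1:8082", http.HandlerFunc(handle)))
}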
@Rudd-O Have a look at this https://github.com/mojaves/yumreproxyd
Looking. Note we need something like that for Debian as well.
The code is not idiomatic Go and there are some warts there that I would fix before including it anywhere. Just as a small example on https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L33 you can see he is using a nil value as a sort of a bool. That is not correct -- the return type should be (bool, struct).
https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L73 <- also problematic. "TODO: path sanitization" is not what you want in secure software.
But the BIGGEST problem is that the program appears not to give a shit about concurrency. Save-into-cache and serve-from-cache can race, and no locking is performed, nor are channels being used there. Big fat red flag. The right way to do that is by communicating with the Cache aspect of the application through channels - send a request to the Cache and await the response; if the file is not available, download it, send it to the Cache for storage, and await the response.
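A rough sketch of that channel-owned cache pattern (my own illustration, not the yumreproxyd code) - a single goroutine owns the map, so lookups and stores cannot race; the key and payload here are placeholders:

package main

// lookup asks the cache goroutine for a key; a nil reply means "not cached".
type lookup struct {
	key   string
	reply chan []byte
}

// store hands a downloaded payload to the cache goroutine for safekeeping.
type store struct {
	key   string
	data  []byte
	reply chan struct{}
}

// runCache is the only goroutine that ever touches the map, so reads and
// writes cannot race and no mutex is needed.
func runCache(lookups chan lookup, stores chan store) {
	cache := map[string][]byte{}
	for {
		select {
		case l := <-lookups:
			l.reply <- cache[l.key]
		case s := <-stores:
			cache[s.key] = s.data
			s.reply <- struct{}{}
		}
	}
}

func main() {
	lookups := make(chan lookup)
	stores := make(chan store)
	go runCache(lookups, stores)

	// downloader side: ask the cache first; on a miss, fetch and store.
	req := lookup{key: "example.rpm", reply: make(chan []byte)}
	lookups <- req
	if data := <-req.reply; data == nil {
		downloaded := []byte("...bytes fetched from the mirror...")
		st := store{key: "example.rpm", data: downloaded, reply: make(chan struct{})}
		stores <- st
		<-st.reply // stored; now serve `downloaded` to the client
	}
}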
Also, all content types returned are application/rpm. That's wrong in many cases.
BUT, that only means that project can be extended or rewritten, and it should not be very difficult to do so.
I just uploaded the Squid-based https://github.com/rustybird/qubes-updates-cache (posted to qubes-devel too)
The latest commit (-57 lines, woo) reworks qubes-updates-cache to act as a drop-in replacement for qubes-updates-proxy. No changes to the client templates are needed at all now.
How much memory does it use? I.e. is it a good idea to have it instead of tinyproxy by default, or give the user a choice?
FWIW I had a similar setup running after my last post, the difference being that I used/tweaked the store_id program mentioned by @kalkin in an earlier post [1]. But there were many cache misses; a quick look at the log showed that different mirrors would send different MIME types for the same rpm (or repo) file, so that might be the culprit. Other tasks piled up and I didn't have time to work on that.
@marmarek: after boot, memory = ~30 MB (as far as you can trust ps). But I guess the question is more about long-term use, after squid has cached many objects. Rusty used 'cache_mem 0', so there shouldn't be a huge difference in memory usage, but he might have more statistics.
@rustybird: tinyproxy's configuration is quite locked down; maybe it would be a good idea to do the same with squid's? I'm also not sure it is a good idea to mess with the cache IDs for files other than rpm/repo (and deb/...).
For instance, stuff like:
acl localnet src 10.137.0.0/16
acl http_ports port 80
acl SSL_ports port 443
acl CONNECT method CONNECT
http_access deny to_localhost
http_access deny CONNECT !SSL_ports
http_access allow http_ports
http_access allow SSL_ports
http_access deny all
# that one was from https://github.com/yevmel/squid-rpm-cache
# have to understand why that's changed
#refresh_pattern . 0 20% 4320
# 3 month 12 month
refresh_pattern . 129600 33% 525600
# cache only specific files types
acl rpm_files urlpath_regex \/Packages\/.*\.rpm
acl repodata_files urlpath_regex \/repodata\/.*\.(|sqlite\.xz|xml(\.[xg]z)?)
cache allow rpm_files
cache allow repodata_files
cache deny all
@marmarek:
How much memory does it use?
With DefaultMemoryAccounting=yes in /etc/systemd/system.conf, the following values were observed in /sys/fs/cgroup/memory/system.slice/qubes-updates-cache.service/memory.memsw.max_usage_in_bytes:
That's already the latest commit, which sets memory_pools off in the Squid config to allow the system to reclaim unused memory. But apparently Squid doesn't free() aggressively enough yet for our purposes.
@taradiddles:
But there were many cache misses; a quick look at the log showed that different mirrors would send different MIME types for the same rpm (or repo) file, so that might be the culprit.
Yes, that seems to happen sometimes, probably because .drpm is a relatively young file extension. Is it possible to make Squid ignore the MIME type header?
tinyproxy's configuration is quite locked down; maybe it would be a good idea to do the same with squid's?
Definitely. IIRC Whonix also wants some sort of magic string from the proxy port? Paging @adrelanos :)
I'm also not sure it is a good idea to mess with the cache IDs for files other than rpm/repo (and deb/...).
So far I haven't seen the regexes in https://github.com/rustybird/qubes-updates-cache/blob/master/usr/lib/qubes/updates-cache-dedup#L6-L7 match anything else besides metadata and packages. Files aren't listed explicitly because that's such a hassle to maintain for all compression formats and package types, e.g. Debian source packages didn't work with qubes-updates-proxy when tinyproxy still used filters.
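For anyone unfamiliar with the mechanism such dedup helpers typically plug into: Squid's store_id_program interface feeds a helper one request line per URL and expects an "OK store-id=..." (or "ERR") reply, so that the same package fetched from different mirrors maps to a single cache object. A toy illustration of that interface - not the actual updates-cache-dedup script; the matching pattern and the dedup namespace are made up:

package main

import (
	"bufio"
	"fmt"
	"os"
	"path"
	"regexp"
	"strings"
)

// crude illustrative pattern: anything that looks like a package file
var pkg = regexp.MustCompile(`\.(rpm|drpm|deb)$`)

func main() {
	in := bufio.NewScanner(os.Stdin)
	for in.Scan() {
		fields := strings.Fields(in.Text())
		if len(fields) == 0 {
			continue
		}
		// first token is the requested URL (no helper concurrency assumed)
		name := path.Base(fields[0])
		if pkg.MatchString(name) {
			// same filename from any mirror collapses to one cache object
			fmt.Printf("OK store-id=http://package.dedup.invalid/%s\n", name)
		} else {
			fmt.Println("ERR") // ERR = keep the original URL as the cache key
		}
	}
}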
IIRC Whonix also wants some sort of magic string from the proxy port? Paging @adrelanos :)
Sorry, never mind, I literally found something with grep -r magic /etc/tinyproxy. Will check that out.
Ivan:
tinyproxy's configuration is quite locked down; maybe it would be a good idea to do the same with squid's?
Tinyproxy configuration was relaxed some time ago. There was a ticket and discussion. In short: locking down tinyproxy does not improve actual security. Users who explicitly configure their applications to use the updates proxy should be free to do so.
@adrelanos:
Tinyproxy configuration was relaxed some time ago. There was a ticket and discussion. In short: locking down tinyproxy does not improve actual security. Users who explicitly configure their applications to use the updates proxy should be free to do so.
There's the "Squid Manager" though, which I've restricted access to in commit https://github.com/rustybird/qubes-updates-cache/commit/0da1dcd222263af73f6b52ac6d2b5d07444474a8 -- along with a basic sanity check that requests are coming from 10.137.*.
Also a paragraph on how to use qubes-updates-cache with Whonix at the moment: https://github.com/rustybird/qubes-updates-cache/blob/3b9d5e153f89b551e9b38f82928cbc7c9c2f7ba3/README#L32-L35 (works nicely BTW, tons of cache hits across Debian / Whonix GW / Whonix WS)
I have just now finished documenting Qubes-Whonix torified updates proxy: https://www.whonix.org/wiki/Dev/Qubes#Torified_Updates_Proxy
In essence, Whonix TemplateVMs take the output of UWT_DEV_PASSTHROUGH="1" curl --silent --connect-timeout 10 "http://10.137.255.254:8082/" and grep it for <meta name="application-name" content="tor proxy"/>. If that matches, the test is considered successful.
Of course qubes-updates-cache's squid should only include the magic string if it is actually torified, i.e. running inside sys-whonix.
Do you know if it is possible to conditionally inject this magic string? Or if not, we need to modify Qubes-Whonix torified updates check to do something supported by squid.
I am wondering if any whonix-gw-firewall modifications will be required. Current tinyproxy rules: https://github.com/Whonix/whonix-gw-firewall/blob/724a0fc0546c83555a008cd1b7b03c048519121a/usr/bin/whonix_firewall#L310-L328
Does squid support outgoing proxy settings? Can squid be configured to use a Tor SocksPort?
Do you know if it is possible to conditionally inject this magic string? Or if not, we need to modify Qubes-Whonix torified updates check to do something supported by squid.
AFAIR in case of tinyproxy it is placed in default error page. Squid should allow the same.
Does squid support outgoing proxy settings? Can squid be configured to use a Tor SocksPort?
Haven't found anything about outgoing HTTP proxies. Semi-official SOCKS support can be added at compile time via libsocks, which Debian doesn't seem to do, but ...
I am wondering if any whonix-gw-firewall modifications will be required. Current tinyproxy rules: https://github.com/Whonix/whonix-gw-firewall/blob/724a0fc0546c83555a008cd1b7b03c048519121a/usr/bin/whonix_firewall#L310-L328
... I think you'd only need to change --uid-owner tinyproxy to --uid-owner squid.
AFAIR in case of tinyproxy it is placed in default error page. Squid should allow the same.
Yes, the relevant file to modify is /usr/share/squid-langpack/templates/ERR_INVALID_URL from Debian package squid-langpack.
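For example (a sketch; the exact placement inside the template shouldn't matter much), the line to add there would be the magic string quoted earlier - and, per the point above, only in an instance that is actually torified:

<meta name="application-name" content="tor proxy"/>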
All the security implications of using qubes-updates-cache I could think of: https://github.com/rustybird/qubes-updates-cache/blob/master/README#L8
Edit: Hmm, regarding (1) it's really the same with qubes-updates-proxy. Not sure why I always (wrongly) thought of that as circuit-isolated per client VM...
Some news:
@rustybird
Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-gw-firewall#1 and Whonix/qubes-whonix#2
This is done btw.
I use polipo as a caching proxy between template VMs and Tor SOCKS port. It has SOCKS support and might be more lightweight than squid?
@adrelanos:
Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-gw-firewall#1 and Whonix/qubes-whonix#2
This is done btw.
Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?
@qjoo:
I use polipo as a caching proxy between template VMs and Tor SOCKS port. It has SOCKS support and might be more lightweight than squid?
It doesn't seem to support deduplication or (transparent) rewriting of URLs :(
Rusty Bird:
@adrelanos:
Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-gw-firewall#1 and Whonix/qubes-whonix#2
This is done btw.
Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?
Yes. The usual ETA to reach Whonix stable users is the next release, Whonix 14.
Where is the code for qubes-updates-cache.service?
Shouldn't writing to /etc (i.e. /etc/systemd/system/multi-user.target.wants/qubes-updates-cache.service) be avoided, and the standard distribution systemd folder /lib/systemd/system be used instead?
The standard way is to create such a symlink in the post-installation script (preferably using presets). But since the service is controlled by qvm-service, it may indeed be a good idea to provide the symlink in the package. In that case it should live in /lib/systemd/system and be a relative one.
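A sketch of both variants - the exact preset file path is an assumption, borrowed from the Qubes VM preset mentioned below:

# relative enablement symlink shipped in the package:
ln -s ../qubes-updates-cache.service \
    /lib/systemd/system/multi-user.target.wants/qubes-updates-cache.service

# or, with presets, an entry in e.g. /lib/systemd/system-preset/75-qubes-vm.preset:
enable qubes-updates-cache.service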
Short update:
@marmarek:
But since the service is controlled by qvm-service, it may be indeed good idea to provide the symlink in the package. In such a case it should live in /lib/systemd/system and be relative one.
It's currently created in /etc, as if qubes-updates-cache.service was listed in https://github.com/QubesOS/qubes-core-agent-linux/blob/master/vm-systemd/75-qubes-vm.preset just like qubes-updates-proxy.service.
But I'll have to move at least the actual qubes-updates-cache.service to $(pkg-config --variable=systemdsystemunitdir systemd) anyway, since installing it to /usr/lib/systemd/system is wrong for Debian. Then the symlink could be moved there, too.
I would really like to urge folks to develop a custom cache solution using the very mature Go libraries that exist for HTTP and proxying. It will be memory-safe (no pointer bullshit), it will be far smaller than trying to shoehorn Squid into this role, and it will be trivial to provide a proper solution that caches requested file names based on content.
@rustybird:
Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?
@adrelanos:
I'll release a qubes-whonix package with your qubes-updates-cache changes soon. (Currently in developers repository, contains some other fixes.)
It's been in the Whonix jessie (stable) repository for a few days now. (And if you reinstall Qubes-Whonix 13 from the qubes-templates-community repository, it is also included.)
The latest qubes-updates-cache has many new rewriting rules that transparently upgrade repository URLs to HTTPS, and optionally to .onion (#2576).
Current coverage:
Repository | HTTPS | HTTP .onion |
---|---|---|
yum.Qubes | upgrade | upgrade to v3 |
deb.Qubes | upgrade | upgrade to v3 |
Whonix | upgrade | upgrade to v3 |
Debian | upgrade | upgrade to v2 |
Debian Security | upgrade | upgrade to v2 |
Fedora | upgrade | - |
RPM Fusion | upgrade | - |
Tor Project | upgrade | upgrade to v2 |
upgrade | - | |
Fedora-Cisco | uncached | - |
Adobe | - | - |
@adrelanos: I'm confused. Was qubes-updates-cache merged into Whonix 14?
@tlaurion:
Was qubes-updates-cache merged into Whonix 14?
No, installation is still manual. The Whonix PRs just made it so that if you install qubes-updates-cache in a whonix-gw VM, you don't have to fiddle with Whonix's firewall and the torification safety check.
@rustybird, are you still working on this, or are you waiting for your PoC to be reviewed?
@andrewdavidwong:
are you still working on this, or are you waiting for your PoC to be reviewed?
It's still missing the Makefiles etc. for qubes-builder integration.
Sad to say, it looks like transparent caches are becoming less viable as more and more package repository definitions on the client VM templates switch to HTTPS. Previously, they made interceptable (= cacheable) HTTP requests and qubes-updates-cache could transparently upgrade to HTTPS on cache miss. Now it's opaque and uncacheable.
I don't know what to do about that. Reverting all repository definitions to HTTP is a non-starter, because that would harm users of the non-caching qubes-updates-proxy (i.e. tinyproxy, which AFAIK can't rewrite URLs to upgrade them to HTTPS) as well as users installing software from an AppVM/StandaloneVM without any updates proxy. So there would have to be a mechanism to modify repository definitions just for qubes-updates-cache users, or alternatively some cooperative TLS MITM contraption between templates and the cache. Both sound horrible...
I have followed the first (horrible) option, using apt-cacher-ng. I rewrite the definitions in templates to the http://HTTPS///repo_name format, and apt-cacher-ng then transforms this into https requests. Horrible indeed, but it works. Simply implemented with one salt call.
Of course, I need a script to rewrite the definitions in template-based qubes.
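Roughly what such a rewritten definition looks like - the suite and the proxy address are placeholders; the http://HTTPS/// form is apt-cacher-ng's own syntax for reaching TLS upstreams:

# /etc/apt/sources.list.d/debian.list in the template (sketch)
# before:  deb https://deb.debian.org/debian stretch main
deb http://HTTPS///deb.debian.org/debian stretch main

# apt keeps pointing at the caching proxy, e.g. in /etc/apt/apt.conf.d/01proxy:
Acquire::http::Proxy "http://10.137.255.254:8082/";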
Community Dev: @rustybird PoC: https://github.com/rustybird/qubes-updates-cache
It's common for users to have multiple TemplateVMs that download many of the same packages when being individually updated. Caching these packages (e.g., in the UpdateVM) would allow us to download a package only once, then make it available to all the TemplateVMs which need it (and perhaps even to dom0), thereby saving bandwidth.
This has come up on the mailing lists several times over the years:
Here's a blog post about setting up a squid caching proxy for DNF updates on baremetal Fedora: