That is something we have considered, but so far have not been willing to go there. The idea of not providing a default for CVMFS_HTTP_PROXY is to force some sort of deliberate choice for proxy, because we don't want large installations to ignore that step and go directly to the CDN. If you look at the same section in the default branch you'll see that Jakob was not even willing to go as far as what you see there in master (egi and osg are like master), in that he doesn't enable CVMFS_USE_CDN if CVMFS_HTTP_PROXY is DIRECT or auto;DIRECT. What they do all have in common is to set a default proxy of auto;DIRECT if the user sets CVMFS_CLIENT_PROFILE=single.
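In other words, the behavior they all share amounts to something like this sketch (illustrative, not the literal upstream code):
if [ "$CVMFS_CLIENT_PROFILE" = "single" ] && [ -z "$CVMFS_HTTP_PROXY" ]; then
    # Single-machine profile with no deliberate proxy choice: fall back to
    # proxy auto-discovery, then direct access.
    CVMFS_HTTP_PROXY="auto;DIRECT"
fi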
I see. Though omitting a value for CVMFS_HTTP_PROXY is already allowed if "$CVMFS_CLIENT_PROFILE" = "single".
Anyway in that case perhaps I would do something similar in the CC config file, affecting only CC repos:
if [ -z "$CVMFS_HTTP_PROXY" ]; then
    # No proxy configured: default to direct (unproxied) access.
    CVMFS_HTTP_PROXY=DIRECT
fi
# CDN explicitly requested, or no real proxy in use: point clients at the
# CDN alias; otherwise use the stratum 1 servers directly.
if [ "$CVMFS_USE_CDN" = "yes" ] || [ "$CVMFS_HTTP_PROXY" = "DIRECT" ] || [ "$CVMFS_HTTP_PROXY" = "auto;DIRECT" ]; then
    CVMFS_SERVER_URL="http://cvmfs-s1.computecanada.net/cvmfs/@fqrn@"
else
    CVMFS_SERVER_URL="http://cvmfs-s1-arbutus.computecanada.ca:8000/cvmfs/@fqrn@;http://cvmfs-s1-beluga.computecanada.ca:8000/cvmfs/@fqrn@;http://cvmfs-s1-east.computecanada.ca:8000/cvmfs/@fqrn@"
fi
We generally don't want direct (unproxied) hits from clients to our stratum servers. Our SLA/ToS, as it were, is that anyone can have access via CDN with no strings attached, but if you want the potential performance improvement of proxied access to s1 servers (which we especially recommend for clusters) we require a proxy to be used.
I think that's a reasonable compromise if Jakob agrees. If we port it to the EGI & OSG config repos as well we shouldn't need the extra checks for enabling the CDN because those are already the default there (if we put the setting of CVMFS_HTTP_PROXY before invoking common.conf).
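To sketch the ordering idea (the include path here is illustrative; the actual mechanism in the config repos may differ):
# In the domain config, settle the proxy default first...
if [ -z "$CVMFS_HTTP_PROXY" ]; then
    CVMFS_HTTP_PROXY=DIRECT
fi
# ...then invoke common.conf, so its CDN logic sees the final proxy value.
. /etc/cvmfs/common.conf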
I am a little surprised that you've got only one CDN alias. How do you configure Cloudflare to choose from the stratum 1s? You do probably have a different situation than we do because you control all the stratum 1s and can make sure they hold the same repositories. However, I suspect you may be subject to the problem of different rates of updates: if a catalog is read from one stratum 1 that had an early update, it might cause a client to read from another stratum 1 that doesn't yet have a corresponding new file. This is why we never put two independent stratum 1s in a round-robin.
The config change for the CC domain looks good to me.
We use a load balancer with > 3 servers behind it, and dynamic steering is used to determine the closest origin server. Since the dynamic steering is RTT-based, clients in a given location would generally get redirected to the same server pool, so they should get a fairly consistent view of the repositories, unlike in a round-robin LB. (The steering is updated over time with an EWMA, so it can gradually shift if network conditions change, but I don't think that would be an issue on the same time scale as catalog updates.) Nevertheless we monitor the servers to make sure they are all updating in a timely manner.
Also, if an origin server is unavailable, the zero-downtime feature dynamically redirects a client to another origin within the same HTTP request to avoid serving an error, until the next health check runs and updates the load balancer with the available origins.
Yes, using such a load balancer makes you subject to the problem I described, and I recommend against it. Load balancers assume all servers are identical and don't have state that might vary between them. A single client is probably not going to have a problem because the dynamic steering will always direct it to the same server, but if a second client comes along on the other side of the country and reads a cached catalog from server A, it's going to expect that all files in that catalog (including ones that aren't yet cached) are available, even though it is reading from server B, which might not yet have the updates associated with the cached catalog.
Let the cvmfs client geo sorting take care of it for you, because then the client knows which server it is talking to and always makes sure to use catalogs and data from the same server. The geo sorting also works with Cloudflare if you create additional aliases with the same name prefixed with "ip." that are not proxied by Cloudflare, so the geo sorting can find the origin server's location.
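For example, with the CDN alias from the snippet above and a hypothetical unproxied "ip." alias, the difference is visible with dig:
# The Cloudflare-proxied alias resolves to edge addresses, while the
# unproxied "ip." alias exposes the real origin, so the Geo-API can
# locate the server.
dig +short cvmfs-s1.computecanada.net       # Cloudflare anycast IPs
dig +short ip.cvmfs-s1.computecanada.net    # actual origin server IP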
> A single client is probably not going to have a problem because the dynamic steering will always direct it to the same server, but if a second client comes along on the other side of the country and reads a cached catalog from server A, it's going to expect that all files in that catalog (including ones that aren't yet cached) are available, even though it is reading from server B, which might not yet have the updates associated with the cached catalog.
Yes, but a second client on the other side of the country would be directed to the nearest Cloudflare data center (via anycast DNS), and then redirected to an origin server based on the RTT between the region of that Cloudflare data center and the origin server pool. I think the problem you describe would only happen if there were a globally shared Cloudflare cache, but each Cloudflare region has an independent cache, so if a client sees a new catalog from server A, it will also see the same data content from server A, not B (given that servers A and B are in different regions in this example). As far as I know Cloudflare does not pre-load or replicate cached data between regions (though I think all data centers in the same Cloudflare region might share the same cache).
CVMFS and Cloudflare both have a good approach to distributing content but they are indeed rather different and we have to be careful when using them together. If I am mistaken about the details I will take a closer look. There may be improvements we could make, and I will think about it more.
If every stratum 1 is in a different Cloudflare region and the caches aren't shared, then you might be right. According to MaxMind and db-ip.com, arbutus and beluga are both in Montreal, although sometimes IP addresses aren't really located where they're registered.
In any case I suggest installing frontier-awstats on the stratum 1s so the WLCG squid operations team can monitor them for failovers. That would also quickly show from the awstats plots whether or not the same Cloudflare IP addresses are being used to contact more than one. Or you could do a manual analysis of the logs, looking for IP addresses in the Cloudflare IP ranges.
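A rough sketch of such a manual check (the log path and format are assumptions; adjust for your web server):
# Cloudflare publishes its address ranges (also /ips-v6 for IPv6).
curl -s https://www.cloudflare.com/ips-v4 -o /tmp/cloudflare-ips.txt
# Pull client IPs out of the stratum 1 access log and match them against
# the Cloudflare CIDR ranges (requires the grepcidr tool).
awk '{print $1}' /var/log/httpd/access_log | sort -u | grepcidr -f /tmp/cloudflare-ips.txt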
addressed in https://github.com/cvmfs-contrib/config-repo/pull/95
Hello,
Regarding https://github.com/cvmfs-contrib/config-repo/blob/master/etc/cvmfs/common.conf#L15: as I understand it, the intention is to ensure a CDN is used if the use of a CDN has not been previously defined and no specific proxies are defined.
Before we discussed the CVMFS_USE_CDN variable there was not really a standard approach for this, so we (CC) came up with our own approach to make things simple and convenient for our end users (some of whom may not know much about proxies). We allowed CVMFS_HTTP_PROXY to remain unset in the client's configuration, and if it was, used that as a flag to select use of the CDN (and DIRECT). That way it "just works" out of the box (using the CDN) without having to worry about choosing or understanding proxies, while also allowing more advanced users who have proxies available to define and use them (in which case they also get the "real" s1 servers).
I wonder if we could expand this, something like:
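if [ -z "$CVMFS_HTTP_PROXY" ] && [ -z "$CVMFS_USE_CDN" ]; then
    # Neither a proxy nor a CDN preference was set: default to direct
    # (unproxied) access via the CDN, per the description below.
    CVMFS_HTTP_PROXY=DIRECT
    CVMFS_USE_CDN=yes
fi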
I think this would more broadly fulfill the same intention: if both CVMFS_HTTP_PROXY and CVMFS_USE_CDN are unset, then use DIRECT and the CDN. It would only affect users with a minimal basic config; they would not need to know the details of what CDNs and proxies are, and it would "just work" for them without having to understand those settings and apply them. It would allow some of our end users to transition from the CC config repo to the default config repo (another simplification, allowing it to more easily work out of the box), and no other users would be affected (since everyone else would have already been required to set a valid CVMFS_HTTP_PROXY).