EESSI / filesystem-layer

Filesystem layer of the EESSI project
https://eessi.github.io/docs/filesystem_layer
GNU General Public License v2.0

Use Varnish cache instead of Squid proxy #194

Open jpecar opened 3 months ago

jpecar commented 3 months ago

Due to Squid's age there's a strong motivation to use something more modern and faster. Varnish is a reverse proxy, typically used as a "web application accelerator", but it can also be used in the EESSI scenario thanks to how CVMFS works.

Attached is a mashup of some existing CVMFS+Varnish configs I found online (mainly the caching logic), with the addition of EESSI backends and some logic to use them when the URL looks like it wants them. It is showing signs of life and already offers lower loading latencies on an initial cold cache, but can of course be polished further.

eessi.vcl.txt
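To give an idea of the general shape without opening the attachment, here is a minimal sketch of the kind of VCL involved (not the attached eessi.vcl.txt itself; the backend hostname, the /cvmfs/ URL pattern and the TTL values are illustrative assumptions):

```
vcl 4.1;

# Hypothetical backend: replace with your preferred EESSI Stratum 1 server.
backend default {
    .host = "aws-eu-central-s1.eessi.science";  # assumed public EESSI Stratum 1
    .port = "80";
}

sub vcl_recv {
    # CVMFS clients send plain GETs without cookies; strip anything that would
    # make otherwise identical requests uncacheable.
    unset req.http.Cookie;

    # Only CVMFS repository traffic should hit the EESSI backend.
    if (req.url !~ "^/cvmfs/") {
        return (synth(404, "Not a CVMFS path"));
    }
}

sub vcl_backend_response {
    # Content-addressed objects under data/ are immutable, so they can be cached
    # for a long time; repository manifests (.cvmfspublished, .cvmfswhitelist)
    # change on every publish and must expire quickly.
    if (bereq.url ~ "/data/") {
        set beresp.ttl = 7d;
    } elsif (bereq.url ~ "\.cvmfs(published|whitelist)$") {
        set beresp.ttl = 61s;
    }
}
```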

amadio commented 3 months ago

You may also want to look into using XRootD for your cache (Xcache). Documentation can be found here: https://xrootd.slac.stanford.edu/doc/dev56/pss_config.htm

boegel commented 3 months ago

@jpecar It could be interesting to re-run the performance experiments I did with TensorFlow startup time for the tutorial, see https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/performance/, to check how much the (mainly cold) startup times improve with Varnish vs Squid proxy.

jpecar commented 2 months ago

Gathered some numbers:

reference, local EB repo on NFS: first load (cold VFS cache): 31.61; warm VFS cache: 6.16 -0.47 +0.59

private Stratum 1, direct: first load on cold cache: 10.01; subsequent warm-cache loads: 9.27 -0.14 +0.34

private Stratum 1 + Squid: first load on cold cache: 11.14; subsequent warm-cache loads: 9.10 -0.23 +0.26

private Stratum 1 + Varnish: first load on cold cache: 11.11; subsequent warm-cache loads: 9.38 -0.37 +0.50

EESSI from AWS + Squid: first load on cold cache: 24.25; subsequent loads: 9.09 -0.15 +0.24

EESSI from AWS + Varnish: first load on cold cache: 89.17; subsequent loads: 9.39 -0.26 +0.27

Interestingly, the initial fetch from the AWS Stratum 1 through Varnish takes about 3x longer. One needs to decide whether that's acceptable for their local environment or not.

Our production environment now consists of a private Stratum 1 with two Varnish instances (in-memory only), configured with a fallback director to prefer the local Stratum 1.
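A fallback director along those lines can be sketched as follows (hostnames, probe target and timings are placeholders, not our actual production values):

```
vcl 4.1;

import directors;

backend local_s1 {
    .host = "cvmfs-s1.example.org";   # placeholder for the private Stratum 1
    .port = "80";
    # Health probe so the director can notice when the local Stratum 1 is down;
    # the probed repository path is an assumption.
    .probe = {
        .url = "/cvmfs/software.eessi.io/.cvmfspublished";
        .interval = 30s;
        .timeout = 5s;
        .window = 5;
        .threshold = 3;
    }
}

backend eessi_s1 {
    .host = "aws-eu-central-s1.eessi.science";  # assumed public EESSI Stratum 1
    .port = "80";
}

sub vcl_init {
    # Fallback director: always pick the first healthy backend in the order added,
    # so the local Stratum 1 is preferred and the public one is only used when it fails.
    new s1 = directors.fallback();
    s1.add_backend(local_s1);
    s1.add_backend(eessi_s1);
}

sub vcl_recv {
    set req.backend_hint = s1.backend();
}
```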

ocaisa commented 2 months ago

What is probably more interesting to look at is how performance holds up when we are hammering Squid/Varnish (imagine an 8k-process MPI job). From what I read, Varnish should be better at handling lots of simultaneous requests (which means fewer instances required per site).

jpecar commented 2 months ago

Yes. In our environment I basically cannot avoid things like R/Bioconductor in tight loops of tens of thousands of invocations ... and those have brought our NFS filers to their knees in the past.