EIDA / mediatorws

EIDA NG Mediator/Federator web services
GNU General Public License v3.0

Loopback proxy & cache #75

Closed kaestli closed 4 years ago

kaestli commented 5 years ago

As the federator decomposes user requests into (typically more granular) endpoint requests following fixed rules, endpoint requests are likely to repeat, even across different user requests executed at the same (or an overlapping) time. Caching can prevent endpoints from processing identical requests multiple times, and spare the network redundant load.
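To illustrate the overlap, here is a minimal Python sketch. It assumes a hypothetical decomposition rule (one endpoint request per station and day) and made-up names (`decompose`, `CountingCache`, network `HL`, station `ATH`); the federator's actual rules may well differ.

```python
from datetime import date, timedelta


def decompose(endpoint, network, station, start, end):
    """Split [start, end) into one endpoint request per day.

    Hypothetical rule, for illustration only.
    """
    requests = []
    day = start
    while day < end:
        requests.append((endpoint, network, station, day.isoformat()))
        day += timedelta(days=1)
    return requests


class CountingCache:
    """Tiny in-memory cache that counts how often the endpoint is really hit."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


# Two user requests with overlapping time windows.
user_a = decompose("noa", "HL", "ATH", date(2019, 1, 1), date(2019, 1, 4))
user_b = decompose("noa", "HL", "ATH", date(2019, 1, 2), date(2019, 1, 6))

cache = CountingCache(fetch=lambda key: f"data for {key}")
for key in user_a + user_b:
    cache.get(key)

print(len(user_a + user_b), "endpoint requests,", cache.misses, "reached the endpoint")
```

In this toy setup, the two days shared by both windows are fetched from the endpoint only once.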

The idea is:

  1. Rather than sending the federator request to the endpoint, route it to a caching proxy on a local machine. This can well be the machine on which the federator itself is running. Do this by redefining the DNS name of the endpoint as an alias of localhost in the local /etc/hosts.
  2. Have a secondary proxy step on that machine which rewrites the endpoint response headers (removing headers coming from the endpoint that might prevent caching, and setting headers that steer the caching behaviour).
  3. The secondary proxy step addresses the endpoint by IP (this step is not required if the proxy/cache machine is separate from the federator machine, or if the endpoint DNS names are masked in the stationlite db).

An example configuration for steps 2 and 3, taken from federator-testing, looks as follows.

File: /etc/apache2/sites-enabled/noa.conf
Requirements: apache, mod_headers, mod_proxy, mod_proxy_http, mod_cache, mod_cache_disk
(Have such a config file, and a corresponding hosts entry, for each endpoint you want to loopback proxy cache.)

# The port of the header processor (see below)
# each fake endpoint has its own.
Listen 8002

<VirtualHost *:80>
  DocumentRoot "/var/www/html"
  #this configuration is listening to requests addressed to noa
  # and received on this machine due to a hosts entry on the federator machine
  ServerName eida.gein.noa.gr

  # stopping the thundering herd for up to 60 sec.
  # i.e. if an identical request triggers a cache miss, this request is held for up to 60 sec.
  # waiting for the response to appear in the cache, rather than triggering a second, parallel cache miss.
  # this protects endpoints when interest in one specific piece of data rises suddenly
  CacheLock on
  CacheLockPath /tmp/mod_cache-noa-lock
  CacheLockMaxAge 60

  CacheIgnoreCacheControl On
  CacheIgnoreNoLastMod    On
  CacheStoreExpired       On
  CacheStoreNoStore       On

  # for station requests...
  <LocationMatch "^/fdsnws/station/(.*)$">
        CacheEnable disk
        CacheHeader on
        CacheDetailHeader on
        CacheIgnoreNoLastMod On

        # cache them for 30 minutes by default, 60 minutes at most
        # (note that caching applies only to GET requests)
        CacheDefaultExpire 1800
        CacheMaxExpire 3600
        # on cache miss, forward the request to the header processor 
        ProxyPass "http://localhost:8002/fdsnws/station/$1" timeout=30
        ProxyPassReverse "http://localhost:8002/fdsnws/station/$1"
  </LocationMatch>

  # no caching on fdsn event. forward NOA event requests to NOA
  <LocationMatch "^/fdsnws/event/(.*)$">
        ProxyPass "http://194.177.195.210/fdsnws/event/$1"
        ProxyPassReverse "http://194.177.195.210/fdsnws/event/$1"
  </LocationMatch>

  # no caching on fdsn dataselect. forward NOA dataselect requests to NOA
  # (alternatively, dataselect requests could be treated as station requests)
  <LocationMatch "^/fdsnws/dataselect/(.*)$">
        ProxyPass "http://194.177.195.210/fdsnws/dataselect/$1"
        ProxyPassReverse "http://194.177.195.210/fdsnws/dataselect/$1"
  </LocationMatch>

  # no caching on eida (currently: only eida wfcatalog) requests:
  # forward NOA wfcatalog requests to NOA
  # (alternatively, wfcatalog requests could be treated as station requests)
  <LocationMatch "^/eidaws/(.*)$">
        ProxyPass "http://194.177.195.210/eidaws/$1"
        ProxyPassReverse "http://194.177.195.210/eidaws/$1"
  </LocationMatch>
</VirtualHost>

<VirtualHost *:8002>
  # CacheLock on
  # CacheLockPath /tmp/mod_cache-noa-lock-8002
  # CacheLockMaxAge 30

    # Now fix the Cache-Control header..
    Header merge Cache-Control public
    # The max-age is a pain. We have to set one if it's not set, and we have to change it if it's 0
    Header merge Cache-Control "max-age=bidon"
    # Case when we have: Cache-Control max-age=.., ....
    Header edit  Cache-Control "^(.*)max-age=(.*)max-age=bidon, (.*)$" $1max-age=1800$3
    # Case when we have: Cache-Control yyy=bidon, max-age=.."
    Header edit  Cache-Control "^(.*)max-age=(.*), max-age=bidon$" $1max-age=1800
    # Now replace the value if there was no max-age: set it to 30 min
    Header edit  Cache-Control "max-age=bidon" "max-age=1800"
    # Now replace the value if there was a max-age=0: set it to 30 min
    Header edit  Cache-Control "max-age=0" "max-age=1800"

    # Remove Cache-Control parameters potentially coming from the endpoint
    # which might prevent caching
    # (note that operations on the response are executed bottom-up)
    Header edit Cache-Control "no-cache, " ""
    Header edit Cache-Control "no-store, " ""
    Header edit Cache-Control "post-check=0, " ""
    Header edit Cache-Control "pre-check=0, " ""
    Header edit Cache-Control "must-revalidate, " ""
    Header merge Cache-Control "s-maxage=60"

    ProxyPass "/" "http://194.177.195.210/"
    ProxyPassReverse "/" "http://194.177.195.210/"

</VirtualHost>
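
With `CacheHeader on`, mod_cache adds an `X-Cache` response header, which gives a quick way to check whether the loopback cache is doing its job. A minimal Python sketch, assuming the header value has the documented form `HIT from <host>` (or `MISS`, `REVALIDATE`); `cache_status` and `probe` are illustrative helper names, not part of the federator.

```python
from urllib.request import urlopen


def cache_status(headers):
    """Return 'HIT', 'MISS' or 'REVALIDATE' from an X-Cache header, else None.

    `headers` is any mapping with a .get() method, e.g. the headers of an
    HTTP response.
    """
    value = headers.get("X-Cache")
    if not value:
        return None
    # Assumed format: "<STATUS> from <hostname>"
    return value.split()[0].upper()


def probe(url, timeout=30):
    """Issue one GET request and report the cache status of the response."""
    with urlopen(url, timeout=timeout) as response:
        return cache_status(response.headers)
```

Run on the federator machine, calling `probe()` twice with the same station URL on `eida.gein.noa.gr` would be expected to report `MISS` first and `HIT` afterwards if the setup works as intended.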

Jollyfant commented 5 years ago

How do you map the name eida.gein.noa.gr to the particular port 8002? I don't think you can do that through /etc/hosts. Will it have to be done in the routing?

Jollyfant commented 5 years ago

I'd like to try this idea (as a Docker container) that runs a single web server and include it in this repository. Since you have experience with this approach do you think it is possible?

damb commented 5 years ago

> I'd like to try this idea (as a Docker container) that runs a single web server and include it in this repository. Since you have experience with this approach do you think it is possible?

Bear in mind that this approach only works if eida-federator requests data from endpoints via HTTP GET. Currently, requests exclusively use HTTP POST, so code changes are inevitable. In addition, we could apply an intelligent merging strategy in order to issue bulk GET requests.
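
As a rough idea of what such a code change could involve, the Python sketch below turns one line of an FDSN POST body (`NET STA LOC CHA STARTTIME ENDTIME`) into a GET URL. It is only a sketch under assumptions: `post_line_to_get_url` is a hypothetical helper, not eida-federator code, and it ignores the optional `key=value` lines a POST body may carry. Parameters are sorted so that identical requests always yield the same URL, which is what a URL-keyed cache needs.

```python
from urllib.parse import urlencode


def post_line_to_get_url(base_url, line):
    """Convert one 'NET STA LOC CHA START END' POST line into a GET URL."""
    net, sta, loc, cha, start, end = line.split()
    params = [
        ("network", net),
        ("station", sta),
        ("location", loc),
        ("channel", cha),
        ("starttime", start),
        ("endtime", end),
    ]
    # Sort by parameter name: same request, same URL, same cache key.
    return f"{base_url}?{urlencode(sorted(params))}"


# Illustrative values only.
url = post_line_to_get_url(
    "http://eida.gein.noa.gr/fdsnws/station/1/query",
    "HL ATH -- HHZ 2019-01-01T00:00:00 2019-01-02T00:00:00",
)
print(url)
```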

Jollyfant commented 5 years ago

> Bear in mind that this approach only works if eida-federator requests data from endpoints via HTTP GET. Currently, requests exclusively use HTTP POST, so code changes are inevitable. In addition, we could apply an intelligent merging strategy in order to issue bulk GET requests.

No worries, I won't be able to implement this today anyway ;). Our goal was end June!

kaestli commented 5 years ago

> How do you map the name eida.gein.noa.gr to the particular port 8002? I don't think you can do that through /etc/hosts. Will it have to be done in the routing?

No. All traffic from the federator to eida.gein.noa.gr is host-routed to the loopback cache proxy. The cache step forwards cache misses to the proxy and header processor step, which listens on a different port. That step in turn forwards to the "true" eida.gein.noa.gr.

Jollyfant commented 5 years ago

I started on #78. But I'm gonna need some pointers on how to implement this.

damb commented 4 years ago

Closed with #85.