EIDA / mediatorws

EIDA NG Mediator/Federator web services
GNU General Public License v3.0

Cache /fdsnws/station requests #92

Closed damb closed 4 years ago

damb commented 4 years ago

Features and Changes:


Disadvantages of this approach:

curl -v "http://localhost:8080/fdsnws/station/1/query?net=CH&sta=*&level=station&start=2019-01-01&format=xml"

or even

curl -v "http://localhost:8080/fdsnws/station/1/query?net=CH,GR&sta=*&level=station&start=2019-01-01&format=xml"

and

curl -v "http://localhost:8080/fdsnws/station/1/query?net=GR,CH&sta=*&level=station&start=2019-01-01&format=xml"

can force cache misses. Also, aliases are not taken into account (e.g. queries using net and network are treated as different requests).

As a consequence of the disadvantages listed above, the application should IMO handle the cache internally. Note that the current Docker production setup already ships with a Redis server, which could be used for this purpose.
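To illustrate why an application-side cache could avoid these misses, here is a minimal sketch of a canonical cache key. The function name `canonical_cache_key` and the alias table are assumptions made for illustration (the table covers only a few of the FDSN short/long parameter names), and treating the comma-separated lists as unordered sets is likewise an assumption; this is not the federator's actual code.

```python
from urllib.parse import parse_qsl, urlencode

# Assumed (partial) alias table: short FDSN parameter names -> long names.
ALIASES = {
    "net": "network",
    "sta": "station",
    "loc": "location",
    "cha": "channel",
    "start": "starttime",
    "end": "endtime",
}

# Parameters whose comma-separated values are treated as an unordered set.
LIST_PARAMS = {"network", "station", "location", "channel"}


def canonical_cache_key(path, query_string):
    """Build a key that ignores parameter order, aliases and list order."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        name = name.lower()
        name = ALIASES.get(name, name)
        if name in LIST_PARAMS:
            # Sort and de-duplicate list items, so CH,GR equals GR,CH.
            value = ",".join(sorted(set(value.split(","))))
        params[name] = value
    # Sort the parameters themselves so their order no longer matters.
    return path + "?" + urlencode(sorted(params.items()))
```

With such a key, `net=CH,GR&sta=*` and `station=*&network=GR,CH` map to the same cache entry, whereas a URL-keyed frontend cache stores them separately.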

damb commented 4 years ago

CC @kaestli

damb commented 4 years ago

@kaestli, I deployed the feature at mediator-devel.ethz.ch. If you'd like, you can give it a try.

damb commented 4 years ago

References: #50

kaestli commented 4 years ago

observation:

nonix:~$ date; curl 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=response&format=xml' > /tmp/bla.xml; date
Thu Dec 12 14:07:48 CET 2019
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 58.9M    0 58.9M    0     0   177k      0 --:--:--  0:05:40 --:--:-- 1984k
Thu Dec 12 14:13:28 CET 2019
nonix:~$ date; curl 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=response&format=xml' > /tmp/bla.xml; date
Thu Dec 12 14:14:50 CET 2019
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 58.9M    0 58.9M    0     0   487k      0 --:--:--  0:02:03 --:--:-- 2824k
Thu Dec 12 14:16:54 CET 2019
nonix:

From the response times of an identically repeated request, I guess that backend caching is working but the frontend cache is not.

Note: special care is required to avoid multiple cache versions (in the frontend cache) for different sets of request headers; ask @cbonjour for details. In this case, I would recommend disregarding even Accept-Encoding and returning all data uncompressed (no mod_deflate), as station information is small, wfcatalog is rare, and dataselect is precompressed.

damb commented 4 years ago

Hmm. I restarted the frontend Apache on mediator-devel.ethz.ch and now it's working again. This is weird. Apparently, the configuration is not stable yet.

First request:

$ time curl -v -o /dev/null 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 129.132.144.211...
* TCP_NODELAY set
* Connected to mediator-devel.ethz.ch (129.132.144.211) port 80 (#0)
> GET /fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text HTTP/1.1
> Host: mediator-devel.ethz.ch
> User-Agent: curl/7.58.0
> Accept: */*
> 
  0     0    0     0    0     0      0      0 --:--:--  0:00:24 --:--:--     0< HTTP/1.1 200 OK
< Date: Thu, 12 Dec 2019 15:33:17 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Cache-Control: public, max-age=43200
< Access-Control-Allow-Origin: *
< Vary: Accept-Encoding
< X-Cache: MISS from localhost
< X-Cache-Detail: "cache miss: attempting entity save" from localhost
< Transfer-Encoding: chunked
< Content-Type: text/plain; charset=utf-8
< 
{ [342 bytes data]
100  125k    0  125k    0     0   2393      0 --:--:--  0:00:53 --:--:--  5032
* Connection #0 to host mediator-devel.ethz.ch left intact

real    0m53.650s
user    0m0.044s
sys     0m0.043s

Second request:

$ time curl -v -o /dev/null 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 129.132.144.211...
* TCP_NODELAY set
* Connected to mediator-devel.ethz.ch (129.132.144.211) port 80 (#0)
> GET /fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text HTTP/1.1
> Host: mediator-devel.ethz.ch
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Thu, 12 Dec 2019 15:34:51 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Vary: Accept-Encoding
< Cache-Control: public, max-age=43200
< Access-Control-Allow-Origin: *
< Age: 93
< X-Cache: HIT from localhost
< X-Cache-Detail: "cache hit" from localhost
< Content-Length: 128332
< Content-Type: text/plain; charset=utf-8
< 
{ [14152 bytes data]
100  125k  100  125k    0     0  9640k      0 --:--:-- --:--:-- --:--:-- 9640k
* Connection #0 to host mediator-devel.ethz.ch left intact

real    0m0.040s
user    0m0.015s
sys     0m0.018s

However, due to the disadvantages mentioned above I still favor a distributed cache handled by the WSGI application itself. @cbonjour shares the same view after discussing the issue.
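As a rough sketch of what a cache handled by the application could look like, the following shows a cache-aside lookup with a time-to-live. `InMemoryCache` and `cached_response` are hypothetical names; the in-memory store is only a stand-in for a shared backend such as Redis and is not distributed itself. The default TTL of 43200 seconds simply mirrors the `max-age` seen in the response headers above.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a shared cache backend (e.g. Redis)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (self._clock() + ttl, value)


def cached_response(cache, key, fetch, ttl=43200):
    """Return (body, hit). fetch() is only called on a cache miss."""
    body = cache.get(key)
    if body is not None:
        return body, True
    body = fetch()
    cache.set(key, body, ttl)
    return body, False
```

Because the application builds the key itself, it can normalize the request before the lookup, which a frontend cache keyed on the raw URL cannot do.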

kaestli commented 4 years ago

> However, due to the disadvantages mentioned above I still favor a distributed cache handled by the WSGI application itself. @cbonjour shares the same view after discussing the issue.

I disagree with this. We can discuss tomorrow...

damb commented 4 years ago

eida-federator is implemented such that endpoint requests to DCs are no longer executed once a client terminates the connection while the response is being streamed. This leads to interesting behaviour when caching by means of Apache2's mod_cache.

Assuming a client issues the request:

$ curl -v -o - "http://mediator-devel.ethz.ch/fdsnws/station/1/query?net=CH,GR,AW&format=xml"

but terminates the connection right after net=GR was served (the <Network></Network> elements for net=CH and net=AW are still missing). The headers (HTTP status 200) have already been sent, since the service is able to serve a valid response; the content, however, has not been served completely yet. mod_cache is not aware of this. When the same request is executed a second time, it results in a cache hit, and the data served during the first attempt is returned again. For format=xml, the cached content is therefore incomplete and does not conform to StationXML 1.0.
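For comparison, a cache handled inside the application could store a response only after the stream has completed. The sketch below is an assumption about how this might be done, not the federator's code: `stream_and_cache` is a hypothetical helper, and `cache` is any dict-like object. It relies on the fact that a WSGI server calls `close()` on the response iterable when a request ends early, which raises `GeneratorExit` inside a generator at the pending `yield`.

```python
def stream_and_cache(chunks, cache, key):
    """Yield chunks to the client; cache the body only if fully streamed."""
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)
        # If the client disconnects, the generator is closed here and
        # GeneratorExit propagates, so the cache write below is skipped.
        yield chunk
    # Reached only when the upstream iterable was exhausted normally.
    cache[key] = b"".join(buffered)
```

A possible usage, with a plain dict as the cache:

```python
cache = {}
stream = stream_and_cache(iter([b"<Network/>", b"<Network/>"]), cache, "key")
next(stream)     # first chunk is served
stream.close()   # client goes away before the stream finishes
assert "key" not in cache
```

The trade-off is that the whole body is buffered in memory while streaming, which may matter for large level=response requests.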

damb commented 4 years ago

Closed due to the unpredictable behaviour mentioned before.