Closed damb closed 4 years ago
CC @kaestli
@kaestli, I deployed the feature at mediator-devel.ethz.ch. If you'd like, you can give it a try.
References: #50
observation:
nonix:~$ date; curl 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=response&format=xml' > /tmp/bla.xml; date
Thu Dec 12 14:07:48 CET 2019
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 58.9M 0 58.9M 0 0 177k 0 --:--:-- 0:05:40 --:--:-- 1984k
Thu Dec 12 14:13:28 CET 2019
nonix:~$ date; curl 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=response&format=xml' > /tmp/bla.xml; date
Thu Dec 12 14:14:50 CET 2019
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 58.9M 0 58.9M 0 0 487k 0 --:--:-- 0:02:03 --:--:-- 2824k
Thu Dec 12 14:16:54 CET 2019
nonix:
From the response times of an identically repeated request (5:40 vs. 2:03), I guess that backend caching is working, but the frontend cache is not.
Note: special care is required to avoid multiple cache versions (in the frontend cache) for different sets of request headers; ask @cbonjour for details. In this case, I would recommend disregarding even Accept-Encoding and returning all data uncompressed (no mod_deflate), as station information is small, wfcatalog requests are rare, and dataselect data is precompressed.
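To illustrate why differing request headers can multiply cache entries, here is a minimal conceptual sketch of a Vary-aware cache key. It is not mod_cache's actual implementation; the function name `variant_key` and the key layout are assumptions made for illustration only.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key the way a Vary-aware cache conceptually does.

    Hypothetical helper: the key is the URL plus the values of every
    request header named in the response's Vary header.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # A header that the client did not send counts as an empty value.
    selected = tuple((name, lowered.get(name, "")) for name in names)
    return (url, selected)


url = "/fdsnws/station/1/query?network=CH&format=text"
curl_like = {"Accept": "*/*"}
browser_like = {"Accept": "*/*", "Accept-Encoding": "gzip, deflate"}

# With "Vary: Accept-Encoding" the two clients end up in different variants.
assert variant_key(url, curl_like, "Accept-Encoding") != variant_key(url, browser_like, "Accept-Encoding")
# Without a Vary header both clients share one cache entry.
assert variant_key(url, curl_like, "") == variant_key(url, browser_like, "")
```

Under this model, serving everything uncompressed and dropping Accept-Encoding from the variant selection leaves one entry per URL.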
Hmm. I restarted the frontend Apache on mediator-devel.ethz.ch and now it's working again. This is weird. Apparently, the configuration is not stable yet.
First request:
$ time curl -v -o /dev/null 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 129.132.144.211...
* TCP_NODELAY set
* Connected to mediator-devel.ethz.ch (129.132.144.211) port 80 (#0)
> GET /fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text HTTP/1.1
> Host: mediator-devel.ethz.ch
> User-Agent: curl/7.58.0
> Accept: */*
>
0 0 0 0 0 0 0 0 --:--:-- 0:00:24 --:--:-- 0< HTTP/1.1 200 OK
< Date: Thu, 12 Dec 2019 15:33:17 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Cache-Control: public, max-age=43200
< Access-Control-Allow-Origin: *
< Vary: Accept-Encoding
< X-Cache: MISS from localhost
< X-Cache-Detail: "cache miss: attempting entity save" from localhost
< Transfer-Encoding: chunked
< Content-Type: text/plain; charset=utf-8
<
{ [342 bytes data]
100 125k 0 125k 0 0 2393 0 --:--:-- 0:00:53 --:--:-- 5032
* Connection #0 to host mediator-devel.ethz.ch left intact
real 0m53.650s
user 0m0.044s
sys 0m0.043s
Second request:
$ time curl -v -o /dev/null 'http://mediator-devel.ethz.ch/fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 129.132.144.211...
* TCP_NODELAY set
* Connected to mediator-devel.ethz.ch (129.132.144.211) port 80 (#0)
> GET /fdsnws/station/1/query?network=*&station=*&location=*&channel=HHZ,HHE&start=2019-03-01&end=2019-03-03&level=station&format=text HTTP/1.1
> Host: mediator-devel.ethz.ch
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 12 Dec 2019 15:34:51 GMT
< Server: Apache/2.4.18 (Ubuntu)
< Vary: Accept-Encoding
< Cache-Control: public, max-age=43200
< Access-Control-Allow-Origin: *
< Age: 93
< X-Cache: HIT from localhost
< X-Cache-Detail: "cache hit" from localhost
< Content-Length: 128332
< Content-Type: text/plain; charset=utf-8
<
{ [14152 bytes data]
100 125k 100 125k 0 0 9640k 0 --:--:-- --:--:-- --:--:-- 9640k
* Connection #0 to host mediator-devel.ethz.ch left intact
real 0m0.040s
user 0m0.015s
sys 0m0.018s
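The two transcripts above differ mainly in the `X-Cache` and `Age` response headers. A minimal sketch for checking this programmatically, assuming the response headers are already available as a dict; the helper name `classify_cache_response` is hypothetical:

```python
def classify_cache_response(headers):
    """Return (status, age) derived from X-Cache and Age response headers.

    status is "HIT", "MISS", "REVALIDATE" or "UNKNOWN"; age is the value
    of the Age header in seconds, or None if it is absent or malformed.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = lowered.get("x-cache", "").split()
    status = tokens[0].upper() if tokens else "UNKNOWN"
    if status not in ("HIT", "MISS", "REVALIDATE"):
        status = "UNKNOWN"
    age = lowered.get("age", "").strip()
    return status, int(age) if age.isdigit() else None


# Header values copied from the first and second request above.
first = {"X-Cache": "MISS from localhost"}
second = {"X-Cache": "HIT from localhost", "Age": "93"}
assert classify_cache_response(first) == ("MISS", None)
assert classify_cache_response(second) == ("HIT", 93)
```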
However, due to the disadvantages mentioned above I still favor a distributed cache handled by the WSGI application itself. @cbonjour shares the same view after discussing the issue.
i disagree on this. we can discuss tomorrow...
eida-federator is implemented such that endpoint requests to DCs are no longer executed if a client terminates the connection while the response is being streamed. This leads to an interesting behaviour when trying to cache by means of Apache2's mod_cache.
Assuming a client issues the request:
$ curl -v -o - "http://mediator-devel.ethz.ch/fdsnws/station/1/query?net=CH,GR,AW&format=xml"
but terminates the connection right after net=GR was served (the <Network></Network> tags for net=CH and net=AW are still missing).
The headers (HTTP status 200) have already been sent, since the service is able to serve a valid response; however, the content has not been served completely yet. Also, mod_cache is not aware of the full scenario.
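For comparison, an application-level cache can tell a complete stream from an aborted one. The following is a minimal sketch, assuming a plain dict as cache backend and a hypothetical helper `stream_and_cache`; it does not describe how eida-federator or mod_cache actually work.

```python
def stream_and_cache(chunks, cache, key):
    """Yield chunks to the client and cache the body only if it is complete."""
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)
        yield chunk
    # Reached only when the consumer drained the whole stream. If the
    # generator is closed early (e.g. because the client disconnected),
    # GeneratorExit is raised at the `yield` and this line is skipped.
    cache[key] = b"".join(buffered)


cache = {}

# Complete transfer: the body is stored.
body = b"".join(stream_and_cache(iter([b"<a>", b"</a>"]), cache, "complete"))
assert cache["complete"] == body == b"<a></a>"

# Aborted transfer: nothing is stored for this key.
gen = stream_and_cache(iter([b"<a>", b"</a>"]), cache, "aborted")
next(gen)
gen.close()
assert "aborted" not in cache
```

The trade-off in this sketch is that the whole body is buffered in memory before it is stored.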
However, when the request above is executed a second time, it results in a cache hit, and the data already served during the first attempt is returned again. In the case of format=xml, the cached content consequently does not conform to StationXML 1.0.
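A truncated format=xml body of this kind can be detected with a well-formedness check. The sketch below uses Python's standard library; the helper name `is_well_formed_xml` and the sample documents are made up for illustration. Note that well-formedness is a weaker property than validity against the StationXML schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(body):
    """Return True if body parses as a complete XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


# Illustrative documents only; real StationXML carries namespaces and more content.
complete = b'<FDSNStationXML><Network code="GR"></Network></FDSNStationXML>'
truncated = b'<FDSNStationXML><Network code="GR"></Network>'

assert is_well_formed_xml(complete)
assert not is_well_formed_xml(truncated)
```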
Closed due to the unpredictable behaviour mentioned before.
Features and Changes:
mod_cache_disk in order to cache /fdsnws/station HTTP GET requests
eida-federator WSGI application

Disadvantages of this approach:
[…] and […], or even […] and […], can force cache misses. Also, aliases are not taken into consideration (e.g. queries with net and network are treated differently).

As a consequence of the disadvantages listed above, the application should IMO handle the cache internally. Note that the current Docker production setup comes with a Redis server anyway, which could be used for this purpose.
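If the application handled the cache internally, it could canonicalise the query before looking it up. Below is a minimal sketch of such a cache key. The function name `canonical_cache_key`, the alias table beyond net/network (taken from the usual FDSN short parameter names), and the decision to treat comma-separated lists as order-insensitive are all assumptions, not the project's implementation.

```python
import hashlib
from urllib.parse import parse_qsl

# Short parameter names mapped to their long form (assumed alias table).
ALIASES = {
    "net": "network",
    "sta": "station",
    "loc": "location",
    "cha": "channel",
    "start": "starttime",
    "end": "endtime",
}


def canonical_cache_key(path, query_string):
    """Return a hex digest that is stable under aliasing and reordering."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        name = ALIASES.get(name, name)
        # Assumption: the order of comma-separated values does not matter.
        params[name] = ",".join(sorted(v.strip() for v in value.split(",")))
    canonical = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hashlib.sha256(f"{path}?{canonical}".encode("utf-8")).hexdigest()


path = "/fdsnws/station/1/query"
# Alias, parameter order and list order no longer produce distinct keys.
assert canonical_cache_key(path, "net=CH,GR&format=xml") == canonical_cache_key(path, "format=xml&network=GR,CH")
# A genuinely different request still gets its own key.
assert canonical_cache_key(path, "net=CH&format=xml") != canonical_cache_key(path, "net=CH&format=text")
```

The resulting digest could then serve as the key in whatever store the application uses, for instance the Redis server mentioned above.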