clarin-eric / component-registry-rest

Component Registry back end
GNU General Public License v3.0

Look into optimising/caching of profile xml generation result #5

Open twagoo opened 8 years ago

twagoo commented 8 years ago

Generating a profile XML can take a long time (over 10 seconds) for profiles that are large once expanded. For example, requesting the XSD of clarin.eu:cr1:p_1361876010571 took over 12 seconds on this occasion:

$ time wget "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1361876010571/xsd"
--2016-07-07 17:06:31--  http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1361876010571/xsd
Resolving catalog.clarin.eu... 147.251.9.199
Connecting to catalog.clarin.eu|147.251.9.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-download]
Saving to: 'xsd'

xsd                  [  <=>                 ] 642.27K  1.79MB/s   in 0.4s   

2016-07-07 17:06:44 (1.79 MB/s) - 'xsd' saved [657681]

0.00user 0.01system 0:12.85elapsed 0%CPU (0avgtext+0avgdata 4636672maxresident)k
0inputs+3outputs (0major+375minor)pagefaults 0swaps

See if this can be improved either by caching or other means of streamlining the process. A first step would be to determine the bottleneck. The following do not seem to be the bottleneck:

Actually, retrieving the XML rather than the XSD seems to take longer.
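In the wget output above, the 642 KB body transferred in 0.4s while the whole request took 12.85s, which suggests most of the time is spent waiting for the server rather than on the network. A minimal sketch for measuring that split directly, assuming plain Python with `urllib`; the `time_request` helper and its `opener`/`clock` parameters are hypothetical names introduced here, not part of the registry:

```python
import time
import urllib.request


def time_request(url, opener=urllib.request.urlopen, clock=time.perf_counter):
    """Return (seconds until headers arrive, seconds to read the body, body size)."""
    start = clock()
    # urlopen returns once the status line and headers have been received,
    # so this interval approximates the server-side generation time.
    with opener(url) as response:
        headers_at = clock()
        body = response.read()
    done = clock()
    return headers_at - start, done - headers_at, len(body)
```

Calling `time_request(...)` on the `/xsd` and `/xml` URLs of the same profile would show whether the wait before the first byte differs between the two representations.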

twagoo commented 8 years ago

Suggestion for a caching solution to look into from Willem: https://varnish-cache.org/

twagoo commented 7 years ago

Suggestion: with relatively little effort, the following naive caching approach could probably achieve quite an improvement: cache all responses to GET requests, but invalidate the entire cache on any POST, PUT or DELETE. Some exceptions could be defined for truly static content.
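The naive approach above could be sketched roughly as follows. This is an in-memory illustration only, not the registry's code: `NaiveResponseCache` and the `backend` callable are hypothetical names, and a real setup would also need thread safety and a size limit.

```python
from typing import Callable, Dict

# Methods that modify registry state and therefore flush the cache.
INVALIDATING_METHODS = {"POST", "PUT", "DELETE"}


class NaiveResponseCache:
    """Cache every GET response; drop everything on any write request."""

    def __init__(self, backend: Callable[[str, str], bytes]) -> None:
        # backend(method, path) stands in for the real (slow) request handler.
        self._backend = backend
        self._store: Dict[str, bytes] = {}

    def handle(self, method: str, path: str) -> bytes:
        method = method.upper()
        if method == "GET":
            if path not in self._store:
                self._store[path] = self._backend(method, path)
            return self._store[path]
        response = self._backend(method, path)
        if method in INVALIDATING_METHODS:
            # Invalidate only after the write has been processed, so the next
            # GET regenerates its response from the updated state.
            self._store.clear()
        return response
```

Flushing everything is coarse, but it avoids having to track which profiles depend on which components: any write may change the expanded form of any profile.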

twagoo commented 7 years ago

Another framework that could be useful to set up a simple caching solution: Redis

twagoo commented 6 years ago

A combination of an nginx proxy (to be defined in the compose setup) and some clever handling of cache-related request headers seems like a promising path to explore.

See https://www.nginx.com/blog/nginx-caching-guide/
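One possible reading of "handling of cache-related request headers" is conditional GET: the back end derives an `ETag` from something cheap to look up and answers `304 Not Modified` without regenerating the XML or XSD when the client or proxy already holds that version. A rough sketch under those assumptions; `conditional_get`, the `version` token (for example a last-modified timestamp) and the `generate` callable are all hypothetical, and the `If-None-Match` parsing is simplified (no `*`, no weak validators):

```python
import hashlib
from typing import Callable, Dict, Tuple


def make_etag(profile_id: str, version: str) -> str:
    """Build a quoted entity tag from the profile id and a cheap version token."""
    digest = hashlib.sha256(f"{profile_id}:{version}".encode()).hexdigest()
    return f'"{digest}"'


def conditional_get(
    profile_id: str,
    version: str,
    request_headers: Dict[str, str],
    generate: Callable[[], bytes],
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body), skipping generation on a match."""
    etag = make_etag(profile_id, version)
    # Assumes header names are already normalised to this exact spelling.
    candidates = [
        tag.strip()
        for tag in request_headers.get("If-None-Match", "").split(",")
    ]
    if etag in candidates:
        # The caller already has this version: no expensive generation needed.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, generate()
```

Which `Cache-Control` policy to send alongside the `ETag`, and how the nginx proxy should revalidate, is left open here.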