epimorphics / data-API

data API design and implementation

DS API retry connection #49

Open andrew-pickin-epi opened 4 years ago

andrew-pickin-epi commented 4 years ago

Ref: https://epimorphics.codebasehq.com/projects/operations/tickets/414

INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
2020-02-19 09:30:23,463 || INFO  DSAPIManager         :: DSAPI 1.1.4r, flatten query log entry: yes, validating terms against types: no.
2020-02-19 09:30:23,588 || INFO  VelocityRender       :: Loaded config: /var/lib/tomcat7/webapps/dsapi/WEB-INF/templates/velocity.properties
2020-02-19 09:30:24,045 || ERROR DatasetMonitor       :: Failed to load config file: /etc/dsapi/conf/ppd.ttl - HttpException: 503 HTTP 503 error making the query: Service Unavailable
2020-02-19 09:30:24,314 || INFO  ConfigMonitor        :: Adding monitored entry ukhpi from: /etc/dsapi/conf/ukhpi.ttl
2020-02-19 09:30:24,316 || INFO  AppConfig            :: Loaded App app as the default app

In the above trace the ppd endpoint is not made available because the Apache proxy fronting Fuseki has marked the server down and unavailable for 60s.

The log should make a distinction between the service being misconfigured, as this entry suggests, and the service being unavailable.

Secondly, as this is a potentially transient environment issue, it should be subject to retries.
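
For illustration, a minimal sketch (not the actual data-API code) of how the two cases might be told apart when a dataset config load fails. `ConfigLoadClassifier`, `loadConfig` and the logging calls are hypothetical stand-ins; only Jena's `HttpException`, which appears in the trace above, is a real class.

```java
// Sketch only, not the data-API implementation: separating "endpoint
// unavailable" from "misconfigured" when a dataset config load fails.
// loadConfig(...) is a hypothetical stand-in for the DSD query that the
// DatasetMonitor issues against the SPARQL endpoint.
import org.apache.jena.atlas.web.HttpException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConfigLoadClassifier {
    private static final Logger log = LoggerFactory.getLogger(ConfigLoadClassifier.class);

    /** Returns true if the load succeeded, false if it failed and may be retried. */
    public boolean tryLoad(String configPath) {
        try {
            loadConfig(configPath);
            return true;
        } catch (HttpException e) {
            // HTTP-level failures, such as the 503 returned by the Apache proxy
            // while Fuseki is marked down, are logged as "unavailable" rather
            // than as a configuration error.
            log.warn("SPARQL endpoint unavailable while loading {}: {}",
                    configPath, e.getMessage());
            return false;
        } catch (RuntimeException e) {
            // Anything else (unreadable TTL, missing DSD, ...) looks like a
            // genuine misconfiguration; a fuller version would probably not
            // retry this case at all.
            log.error("Failed to load config file {}: {}", configPath, e.getMessage());
            return false;
        }
    }

    // Placeholder for the real configuration load.
    private void loadConfig(String configPath) throws HttpException {
        throw new UnsupportedOperationException("placeholder only");
    }
}
```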

der commented 4 years ago

The message is not that unreasonable. During load it is pulling the configuration (the DSD) from the SPARQL endpoint; if that's not there it can't configure anything. It would be easy enough to catch that sort of exception and generate a different error message if that's worth it.

A retry would make sense and would be possible, but it would have to be within the DatasetMonitor; it can just leave it unconfigured until there is a query, which means there would need to be some bounds to the retry. I can't remember if the underlying ConfigMonitor is threaded or will block other config operations until it clears, but it's certainly possible it'll block.

Is there any evidence for what a sensible retry limit is? As a default I'd suggest a retry limit of 3 × the connection timeout.
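
As a rough illustration of that kind of bound (placeholder names and timings, reusing the hypothetical `ConfigLoadClassifier` sketched above; not the existing ConfigMonitor API):

```java
// Sketch of a bounded retry around the initial config load. The total budget
// of 3 x the connection timeout follows the suggestion above; the 10s pause
// between attempts is arbitrary.
import java.time.Duration;
import java.time.Instant;

public class BoundedConfigRetry {

    public boolean loadWithRetry(ConfigLoadClassifier loader, String configPath,
                                 Duration connectionTimeout) throws InterruptedException {
        Duration budget = connectionTimeout.multipliedBy(3);
        Duration delay = Duration.ofSeconds(10);
        Instant deadline = Instant.now().plus(budget);

        while (true) {
            if (loader.tryLoad(configPath)) {
                return true;                  // configured successfully
            }
            if (Instant.now().plus(delay).isAfter(deadline)) {
                return false;                 // give up, leave the entry unconfigured
            }
            // Note: this sleep blocks the calling thread, which matters if the
            // underlying ConfigMonitor is single-threaded.
            Thread.sleep(delay.toMillis());
        }
    }
}
```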

andrew-pickin-epi commented 4 years ago

In the instance in question the worker had been flagged as offline for 60s (the Apache default). This is no longer the case.
In the event of a potentially environmental issue such as this, the issue is not how many retries, as I'd suggest it should retry indefinitely. The question is how frequently? I'd suggest once or twice a minute.
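
For what that could look like, a minimal sketch of an open-ended retry once a minute (illustrative names only, again reusing the hypothetical `ConfigLoadClassifier`; a real fix would live inside DatasetMonitor/ConfigMonitor):

```java
// Sketch of retrying indefinitely at a low frequency (here once a minute)
// on a background thread, so a transient outage just delays configuration
// rather than failing it permanently.
import java.util.concurrent.TimeUnit;

public class PeriodicConfigRetry {

    public void retryUntilLoaded(ConfigLoadClassifier loader, String configPath) {
        Thread retry = new Thread(() -> {
            // No upper bound, per the suggestion above: keep trying once a
            // minute until the config finally loads.
            while (!loader.tryLoad(configPath)) {
                try {
                    TimeUnit.MINUTES.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "dsapi-config-retry");
        retry.setDaemon(true);      // don't keep the JVM alive just for retries
        retry.start();
    }
}
```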