Open vjrj opened 5 years ago
Won't do before migration to the new infrastructure.
Probably related.
During ingestion we have sometimes also 500 errors accessing to the collectory API:
biocache-store-0 2020-04-22 09:27:22,089 WARN : [WebServiceLoader] - Unable to load https://colecciones.gbif.es/ws/dataResource/dr296.json : Server returned HTTP response code: 500 for URL: https://colecciones.gbif.es/ws/dataResource/dr296.json
In the proxy:
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "POST /ws/dataResource/dr296 HTTP/1.1" 200 31 "-" "Apache-HttpClient/4.5.6 (Java/1.8.0_242)" "-" request_time=0.015 upstream_response_time=0.016 upstream_connect_time=0.000 upstream_header_time=0.016 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 200 3429 "-" "Java/1.8.0_242" "-" request_time=0.086 upstream_response_time=0.084 upstream_connect_time=0.000 upstream_header_time=0.084 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 200 3429 "-" "Java/1.8.0_242" "-" request_time=0.092 upstream_response_time=0.092 upstream_connect_time=0.000 upstream_header_time=0.088 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 500 23231 "-" "Java/1.8.0_242" "-" request_time=0.132 upstream_response_time=0.132 upstream_connect_time=0.000 upstream_header_time=0.132 upstream_cache_status=-
And in tomcat:
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Testing with artillery
I get some interesting output:
$ artillery quick --count 10 -n 20 https://colecciones.gbif.es/ws/dataResource/dr296.json
Started phase 0, duration: 1s @ 09:42:19(+0200) 2020-04-22
Report @ 09:42:23(+0200) 2020-04-22
Elapsed time: 4 seconds
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 200
Mean response/sec: 52.22
Response time (msec):
min: 53.4
max: 784.7
median: 106.4
p95: 222.3
p99: 356.8
Codes:
200: 199
500: 1
Only on 500 error. If I continue testing with thousands of petitions I don't get this error anymore:
Summary report @ 09:50:50(+0200) 2020-04-22
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 69.2
Response time (msec):
min: 53.7
max: 475.8
median: 110.8
p95: 176.4
p99: 292.5
Scenario counts:
0: 10 (100%)
Codes:
200: 2000
This sounds to me something like a mysql reconnection issue.
Maybe this can be used to get more info about these errors https://stackoverflow.com/a/33589241/642847 and https://stackoverflow.com/questions/5491065/how-to-know-from-where-was-thrown-error-500-grails
On new collectory deployments, when try to access initially to access the index, I get:
I can see a 500 error in js console and in nginx:
In
catalina.out
I only see:Refreshing the page, the error goes away. It sounds like a lazy initialization that fails.
Sites affected & versions:
v1.5.4
v1.5.1
v1.5.1
Although we have
collectory_version = LATEST
in all of our inventories and this was recently deployed.cc @geonb @jloomisVCE