AtlasOfLivingAustralia / collectory-plugin

A plugin for metadata registry functionality of the ALA
https://collections.ala.org.au
2 stars 25 forks source link

Temporally error 500 #155

Open vjrj opened 5 years ago

vjrj commented 5 years ago

On new collectory deployments, when try to access initially to access the index, I get:

An error has occurred
 - Error: unknown

I can see a 500 error in js console and in nginx:

10.10.10.100 - - [27/Aug/2019:08:48:07 +0000] "GET / HTTP/1.0" 500 237 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36" "185.154.210.60" request_time=0.248 upstream_response_time=0.248 upstream_connect_time=0.000 upstream_header_time=0.248 upstream_cache_status=-

In catalina.out I only see:

ago 27, 2019 8:48:07 AM org.apache.catalina.core.ApplicationContext log
INFORMACIÓN: Initializing Spring FrameworkServlet 'gsp'
ago 27, 2019 8:48:07 AM org.apache.catalina.core.ApplicationContext log
INFORMACIÓN: GSP servlet initialized

Refreshing the page, the error goes away. It sounds like a lazy initialization that fails.

Sites affected & versions:

Although we have collectory_version = LATEST in all of our inventories and this was recently deployed.

cc @geonb @jloomisVCE

RobinaSanderson commented 4 years ago

Won't do before migration to the new infrastructure.

vjrj commented 4 years ago

Probably related.

During ingestion we have sometimes also 500 errors accessing to the collectory API:

biocache-store-0 2020-04-22 09:27:22,089 WARN : [WebServiceLoader] - Unable to load https://colecciones.gbif.es/ws/dataResource/dr296.json : Server returned HTTP response code: 500 for URL: https://colecciones.gbif.es/ws/dataResource/dr296.json

In the proxy:

172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "POST /ws/dataResource/dr296 HTTP/1.1" 200 31 "-" "Apache-HttpClient/4.5.6 (Java/1.8.0_242)" "-" request_time=0.015 upstream_response_time=0.016 upstream_connect_time=0.000 upstream_header_time=0.016 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 200 3429 "-" "Java/1.8.0_242" "-" request_time=0.086 upstream_response_time=0.084 upstream_connect_time=0.000 upstream_header_time=0.084 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 200 3429 "-" "Java/1.8.0_242" "-" request_time=0.092 upstream_response_time=0.092 upstream_connect_time=0.000 upstream_header_time=0.088 upstream_cache_status=-
172.16.16.15 - - [21/Apr/2020:08:53:31 +0200] "GET /ws/dataResource/dr296.json HTTP/1.1" 500 23231 "-" "Java/1.8.0_242" "-" request_time=0.132 upstream_response_time=0.132 upstream_connect_time=0.000 upstream_header_time=0.132 upstream_cache_status=-

And in tomcat:

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Testing with artillery I get some interesting output:

$ artillery quick --count 10 -n 20  https://colecciones.gbif.es/ws/dataResource/dr296.json 
Started phase 0, duration: 1s @ 09:42:19(+0200) 2020-04-22
Report @ 09:42:23(+0200) 2020-04-22
Elapsed time: 4 seconds
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  200
  Mean response/sec: 52.22
  Response time (msec):
    min: 53.4
    max: 784.7
    median: 106.4
    p95: 222.3
    p99: 356.8
  Codes:
    200: 199
    500: 1

Only on 500 error. If I continue testing with thousands of petitions I don't get this error anymore:

Summary report @ 09:50:50(+0200) 2020-04-22
  Scenarios launched:  10
  Scenarios completed: 10
  Requests completed:  2000
  Mean response/sec: 69.2
  Response time (msec):
    min: 53.7
    max: 475.8
    median: 110.8
    p95: 176.4
    p99: 292.5
  Scenario counts:
    0: 10 (100%)
  Codes:
    200: 2000

This sounds to me something like a mysql reconnection issue.

vjrj commented 2 years ago

Maybe this can be used to get more info about these errors https://stackoverflow.com/a/33589241/642847 and https://stackoverflow.com/questions/5491065/how-to-know-from-where-was-thrown-error-500-grails