AtlasOfLivingAustralia / logger-service

Atlas event logging
https://logger.ala.org.au

Queue up log requests and retry any failed database inserts #15

Open ansell opened 6 years ago

ansell commented 6 years ago

The logger service is becoming overwhelmed by requests at times. Requests that fail during these periods are lost data: the log events never reach the data providers for the affected data resources, institutions, and collections. For example, uptime monitoring of the /service/logger/reasons path shows that the logger service was unresponsive for more than 4 of the last 48 hours.

The current configuration uses 100 database connections to insert log events and log details concurrently. One improvement would be to create a much larger (or unbounded) BlockingQueue that temporarily holds log requests internally, with a single dedicated consumer taking requests from the queue and submitting them one at a time. This would reduce lock contention on the database by reducing the number of connections open concurrently. Any failed insert could be retried an arbitrary number of times before the full details of the failed request are written to a file that can be examined manually later.
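
A rough sketch of the queue-plus-single-consumer idea, to make the proposal concrete. This is not the logger-service code: `QueuedLogWriter`, `Inserter`, `failureSink`, `maxAttempts`, and `retryDelayMillis` are all hypothetical names, the database insert is abstracted behind an interface, and the retry policy (fixed delay, bounded attempts) is just one possible choice.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers log events and inserts them from one thread. */
class QueuedLogWriter<T> implements AutoCloseable {

    /** Stand-in for the real database insert; throws when the insert fails. */
    interface Inserter<T> {
        void insert(T event) throws Exception;
    }

    // LinkedBlockingQueue with no capacity argument is effectively unbounded.
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Inserter<T> inserter;
    private final Consumer<T> failureSink; // e.g. appends full details to a file
    private final int maxAttempts;
    private final long retryDelayMillis;
    private final Thread consumer;
    private volatile boolean running = true;

    QueuedLogWriter(Inserter<T> inserter, Consumer<T> failureSink,
                    int maxAttempts, long retryDelayMillis) {
        this.inserter = inserter;
        this.failureSink = failureSink;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
        this.consumer = new Thread(this::drain, "log-event-consumer");
        this.consumer.start();
    }

    /** Called by request threads; returns without touching the database. */
    void submit(T event) {
        queue.add(event);
    }

    /** Single consumer loop: keeps going until closed and the queue is empty. */
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                T event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) {
                    insertWithRetry(event);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Retries a failed insert, then hands the event to the failure sink. */
    private void insertWithRetry(T event) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                inserter.insert(event);
                return;
            } catch (Exception e) {
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMillis);
                }
            }
        }
        failureSink.accept(event);
    }

    /** Stops accepting work and waits for already queued events to drain. */
    @Override
    public void close() {
        running = false;
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Request handlers would only call `submit(...)`, so a slow or locked database no longer blocks or fails the HTTP request; only the consumer thread holds a connection. Two caveats with this sketch: an unbounded in-memory queue loses its contents if the process dies, and while one event is being retried everything behind it waits.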

CC @nickdos