bcgov / nr-fom

Forestry Operations Map
Apache License 2.0

Production: BCGW Replication Starting To Fail #624

Closed ianliuwk1019 closed 4 months ago

ianliuwk1019 commented 5 months ago

Describe the Bug BCGW reported that their midnight replication jobs continue to fail, and that manual re-runs fail as well. They provided the following summary report for the past few days: image

The API logs in Kibana for "May 02" show that all 3 midnight runs finished. Something else might have prevented the query result from being returned to BCGW.

Steps To Reproduce Steps to reproduce the behaviour:

  1. ... Screenshots If applicable, add screenshots to help explain your problem.

Additional context API endpoint: @Get('/bcgw-extract')

MCatherine1994 commented 5 months ago

From the FOM PROD API memory metrics, we can see that at some peak times the memory usage is well above what we currently request (140M).

Image

Image

Image

Will increase the requested memory to 600M to guarantee that much for the API. When usage exceeds our requested value, even if it stays under the limit value, the container will fail if it can't borrow more resources from other containers. Image
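The request-versus-limit reasoning above can be sketched in a few lines. This is only an illustration, not code from the FOM API: `classifyMemory` is a hypothetical helper, and the peak usage (450) and limit (1024) are assumed values, since the actual numbers are only visible in the screenshots. The 140 and 600 figures are the old and new requests mentioned in this comment.

```typescript
// Hypothetical helper (not part of the FOM codebase): classify a container's
// memory usage against its request and limit. All values share one unit (e.g. MB).
type MemoryStatus = "within-request" | "burst" | "over-limit";

function classifyMemory(usage: number, request: number, limit: number): MemoryStatus {
  if (usage > limit) {
    // Above the limit the container is killed for running out of memory.
    return "over-limit";
  }
  if (usage > request) {
    // Between request and limit: only works while spare memory is available,
    // so this range is not guaranteed.
    return "burst";
  }
  // At or below the request: this amount is reserved for the container.
  return "within-request";
}

// Assumed peak of 450 and assumed limit of 1024, for illustration only.
const beforeChange = classifyMemory(450, 140, 1024); // request = 140 (old value)
const afterChange = classifyMemory(450, 600, 1024);  // request = 600 (new value)

console.log(beforeChange); // "burst"
console.log(afterChange);  // "within-request"
```

With the old request, a peak like the assumed one falls into the non-guaranteed range; with the request raised to 600M, the same peak is covered by reserved memory.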

Out of memory error from Kibana:

Image

Image

The BCGW extract currently takes up to 80s to run, which is still within our limit (120s). However, the out-of-memory error may have caused the app to restart, which would explain why the result was not returned.

Image

Image

Image

Reference: How to setup Kibana to check logs