Closed by jordanpadams 10 months ago
Following on from outcome of #124
@jordanpadams was the increase in disk allocation this morning (approximately) the result of auto-tuning, or a manual action?
@alexdunnjpl to investigate the sweeper to see if it is causing instability.
@sjoshi-jpl configured slow logs to watch longer requests to OpenSearch.
The documents on ATM and GEO are bigger than usual, which makes the fixed-size chunks (e.g. 10000) too large.
The solution is to reduce the page size for repairkit.
ATM works well with smaller pages.
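For illustration only, here is a minimal sketch of one way to keep pages small when documents are unusually large: cap each page by document count and also by an approximate byte budget. This is not repairkit's actual implementation. The function name `paginate`, the `max_docs` and `max_bytes` parameters, and their default values are all hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def paginate(
    docs: Iterable[Dict[str, Any]],
    max_docs: int = 1000,          # hypothetical default, smaller than 10000
    max_bytes: int = 5_000_000,    # hypothetical byte budget per page
) -> Iterator[List[Dict[str, Any]]]:
    """Yield pages bounded by document count and approximate JSON size."""
    page: List[Dict[str, Any]] = []
    page_bytes = 0
    for doc in docs:
        # Approximate the on-the-wire size with the JSON-encoded length.
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        would_overflow = (
            len(page) >= max_docs or page_bytes + doc_bytes > max_bytes
        )
        # Flush the current page before adding a document that would overflow it.
        if page and would_overflow:
            yield page
            page, page_bytes = [], 0
        page.append(doc)
        page_bytes += doc_bytes
    # Emit the final, possibly partial, page.
    if page:
        yield page
```

A single document larger than `max_bytes` is still emitted, alone in its own page, so nothing is silently dropped.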
But there is a remaining issue on GEO, possibly related to disk usage. The issue happens when we write the version of repairkit which ran on the documents. Likely the re-indexation of these
ATM is now available and stable.
GEO only had one timeout.
PSA still has issues.
Suggested approach - use harvest unit-tests to mock non-200 responses to test retry policy once implemented
@alexdunnjpl will implement retry behavior in harvest to solve this ticket.
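As a rough sketch of the suggested test approach, the snippet below pairs a simple retry loop with a mocked sender that returns a non-200 status, then times out, then succeeds. It is written in Python for brevity and is not the harvest or registry-common implementation. `send_with_retry`, `RETRYABLE_STATUSES`, the attempt count, and the exponential backoff delays are all assumptions made for illustration.

```python
import time
from unittest import mock

# Assumption: statuses treated as transient; the real policy may differ.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns 200; return the number of attempts used.

    `send` is any zero-argument callable returning an HTTP status code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = send()
        except TimeoutError:
            status = None  # treat a timeout like a transient failure
        if status == 200:
            return attempt
        if status is not None and status not in RETRYABLE_STATUSES:
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Mocked sender: one 503, one read timeout, then success.
fake_send = mock.Mock(side_effect=[503, TimeoutError("Read timed out"), 200])
delays = []  # record requested delays instead of actually sleeping
attempts = send_with_retry(fake_send, sleep=delays.append)
```

Injecting `sleep` as a parameter keeps the test fast and lets it assert on the backoff schedule without waiting.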
The instance sizes have been upgraded by the SAs. Tests are still needed.
Status: @alexdunnjpl is investigating these issues
Status: @sjoshi-jpl ticket for the SAs to increase EBS volumes to 200 GB each on the nodes that are having issues.
Status: Implementation ongoing
Status: rudimentary fix applied to registry-common, select users have been instructed to retry their previously-failing jobs with the new harvest snapshot, per @jordanpadams
Awaiting feedback before continuing
Resolving per https://github.com/NASA-PDS/registry-common/pull/42 and https://github.com/NASA-PDS/registry-common/pull/43. Will re-open if the issue is identified again.
💡 Description
See resolution of #124
Examples of two different timeouts:
[ERROR] 10,000 milliseconds timeout on connection
[ERROR] Read timed out
From GEO:
From ATM: harvest-3.7.6 with -O option to overwrite previous version. First run:
[INFO] Wrote 43134 product(s)
[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 43134
[SUMMARY] Product_Browse: 11514
[SUMMARY] Product_Bundle: 1
[SUMMARY] Product_Collection: 9
[SUMMARY] Product_Document: 8
[SUMMARY] Product_Observational: 31602
[SUMMARY] Failed files: 10
[SUMMARY] Package ID: d32d4b8d-b306-4a03-b2e8-b5b203c7a30e