Open plawton-umd opened 9 months ago
This is a shot in the dark, but I don't want to overlook it in case it's relevant: if the harvest is hitting any timeout errors, those insertions could be listed as failures (because the client never received confirmation that they succeeded) yet be ingested nonetheless (because the server did receive and process them, but was overloaded at the time and took too long to respond).
@plawton-umd if you have any firm sense of whether this is plausible, let me know
@alexdunnjpl No idea. Sometimes in the logs it looks like it
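One way to test the timeout hypothesis would be to take the product IDs that the log reports as failed and check which of them are present in the index anyway. The sketch below shows only the comparison step; the function name and the sample IDs are hypothetical, and how the two ID lists are obtained (parsing harvest.log, querying OpenSearch) is left out because it depends on the deployment.

```python
def reconcile_failures(reported_failed_ids, indexed_ids):
    """Split IDs reported as failed into (ingested anyway, truly missing).

    Both arguments are iterables of document IDs. This is an
    illustrative helper, not part of harvest or the registry tooling.
    """
    reported = set(reported_failed_ids)
    indexed = set(indexed_ids)
    ingested_anyway = sorted(reported & indexed)  # failed per the log, yet in the index
    truly_missing = sorted(reported - indexed)    # failed per the log and absent
    return ingested_anyway, truly_missing


# Hypothetical IDs, for illustration only.
ingested, missing = reconcile_failures(
    ["lid:a", "lid:b", "lid:c"],  # IDs the log lists as failures
    ["lid:a", "lid:c", "lid:x"],  # IDs actually found in the index
)
print(ingested)  # ['lid:a', 'lid:c']
print(missing)   # ['lid:b']
```

If the "ingested anyway" list is non-empty, that would be consistent with insertions that timed out on the client side but were still processed by the server.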
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When I compared information from the harvest.log to the OpenSearch (OS) query results, I noticed differences.
🕵️ Expected behavior
I expected the "count" after the load to equal the "count" before the load plus the harvest.log's number of "Loaded Files".
Instead, the harvest.log summary reports 150 fewer loaded files than the increase in the OS "count" (curl -u $REGUSER $OPENSEARCH_URL'/registry/_count?pretty=true') indicates.
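The expected relationship can be written as a small check. This is a minimal sketch with hypothetical numbers (the actual counts are not given in this report); the function name is also an assumption for illustration. A positive result means the index grew by more documents than harvest.log reported as loaded.

```python
def count_discrepancy(count_before, count_after, loaded_files):
    """Return how far the post-load count is from the expected value.

    Expected: count_after == count_before + loaded_files.
    Positive result: the index holds more documents than the log accounts for.
    Negative result: the index holds fewer.
    """
    expected_after = count_before + loaded_files
    return count_after - expected_after


# Hypothetical values chosen to reproduce a gap of 150.
print(count_discrepancy(count_before=1000, count_after=1650, loaded_files=500))  # 150
```

The two counts would come from running the `_count` query before and after the load, and `loaded_files` from the "Loaded Files" line in the harvest.log summary.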
📜 To Reproduce
🖥 Environment Info
🩺 Test Data / Additional context
See above
🦄 Related requirements
Tightly coupled with
148
149
⚙️ Engineering Details
N/A